“The” search feature
Posted by Andy on July 24, 2009
Several weeks back, Sean announced the addition of a new search feature where using "the" in the search box returns the page meeting the search criteria with the highest number of page views.
I absolutely love this new feature and it has saved me a lot of time. In case you don't quite get how it works, let me explain:
Let's say you want to see Willie Mays' page, so you enter "Willie Mays" into the search box.
Click on that link and you'll see the result: Willie Mays, as well as Willie Mays Aikens.
Now try putting "the Willie Mays" into the search box. It takes you directly to Willie Mays' page because he has more page views than Aikens.
Big deal? Well let's say you search for just "Mays".
That gives you 11 major-leaguers and 36 minor-leaguers. But "the Mays" takes you right to Willie's page again. So does "the Willie" and even "the Willy".
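For the curious, here is a rough Python sketch of how a search like this might work. It is only a guess at the idea, not the site's actual code: the `Player` class, the `search` function, the word-prefix matching rule, and the page-view numbers in the usage note are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    page_views: int      # hypothetical counter; real numbers are not public here
    aliases: tuple = ()  # middle names, nicknames, common misspellings


def matches(player, terms):
    """True if every search term is a prefix of some word in the name or aliases."""
    words = []
    for text in (player.name, *player.aliases):
        words.extend(text.lower().split())
    return all(any(w.startswith(t) for w in words) for t in terms)


def search(query, players):
    """Return all matches, or only the most-viewed match if the query starts with 'the'."""
    terms = query.lower().split()
    top_only = len(terms) > 1 and terms[0] == "the"
    if top_only:
        terms = terms[1:]  # drop the leading "the" before matching
    hits = [p for p in players if matches(p, terms)]
    if top_only and hits:
        return [max(hits, key=lambda p: p.page_views)]
    return hits
```

With two invented entries, `Player("Willie Mays", 1000, ("Willy",))` and `Player("Willie Mays Aikens", 100)`, a plain "Willie Mays" query returns both players, while "the Willie Mays", "the Mays" and "the Willy" each return only the first.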
Now, here's some fun.
Can you guess who comes up when you put in "the Jim"?
It's not James Cool Papa Bell, Jim Bottomley, Jim Bouton, Jim Bunning, Jimmie Foxx, Jim Thome, or any other All-Star or Hall of Famer. Nope, it's this guy, who has the most hits of any Jim.
I was recently wondering who'd come up the most for "the Charlie". I thought maybe Pete Rose, due to his nickname. But anybody with the first or middle name of Charles or Charlie is fair game, meaning even a guy like Whitey (Edward Charles) Ford is eligible. Turns out that the Charlie with the most page views is, ahem, Mickey Mantle.
This new search feature isn't perfect, but it makes finding many guys a lot quicker.
Which "the" searches can you find that return surprising results?
July 24th, 2009 at 1:52 pm
"The Bill" turns up William Roger Clemens.
Which is great fun and all, though I kind of wish it was a little more rigid. If I type "The Bill," odds are I want to find somebody who was actually popularly known by that name (not that I can think of who I'd be looking for in this case...Buckner maybe? Just entered my own first name).
That said though, this is fantastic. My computer decides to load some webpages very, very slowly, so it could actually be a considerable nuisance to have no way of getting directly to, say, Griffey (Jr.) or The Frank Thomas. Or even Kent Hrbek if I were to forget and use his last name only, since now that minor league records go back so far there's actually another "Hrbek" who played for three minor league teams in 1952. 🙂
Great work.
July 24th, 2009 at 2:11 pm
Also "The Lou" turns up Henry LOUis Aaron, instead of the expected Lou Gehrig. Forgot his middle name was Louis.
July 24th, 2009 at 3:22 pm
I am just excited that "The Don" turns up Donnie Baseball
July 24th, 2009 at 3:32 pm
I was thinking Little would return John McGraw, but was pleasantly surprised to see the Little General, Johnny Bench.
And the dragon has only been used by Oscar Dragon, a minor leaguer in 1946
At least Gehrig comes up with "the biscuit"!
But, it is a cryin' shame that Dan Cooley Uggla comes up instead of Cool Papa Bell when searching "the cool"
July 24th, 2009 at 3:58 pm
The Ball = Ted Williams (Teddy Ballgame)
The Cap = Frank Howard (Capital Punishment, which puts him ahead of Hall of Famers Anson, Sparky Anderson, and Fred Clarke along with several All-Stars)
The Ed = Peter Edward Rose, Sr.
The Will = William Roger Clemens (never knew Roger wasn't his first name)
The Mike = Mickey Mantle
The A = Alex Rodriguez
The E = Alexander Emmanuel Rodriguez
The I = Sammy Sosa (Say It Ain't Sosa? Is that what triggered this?)
The S = Babe Ruth (The Sultan of Swat)
The X = Jimmie Foxx (Double X)
The Mr = Derek Jeter (Mr. November)
The Man = Mickey Mantle (sorry, Stan)
The Harry = Hank Aaron (I don't know; is Harry sometimes a nickname for Henry?)
The Black = Don Sutton (Black & Decker)
The Red = Rick Sutcliffe (The Red Rooster)
The Sox = Carl Husta (Sox)
The Giant = Jack Pfiester (Jack the Giant Killer)
The Met = Douglas Metunwa Glanville
The Brave = Bravenor (a minor leaguer about whom most details, including first - or maybe last - name, are unknown)
The Twin = George Selkirk (Twinkletoes)
The Royal = Kenneth Royal Williams (this is the recent one, not the guy from the 1920s)
July 24th, 2009 at 7:28 pm
Keep in mind--this search feature was a brainstorm by Sean that was quick for him to add in. It wasn't designed to be anything "smart"--just a way to reduce the number of clicks needed in many cases. He witnessed a good example of where this would have been useful--Carlos Lee. Putting "Carlos Lee" into the search box brings up the major leaguer and 2 minor leaguers, but of course "The carlos lee" brings up the current slugger. "The carlos" brings up Carlos Delgado (I was right, Sean!)
July 24th, 2009 at 7:31 pm
Oh and regarding "Harry" vs "Henry", Sean has (for a very long time) included commonly mistaken first names on all the pages. For example, I think every player named Mark also has "Mike" at the bottom of his page. To real baseball fans, this seems insane, but in 1998 I remember hearing plenty of fringe fans referring to "Mike McGwire" and all his home runs (goodness knows how they thought "McGwire" was spelled.) If you look near the bottom of any player page, you'll see common names and misspellings.
July 24th, 2009 at 8:56 pm
I can't speak for anyone else, but I wasn't being critical; I was just listing stuff that either amused or surprised me.
July 25th, 2009 at 12:26 am
the worst = Dana Worster
the best = Karl Best
the ass = Daniel Peter Graves (Baby-faced Assassin)
the only = Edward Sylvester Nolan
the the = George Herman Ruth (The Bambino and The Sultan Of Swat)
the lady = Lady Baldwin
the base = Donald Arthur Mattingly (Donnie Baseball)
the dad = Cecil Grant Fielder (Big Daddy)
the mom = Danilo Mompres
the bar = Barry Lamar Bonds
the rich = Dick Allen
the poor = Thomas Iverson Poorman
the tv = Timothy N Tveit
the winner = Frederick C Winner
the jerk = John Rocker
OK, I made that up. It actually goes to George Howard Northrop (Jerky).
July 25th, 2009 at 5:59 pm
"The dave" gives us Mark McGwire (due to his middle name)
"The donald" gives us Don Mattingly (not Trump)
"The weiner" gives us Lefty Weinert (ya all remember him, don't ya?)
I'm not gonna tell you what I typed in to get a minor leaguer named Casey Cuntz
July 26th, 2009 at 2:26 pm
Yes, Harry is a nickname for Henry, as is the case with the younger son of Prince Charles and Princess Diana. It's also a nickname for Harold, as was the case with Harold Kalas, a one-time broadcaster with the Phillies, and a guy named Rasmussen who pitched in the 1970s who was first known as Harry but then later changed his name to Erik.
The Phil - Philip Henry Niekro
The Nat - Nothing came up, not even Nate Colbert or any other player named Nate
The Colbert - Nothing came up, not even Nate Colbert or Colbert Richard Hamels (whose middle name is given on some sites as Michael)
The Cole - Nothing came up, not even Cole Hamels or Vince Coleman or any of the other players named Cole or Coleman
The Ham - John David Milner (The Hammer) - Not Henry Louis Aaron (Hammer, Hammerin' Hank or Bad Henry)?
The Bird - Mark Steven Fidrych (The Bird)
The Marlon - Marlon Anderson came up this time, but Marlon Byrd came up when I first tried this a few days ago!
The Card - Theodore John Cardasis, a minor leaguer - Over Jose Cardenal, Leo Cardenas, and Don Cardwell?
The Brown - Jerry Browne, Bobby Thomson (Robert Brown Thomson), and minor leaguer Theodore Browning
The Black - Ewell Blackwell
The White - Whitey Ford, Marty Marion (Martin Whiteford Marion), and minor leaguer Theodore Whiteman - Are only minor leaguers named Theodore showing up?
The Green - Nothing came up, not even Hank Greenberg or any of the number of players with the last name of Green or Greene
The Blue - Nothing came up, not even Vida
The Grey - Tristram E Speaker (The Grey Eagle and Spoke)
The Red - Rusty Greer (The Red Baron), Red Lucas, Doug Rader (The Red Rooster), Lucky Wright (William The Red) - But not Red Schoendienst!
The Orange - Nothing came up. Wasn't expecting much except maybe Rusty Staub (Le Grand Orange). And yes, that nickname does appear on his player page.
July 27th, 2009 at 9:41 am
The Aaron = Vance Aaron Law
The Moose = Dick Radatz
The Dog = Greg "Mad Dog" Maddux
Hank Aaron and Mike Mussina have to have more page views than Law and Radatz, right? Don't mean to criticize--it's interesting, regardless, just kind of surprising...
July 27th, 2009 at 10:16 am
DoubleDiamond, are you sure you were using the feature properly? I think the key is that the word "the" needs to be in lower case; otherwise it thinks that it is part of the name. I got the following results for the ones where you said nothing came up:
the Nat = Gary Nathaniel Matthews Sr. (Sarge)
the Colbert = Colbert Richard "Cole" Hamels
the Cole = Michael Cole Mussina (Moose)
the Green = Henry "Hank" Benjamin Greenberg (Hammerin' Hank)
the Blue = Vida Rochelle Blue Jr.
the Orange = Daniel Joseph "Rusty" Staub (Le Grand Orange)
Also, I got the following results on others you were surprised on:
the Ham = Henry Louis "Hank" Aaron (Hammer, Hammerin' Hank or Bad Henry)
the Card = Martin Kevin Cordova -- you got me on why...
the Brown = [James] Kevin Brown
the Black = Donald Howard Sutton (Black & Decker)
the White = Edward Charles "Whitey" Ford (The Chairman of the Board and Slick)
the Red = Richard Lee Sutcliffe (Red Baron) -- still, no Red Schoendienst
One disappointment for me was that "the Tom" came up with Tom Glavine instead of Tom Terrific Seaver (I guess Glavine's recent achievement of 300 wins spiked his page lookups.)
Also, “the Epp” came up with Eppa Rixey (Jephtha). I wonder what the story is on his nickname?
“the Great” comes up with John Walter “Duster” Mails (Walter and The Great) -- not even an All-Star, let alone a Hall of Famer.
Mandamin: Same comment as DoubleDiamond; "the" needs to be lower case. "the Aaron" comes up with Hank, "the Moose" comes up with Mussina.
July 27th, 2009 at 5:03 pm
There should be no reason for "the" to work only in lower case. I'll fix that.
July 27th, 2009 at 6:02 pm
Sean: I tested the previously mentioned cases with caps and lc - there is a difference (e.g. "The Aaron" does indeed find Vance Aaron Law, but "the Aaron" finds Hank.)
July 27th, 2009 at 6:49 pm
Good job by all finding interesting cases as well as chasing down the full performance of the search code. Sean relayed to me that this brainstorm was a quick addition he was able to make, and now I think it'll be even better.
July 27th, 2009 at 10:12 pm
Duster Mails was a very interesting character. "Great" was the nickname he gave himself - no lack of self-confidence there. And he was great - for a very brief while. If he was never an All-Star, that's at least in part because there was no All-Star game when he was playing. Heck, Ty Cobb was never an All-Star.
As for Rixey, I thought Jephtha was his given name (it's Biblical), Eppa, his nickname. Eppa sounds like a shortened version of Jephtha, or maybe as close as he could come to saying Jephtha when he was three years old. Sometimes these childhood pronunciations live on as nicknames.
July 28th, 2009 at 12:32 pm
I've fixed the uppercase/lowercase issue so "THE aaron" now returns Hank. I also entered a couple hundred new misspellings and shortcuts, and applied the misspellings to the minor leagues as well, so things should be a bit better on the search side now. I have a hard time believing it, but "Alez Rodriguez", "Alex Rodgriguez" and a couple others have been entered more than 600 times into the search box, and there are about 5 spellings of Mark Teixeira used an equal amount.
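To illustrate the kind of case fix described here, a minimal Python sketch follows. It is an assumption about the shape of the problem, not the site's real code; both function names are hypothetical. The first function only recognizes a lowercase "the", so "The Aaron" is left intact and "The" gets searched as part of a name; the second compares the first word case-insensitively.

```python
def split_the_case_sensitive(query):
    """Hypothetical pre-fix behavior: only a lowercase 'the' triggers the shortcut."""
    words = query.split()
    if len(words) > 1 and words[0] == "the":
        return True, words[1:]
    return False, words  # "The" stays in the query and is matched like a name


def split_the(query):
    """Hypothetical post-fix behavior: 'the', 'The' and 'THE' are all recognized."""
    words = query.split()
    if len(words) > 1 and words[0].lower() == "the":
        return True, words[1:]
    return False, words
```

Each function returns a pair: whether the shortcut was requested, and the remaining search words. Under the first version, "The Aaron" comes back as an ordinary two-word search, which would explain the stray Theodores reported above.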
July 28th, 2009 at 8:31 pm
Using an upper case T at the beginning of "the" - That must explain why I got all of those Theodores!
July 30th, 2009 at 6:38 am
We love you Sean.
By the way, is Ichiro your favorite player? When you type in "the Sean" you get Ichiro!
July 30th, 2009 at 5:13 pm
According to The Baseball Biography Project at SABR’s Website, Eppa was Rixey’s given first name (actually, he seems to be Eppa Rixey II, as his son and grandson are named Eppa Rixey III and IV.) The nickname Jephtha (or Jeptha) was apparently hung on him by sportswriter William Phelan, and he (Rixey) tolerated it, but was none too pleased with it.
July 31st, 2009 at 8:10 pm
In case anyone was curious, Jim Morris was the player that the movie "The Rookie" was based off of.
August 9th, 2009 at 6:11 am
The Good = Dwight Gooden
The Bad = (Bad) Henry Aaron
The Ugly = Johnny (Ugly) Dickshot
August 9th, 2009 at 6:13 am
The Rookie = {} (empty set). I guess there has never been a player with a name or nickname like Rookie?