As originally published on Baseball Prospectus.
(Photo: Dominic Cotroneo/Cronkite Sports)
The following is excerpted from The Evolution of Data and Statistics in Baseball, a thesis paper written by Jake Garcia for the Walter Cronkite School of Journalism and Mass Communication, at Arizona State University.
Doing introductory research on a niche topic like the interplay between baseball, stats and data, I initially pegged baseball as experiencing something similar to a Schumpeterian moment, an idea from economic theory, derived from the work of Karl Marx, that describes a process of destroying an old structure and creating a new one. But baseball and numbers have been heavily intertwined ever since the sport’s inception, and the Schumpeterian moment has been fixed in a never-ending cycle. The destruction of old stats and analytics and the creation of new ones has been incessant. It did not stop when the concept of the run batted in was introduced in 1879; it definitely did not stop when computers came along in the 1960s to encourage in-depth statistical analysis; and it will not stop any time in the near future, with technological advances like Statcast continuing to mold the way people think about baseball. How does the change continue, and how will it manifest itself?
Spotlighting college baseball seemed like a good starting point. Since the NCAA is chock-full of players striving to embark on their professional careers in the near future, you would expect it to also be the perfect ground for experimentation with data, stats and technology, much like the Arizona Fall League, right? Well, not exactly.
Based on conversations with Michael Baumann (a writer at D1Baseball.com), Aaron Fitt (also a writer at D1Baseball.com), Thomas Lenneburg (the sports information director at Arizona State University) and Jeff Sackmann (the founder of collegesplits.com), it is clear college baseball lags behind pro ball in many important areas. The analytical age that has hit MLB coverage is entirely absent from college baseball. Even at a basic stat-keeping and stat-disseminating level, nagging issues are evident.
“The most obvious thing for me is it doesn’t track plate appearances,” Baumann said. “I was covering a Louisville-Florida State series last year, and I wanted to find out the walk rate for Florida State’s team because they’re incredibly patient—they usually lead the nation in walks. Not only did I have to do the division myself, I had to add up all the hits, errors and hit by pitches, all that stuff… Maybe there is just not enough of a demand, but that’s just the simplest, easiest thing.”
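The calculation Baumann describes doing by hand can be sketched in a few lines of Python. This is a minimal illustration, not how any stat service computes it. It assumes the common definition of plate appearances as at-bats plus walks, hit by pitches, sacrifice flies and sacrifice bunts, and it ignores rare events such as catcher’s interference. The `TeamBattingLine` name and the sample totals are hypothetical, not Florida State’s actual numbers.

```python
from dataclasses import dataclass


@dataclass
class TeamBattingLine:
    """Season totals of the kind found in a basic team stat sheet (hypothetical layout)."""
    at_bats: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int
    sacrifice_hits: int

    def plate_appearances(self) -> int:
        # Assumed definition: PA = AB + BB + HBP + SF + SH.
        return (
            self.at_bats
            + self.walks
            + self.hit_by_pitch
            + self.sacrifice_flies
            + self.sacrifice_hits
        )

    def walk_rate(self) -> float:
        # Walk rate is walks divided by plate appearances.
        pa = self.plate_appearances()
        return self.walks / pa if pa else 0.0


# Made-up totals, for illustration only.
line = TeamBattingLine(
    at_bats=2000, walks=300, hit_by_pitch=60, sacrifice_flies=20, sacrifice_hits=20
)
print(line.plate_appearances())    # 2400
print(f"{line.walk_rate():.1%}")   # 12.5%
```

The point of the sketch is how little is involved: once plate appearances are tracked, walk rate is a single division.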
Whereas Baumann’s issue was surmountable (given he was eventually able to calculate the stat by hand), Fitt’s biggest complaint is not. It concerns a particular hitter’s success against left-handed and right-handed pitchers and the impossibility of calculating it in a timely manner.
“It’s such a pain in the butt to get to,” Fitt said. “It’s so bizarre, because that information is so commonly available in the big leagues and also the minor leagues… But the college coaches do guard that stuff like the nuclear code. It’s insane.”
Lenneburg, who is on the other side of this issue given his position as a sports information director, said college coaches and athletic departments safeguard splits because they perceive a competitive disadvantage in making them publicly accessible. In an effort to provide a glimpse into how Arizona State (and likely every Pac-12 school) records and shares data regarding the baseball team, he detailed the basic protocol.
“It’s just like scoring on a piece of paper,” Lenneburg said. “It’s just inputting it, and at the end of the game you’ll input winning pitcher (and) attendance… From there, you just create a pack file, and then StatCrew (stat-keeping software Arizona State and other universities use) handles turning the information into the situational stats or the by-player (stats). Really, all you’re doing is inputting what happens, and the computer is creating the rest of the stats from that, whether it’s as simple as batting average or as complex as WHIP.”
Lenneburg’s closing sentence underscores another issue that college baseball faces: WHIP is considered complex. When the formula (walks plus hits, divided by innings pitched) and basic principle (how many baserunners a pitcher allows per inning) are combined into the very acronym of the stat, and that is considered complex, there’s an issue—at least, for people trying to use data to tell stories like Baumann and occasionally Fitt.
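To show how little is involved, here is a minimal Python sketch of the WHIP formula described above. It assumes innings pitched arrive in the usual box-score notation, where the digit after the decimal point counts outs (so "6.1" means six and one-third innings). The function names and sample numbers are hypothetical and are not taken from StatCrew or any other software.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings such as '6.1' or '7' into true innings."""
    whole, _, frac = ip.partition(".")
    extra_outs = int(frac) if frac else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings-pitched notation: {ip!r}")
    # Work in outs to avoid treating '.1' as one-tenth of an inning.
    outs = int(whole) * 3 + extra_outs
    return outs / 3


def whip(walks: int, hits: int, ip: str) -> float:
    """Walks plus hits, divided by innings pitched."""
    innings = innings_from_notation(ip)
    if innings == 0:
        raise ValueError("WHIP is undefined with zero innings pitched")
    return (walks + hits) / innings


# Made-up pitching line: 2 walks and 5 hits over six and one-third innings.
print(round(whip(2, 5, "6.1"), 3))
```

The only subtlety is the innings notation; the stat itself is one addition and one division.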
But what is the reason for this? Is the answer simply that college baseball coaches, players and fans do not care enough for meaningful changes to be made? Or are there external forces—such as a lack of resources—that prevent change from taking place? It appears to be both.
“I don’t think the average college baseball fan is as much of a stats-oriented reader as you might find in pro ball,” Fitt said. “The advanced stats revolution, I feel like it’s put on by the fact that there’s a lot more technology available. Not only Major League Baseball but also minor-league baseball, with [MLB Advanced Media], with all the stuff that you can now find through minorleaguebaseball.com. There’s just a lot more resources there, there’s a lot more data.”
Lenneburg also acknowledged that a lack of funding is a hurdle, but suggested that growing demand for more data in college baseball would eventually drive change.
“I mean, obviously part of our issue is the lack of resources,” Lenneburg said. “(But) I think beyond even for their own good, for the fans it’s just sort of what people want now. It’s why they do it on TV. It’s just another added component of the game where you can quantify how somebody did… It’s a combination of wanting the data to win and then wanting it to build fan affinity and get people in the door and get them excited.”
Make no mistake about it: There are many brilliant coaching minds in college baseball. According to Baumann, coaches may be making decisions that are well supported by in-depth statistical analysis, even if they don’t know they are doing so. That is a strong indication that sabermetric thinking and traditional, gut-based thinking are not so far apart.
“A lot of them are ways that we think of sabermetrically sound,” Baumann said. “The thing with Cal State Fullerton is their pitchers never ever walk anybody… (Vanderbilt) strikes a lot of guys out; Florida State draws a ton of walks and they run incredibly deep counts. Vandy plays a lot of four- or five-hour games because their kids are seeing five or six pitches an at-bat. I think a lot of that is just intuition. We know it has mathematical basis, but I don’t know if any of the coaches are doing it for that reason. I think it’s just something that they’ve sort of teased out by their own experience.”
Another influential player in this is Jeff Sackmann, who saw a gap in the market for college baseball stats and founded a business as a result. Sackmann and his partner, Kent Bonham, had the idea of gathering play-by-play data for college baseball, and collegesplits.com was born. From his vantage point (one that’s markedly different from those of Baumann and Fitt), the outlook for college baseball stat-keeping and data usage is not as bleak as it may seem.
“Many college coaches do rely on us for advanced scouting reports, split stats and spray charts on their opponents,” Sackmann said. “Not many coaches care much about advanced stats, but that’s changing, and even those coaches who don’t care at all still want to see spray charts and basic splits.
“Coaches are already using more and more data when it comes to working with their own players—radar guns keep getting cheaper, and sports science is steadily changing how players can train to maximize achievement. Analysts who work with performance data—that is, stats from the games themselves—tend to ignore that side of things, but it’s huge,” he said.
Along with more college programs seeking out advanced stat services like Sackmann’s, college baseball is also benefiting, slowly but surely, from expanding technologies for collecting the data (which Sackmann aggregates) in the first place. They may not be as grandiose as MLB’s Statcast, but systems such as Trackman and PITCHf/x are becoming far more common for NCAA data tracking.
In fact, Fitt was a key middleman in the installation process, as he connected those at Trackman with those who would ultimately house the technology in their stadiums.
“A couple of years ago, I helped (Trackman) get in touch with a lot of college coaches, and I think they started with maybe eight or 10 or 12 programs,” Fitt said. “It’s still probably only the big-resource programs that can afford to do it… But their data is so cool.”
When considering all these dynamic forces shaping the college baseball data landscape, it is easy to come away with a blurred picture. On the one hand, basic stat-keeping continues to be a source of frustration for writers like Baumann and Fitt. If a stat as simple as plate appearances is not recorded, how will more in-depth stats and data ever surface?
Sackmann acknowledged limits on how far college baseball stats can evolve, but also sensed adaptation and open-mindedness starting to take hold in programs.
“Because players are generally recruited when they’re 18 or 19, and there’s no trading or free-agent signing like in the pros, college baseball will never be like contemporary MLB, where every team has a staff of (data) analysts valuing players,” Sackmann said. “But in the areas that data can help—training, advanced scouting, etc.—plenty of coaches are doing a lot to take advantage of what’s out there.”