These vignettes serve a dual purpose:
to introduce users of the Lahman package to the breadth and depth of the data, and to the range of analyses and statistical methods that can be applied to it;
to introduce users to the statistical software R, particularly its modern use in statistics and data science as embodied in the tidyverse collection of R packages, which are designed to facilitate data input, manipulation, and graphics.
Vignettes completed to date:
Relationship Between Strikeouts and Home Runs – This vignette looks at the relationship between strikeout rates and home run rates from 1950 onward. The question was inspired by Marchi and Albert (2014), Analyzing Baseball Data with R.
car (Companion to Applied Regression)
Run Scoring Trends – Major League Baseball average per-game run scoring for each season since 1901.
Team Payroll and the World Series – This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.
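As a rough illustration of the kind of calculation behind the Run Scoring Trends vignette described above, the sketch below computes average runs per team per game for each season. It is not taken from the vignette itself: the helper name `runs_per_game` and its `from` argument are hypothetical, and the code assumes a data frame with the `yearID`, `R` (runs) and `G` (games) columns found in the package's `Teams` table. It is written in base R so that it runs without extra packages.

```r
# Hypothetical helper (not part of the Lahman package): average runs scored
# per team per game in each season, from a Teams-style data frame.
runs_per_game <- function(teams, from = 1901) {
  # Keep only seasons from `from` onward and the columns we need
  teams <- teams[teams$yearID >= from, c("yearID", "R", "G")]
  # Total runs and total team-games for each season
  totals <- aggregate(cbind(R, G) ~ yearID, data = teams, FUN = sum)
  # Runs per team per game
  totals$RPG <- totals$R / totals$G
  totals[order(totals$yearID), ]
}

# Usage with the real data, only if the Lahman package is installed
if (requireNamespace("Lahman", quietly = TRUE)) {
  rpg <- runs_per_game(Lahman::Teams)
  print(tail(rpg))
}
```

Dividing season totals (rather than averaging team-level ratios) weights every game equally, which is one reasonable choice among several; a tidyverse version would replace `aggregate()` with a grouped summary.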
A number of books and online resources use the Lahman package as material for their examples. These include:
Michael Friendly and David Meyer (2016) Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data (CRC Press). DDAR Web Site
Max Marchi and Jim Albert (2014) Analyzing Baseball Data with R (CRC Press)
The Lahman package was relatively new when the book was published, and the authors mention it only briefly.
David Robinson (2017) Introduction to Empirical Bayes (published at [gumroad.com])
The book makes extensive use of the package to explain “the empirical Bayesian approach to estimation, credible intervals, A/B testing, mixture models, and other methods, all through the example of baseball batting averages.”
Hadley Wickham and Garrett Grolemund (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly)
Steven Buechler (2014-2015) Analysis of career performance in top home run hitters
Kris Eberwein (2015-09-30) “Hacking The New Lahman Package 4.0-1 with R-Studio” (via [r-bloggers.com])
Michael Lopez (2016) Lab materials for Skidmore College MA 276, “Sports and Statistics”
Bill Petti (2015-09-21) A Short(-ish) Introduction to Using R Packages for Baseball Research
Exploring Baseball Data with R blog:
Jim Albert (2018-12-24) The Vanishing 300 Batting Average
Jim Albert (2015-01-05) A Graph of a Batting Average
Brian Mills (2014-09-30) Using ggmap and Lahman to Find the Hometown College Rosters