---
title: "Vignette Introduction"
author: "Martin Monkman"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction


These vignettes serve a dual purpose:

* to introduce users of the `Lahman` package to the breadth and depth of the data, and a range of analysis and statistical methods that can be undertaken using the data in the package, 

* to introduce users to the statistical software **R**, but particularly to the modern use
in statistics and data science encapsulated in the `tidyverse` of R packages designed to
facilitate data input and manipulation and graphics.


## Contents

Vignettes completed to-date:


1. [Relationship Between Strikeouts and Home Runs](strikeoutsandhr.Rmd) -- This vignette looks at the relationship between rate of strikeouts and home runs from the year 1950+. This question was inspired by Marchi and Albert (2014), _Analyzing Baseball Data in R_. 

    + R packages demonstrated: [`car` (Companion to Applied Regression)](https://CRAN.R-project.org/package=car)


2. [Run Scoring Trends](run-scoring-trends.Rmd) -- Major League Baseball average per-game run scoring for each season since 1901.


3. [Team Payroll and the World Series](payroll.Rmd) -- This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.






***

## Further reading

A number of books and on-line resources use the `Lahman` package as material for the examples. These include:

### Books

Michael Friendly and David Meyer (2016) _Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data_ (CRC Press). [DDAR Web Site](http://ddar.datavis.ca/)


Max Marchi and Jim Albert (2014) _Analyzing Baseball Data with R_ (CRC Press)

* the book makes frequent reference to the raw Lahman database files in CSV form; the `Lahman` package was relatively recent when the book was published, and authors make a brief mention of the package.
There is a [Baseball with R](https://baseballwithr.wordpress.com/) blog containing extensions
and other analyses.

David Robinson (2017) [_Introduction to Empirical Bayes_](https://drob.gumroad.com/l/empirical-bayes) (published at [gumroad.com])

* the book makes extensive use of the package to explain "the empirical Bayesian approach to estimation, credible intervals, A/B testing, mixture models, and other methods, all through the example of baseball batting averages." 

* the [blog introduction to the book](http://varianceexplained.org/r/empirical-bayes-book/)


Hadley Wickham and Garrett Grolemund (2017) [_R for Data Science: Import, Tidy, Transform, Visualize, and Model Data_](https://r4ds.had.co.nz/) (O'Reilly)

* [Section 5.6.3, "Counts"](https://r4ds.had.co.nz/transform.html#counts)


### Articles, blog entries, and course materials

Steven Buechler (2014-2015) [Analysis of career performance in top home run hitters](https://www3.nd.edu/~steve/computing_with_data/16_Baseball_example/baseball_example.html)

* This is lecture 16 from [Computing with Data Seminar](https://www3.nd.edu/~steve/computing_with_data/)


Kris Eberwein (2015-09-30) ["Hacking The New Lahman Package 4.0-1 with R-Studio"](https://www.r-bloggers.com/2015/09/hacking-the-new-lahman-package-4-0-1-with-r-studio/) (via [r-bloggers.com])


Michael Lopez (2016) [Lab materials for Skidmore College MA 276, “Sports and Statistics”](https://statsbylopez.com/276labs/)

* [course reference page](https://statsbylopez.com/stats-sports-class/) 


Bill Petti (2015-09-21) [A Short(-ish) Introduction to Using R Packages for Baseball Research](https://tht.fangraphs.com/a-short-ish-introduction-to-using-r-for-baseball-research/)



**_Exploring Baseball Data with R_ blog**

Jim Albert (2018-12-24) [The Vanishing 300 Batting Average](https://baseballwithr.wordpress.com/2018/12/24/the-vanishing-300-batting-average/)

Jim Albert (2015-01-05) [A Graph of a Batting Average](https://baseballwithr.wordpress.com/2015/01/05/a-graph-of-a-batting-average/) 

Brian Mills (2014-09-30) [Using ggmap and Lahman to Find the Hometown College Rosters](https://baseballwithr.wordpress.com/2014/09/30/using-ggmap-and-lahman-to-find-the-hometown-college-rosters/)


-30-