johnramey

Don't think. Compute.

Archive for the ‘baseball’ tag

Getting Started with Some Baseball Data

without comments

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of work-unrelated data. I loved baseball as a kid. I love baseball now. I love baseball stats. Why not do a grown-up version of what I used to do when I spent hours staring at and memorizing baseball stats on the back of a few pieces of cardboard on which I spent my allowance?

To get started, I purchased a copy of Baseball Hacks. The author suggests the usage of MySQL, so I will oblige. First, I downloaded some baseball data in MySQL format on my web server (Ubuntu 10.04) and decompressed it; when I downloaded the data, it was timestamped as 28 March 2011, so double-check if there is an updated version.

mkdir baseball
cd baseball
wget http://www.baseball-databank.org/files/BDB-sql-2011-03-28.sql.zip
unzip BDB-sql-2011-03-28.sql.zip

Next, in MySQL I created a user named “baseball”, a database entitled “bbdatabank” and granted all privileges on this database to the user “baseball.” To do this, first open MySQL as root (mysql -u root -p)

CREATE USER 'baseball'@'localhost' IDENTIFIED BY 'YourPassword';
CREATE databas bbdatabank;
GRANT ALL PRIVILEGES ON `bbdatabank`.* TO 'baseball'@'localhost';
FLUSH PRIVILEGES;
quit

Note the tick marks (`) around bbdatabank when privileges are granted. Also, notice the deliberate misspelling when I constructed the db. WordPress freaks out on me because mod_security steps in and says, “Umm, no.” For more info about this, go here and here (see the comments as well).

Finally, we read the data into the database we just created by:

mysql -u baseball -p -s bbdatabank < BDB-sql-2011-03-28.sql

That’s it! Most of this code has been adapted from the Baseball Hacks book, although I’ve tweaked a couple of things. As I progress through the book, I will continue to add interesting finds and code as posts. Eventually, I will move away from the book’s code as it focuses too much on the “Intro to Data Exploration” reader with constant mentions of MS Access/Excel. The author means well though as he urges the reader to use *nix/Mac OS X.

Written by ramhiser

May 24th, 2011 at 9:34 pm

Posted in statistics

Tagged with , ,