Must-Have R Packages for Social Scientists

After recently having to think critically about the value of various R packages for social science research, I realized that others might find value in a post on “must-have” R packages for social scientists. After the immensely popular post on this topic for Python packages, a follow-up seemed appropriate. If you conduct social science research but are desperately clinging to your SAS, SPSS or MATLAB licenses, waiting for someone to convince you of R’s value, please allow me to be the first to try.

R is a functional programming language that allows for seamless data exploration, manipulation, analysis and visualization. The community using and supporting the language has exploded over the last several years, which has led to the development of many immensely useful packages, several of which apply directly to the social sciences. Below are the R packages I use on a daily, weekly or monthly basis (in no particular order) and highly recommend to any R user, new or old.

  1. Zelig

    Put simply, Zelig is a one-stop statistical shop for nearly all regression model specifications. It uses a uniform syntax across model types and includes several extremely useful plotting functions. One of the package’s authors, Gary King (Political Science and Statistics at Harvard University), calls Zelig “everyone’s statistical software,” which is a very accurate description. If there is one R package that every social scientist should have, it is Zelig!

    Download Zelig

  2. ggplot2

    One of the advantages of R is that it ships with a set of convenient base functions for plotting data. While useful when exploring a dataset, these plots are, for lack of a better word, ugly, and this is where ggplot2 comes in. Using the grammar of graphics described by Leland Wilkinson as a guide, creator Hadley Wickham designed ggplot2 to “take the good parts of base and lattice graphics and none of the bad parts,” and he succeeded. This is the premier R package for conveying your analysis visually.

    Download ggplot2

  3. Statnet/igraph

    I have combined the two competing network analysis packages in R into a single entry because each has its strengths and weaknesses, so there is value in learning and using both. The igraph package approaches network analysis from a mathematics, physics and graph-theoretic perspective, and includes several advanced metrics and random graph models. In contrast, Statnet was designed primarily for social science, and its main advantage is a suite of functions for estimating and testing exponential random graph models (ERGMs, also called p* models).

    Download igraph
    Download Statnet

  4. plyr

    Also brought to you by R guru Hadley Wickham, the plyr package assists researchers with the least glamorous aspect of their work: data manipulation and cleaning. One of R’s great advantages is its ability to handle very large datasets, and plyr helps you break large data problems into smaller, more manageable pieces, apply a function to each piece, and combine the results (the “split-apply-combine” strategy).

    Download plyr

  5. Amelia II

    Also co-developed by Gary King, Amelia II contains a set of algorithms for multiple imputation of missing data across a wide range of data types, such as survey, time-series and cross-sectional data. Because missing data problems are ubiquitous in social science research, the functions in this package provide a powerful solution to these issues.

    Download Amelia II

  6. nlme

    This package is used to fit and compare Gaussian linear and nonlinear mixed-effects models. For those examining complex time-series data with various within-group correlation structures, nlme provides a number of options for fitting, testing and plotting.

    Download nlme

  7. SNOW/Rmpi

    Unlike newer versions of Python, the current build of R does not contain native functionality for distributing jobs across high-performance computing (HPC) clusters. The SNOW and Rmpi packages provide this functionality and are highly recommended to any researcher with access to an HPC environment running R.

    Download SNOW
    Download Rmpi

  8. xtable/apsrtable

    Both of these packages convert R summary results into LaTeX/HTML table format. The xtable package is a general solution, while the apsrtable package, developed by fellow political science grad student Michael Malecki, outputs tables in the APSR (American Political Science Review) format, for those of you fortunate enough to need it.

    Download xtable
    Download apsrtable

  9. plm

    Got panel data? If so, you need plm, which contains the model specifications and tests needed to fit panel data models, including specifications for instrumental variable models.

    Download plm

  10. sqldf

    As I stated earlier, R is great for dealing with large datasets; however, occasionally you will encounter a dataset so large that it grinds R’s base I/O functions to a halt. As the name suggests, the sqldf package overcomes this by allowing users to run SQL statements directly on R data frames, and its read.csv.sql function can filter a large file through SQLite so that only the rows you select are read into R.

    Download sqldf
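As a rough illustration of the ggplot2 entry above, the sketch below (assuming ggplot2 is installed, e.g. via `install.packages("ggplot2")`) builds a layered plot from R’s built-in `mtcars` data. The dataset and the choice of variables are illustrative assumptions, not taken from the post.

```r
library(ggplot2)

# Illustrative data: fuel efficiency vs. weight from the built-in mtcars set.
# Points are coloured by cylinder count, with one linear fit per group.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Draw the plot on the active graphics device.
print(p)
```

Each `+` adds a layer or a setting to the plot object, which is the grammar-of-graphics idea in practice.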
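To show how the two network packages in the Statnet/igraph entry can complement each other, here is a minimal sketch, assuming the ergm package (part of statnet) and igraph are both installed. It fits a simple ERGM to the Florentine marriage network that ships with ergm, then hands the same ties to igraph for descriptive measures. The model terms are illustrative, and igraph functions are called with the `igraph::` prefix so the two packages’ overlapping function names do not mask each other.

```r
library(ergm)  # statnet's ERGM package; also attaches the network package

# Padgett's Florentine families data; loads the flomarriage and
# flobusiness network objects.
data(florentine)

# ERGM with an edges term (overall density) and a wealth covariate term.
# Both terms are dyad-independent, so this fit does not require MCMC.
fit <- ergm(flomarriage ~ edges + nodecov("wealth"))
summary(fit)

# Convert the same network to an adjacency matrix and rebuild it in igraph.
adj <- as.matrix(flomarriage)
g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")

# Descriptive measures from igraph: degree and betweenness centrality.
igraph::degree(g)
sort(igraph::betweenness(g), decreasing = TRUE)
```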
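A minimal split-apply-combine sketch for the plyr entry, assuming the package is installed; the `mtcars` data and the summary columns are illustrative choices. `ddply()` takes a data frame, splits it by the named variable, applies a function to each piece, and returns the combined result as a data frame.

```r
library(plyr)

# Split mtcars by cylinder count, summarise each piece, and recombine
# the per-group results into a single data frame.
cyl_summary <- ddply(mtcars, "cyl", summarise,
                     n        = length(mpg),
                     mean_mpg = mean(mpg),
                     mean_hp  = mean(hp))
cyl_summary
```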
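The Amelia II entry can be illustrated with the `freetrade` country-year panel that ships with the Amelia package. The sketch below assumes Amelia is installed and follows the package’s documented pattern of passing the time (`ts`) and cross-section (`cs`) identifiers; `m = 5` imputations is a common choice, not a recommendation from the post.

```r
library(Amelia)

# Country-year panel of trade and economic data with missing values.
data(freetrade)

# Count missing values per column before imputation.
colSums(is.na(freetrade))

# Create five completed datasets. "year" is the time index and "country"
# the cross-sectional unit; p2s = 0 suppresses progress output.
set.seed(1)
a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country", p2s = 0)

# Each element of a.out$imputations is one completed data frame.
summary(a.out)
head(a.out$imputations[[1]])
```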
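For the nlme entry, a minimal sketch using the `Orthodont` data bundled with nlme (jaw measurements on 27 children, each measured at ages 8, 10, 12 and 14); nlme ships with standard R installations as a recommended package. The three specifications are illustrative: a random intercept, a random intercept plus slope, and a random intercept with an AR(1) correlation among each child’s residuals, compared with likelihood-ratio tests.

```r
library(nlme)

# Model 1: random intercept for each child.
m1 <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Model 2: random intercept and random slope on age for each child.
m2 <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

# Model 3: random intercept plus AR(1) correlation between successive
# residuals within each child (ordered by observation order).
m3 <- update(m1, correlation = corAR1(form = ~ 1 | Subject))

# Likelihood-ratio comparisons against the random-intercept model.
anova(m1, m2)
anova(m1, m3)
```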
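The SNOW entry describes distributing work across a cluster. As a hedged sketch, the code below uses the `parallel` package that ships with R 2.14.0 and later, which incorporates a revised copy of snow’s interface (`makeCluster()`, `parSapply()`, `stopCluster()`). It starts two local worker processes, whereas on an HPC system the workers would typically be spread across nodes; the bootstrap task is a placeholder workload.

```r
library(parallel)

# Start two local worker processes (a socket cluster, like snow's "SOCK" type).
cl <- makeCluster(2)

# Give each worker its own reproducible random-number stream.
clusterSetRNGStream(cl, iseed = 123)

# Placeholder workload: 1,000 bootstrap replicates of mean(mpg), split
# across the workers. mtcars is available because each worker loads
# R's default packages, including datasets.
boot_means <- parSapply(cl, 1:1000, function(i) {
  mean(sample(mtcars$mpg, replace = TRUE))
})

# Always shut the workers down when finished.
stopCluster(cl)

# Bootstrap standard error of the mean.
sd(boot_means)
```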
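A minimal sketch for the xtable entry, assuming the package is installed: it converts the coefficient table of an illustrative OLS regression on `mtcars` into LaTeX and then into HTML. The model and caption are placeholders, and apsrtable is not shown.

```r
library(xtable)

# Illustrative regression whose coefficient table we want to publish.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Build an xtable object from the fitted model (estimates, SEs, t, p).
tab <- xtable(fit, caption = "OLS estimates of fuel efficiency", digits = 3)

print(tab)                 # LaTeX output (the default type)
print(tab, type = "html")  # the same table as HTML
```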
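For the plm entry, a sketch based on the `Produc` data shipped with plm (48 US states observed over 17 years). It assumes plm is installed, and the production-function formula is an illustrative choice. The `index` argument tells plm which columns identify the cross-section and time dimensions, and `phtest()` runs a Hausman test comparing the fixed- and random-effects estimates.

```r
library(plm)
data("Produc", package = "plm")

# Illustrative production function for state-level output.
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Fixed effects ("within") and random effects estimators on the same panel.
fe <- plm(form, data = Produc, index = c("state", "year"), model = "within")
re <- plm(form, data = Produc, index = c("state", "year"), model = "random")

summary(fe)

# Hausman test: a small p-value favours fixed over random effects.
phtest(fe, re)
```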
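A minimal sqldf sketch, assuming the package (and its SQLite back end) is installed: a grouped SQL query runs directly against the `mtcars` data frame, which sqldf copies into a temporary SQLite database behind the scenes. The query itself is illustrative.

```r
library(sqldf)

# Group-wise summary written as SQL; the data frame's name is used as the
# table name inside the query.
sql_summary <- sqldf("
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl
")
sql_summary
```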

I hope you will explore and use any of the packages above that you are not already familiar with. To those who have never used R and/or have an irrational phobia of the language, let this list provide the appropriate motivation. And to the R experts out there, I welcome any suggestions for more useful R packages for the social-science inclined!


17 comments to Must-Have R Packages for Social Scientists

  • David

    Very interesting and useful, though I have never really understood the utility of Zelig. Also, why Amelia II over MICE or mi? I’ve used them all on large datasets with high rates of missingness, and I like mi best, MICE second, and Amelia II never finished running. The built-in graphical diagnostics in mi are indispensable.

    I really didn’t mean to pick on the King libraries, but I have always found something that worked a little better.

    Drew Conway Reply:

    I am partial to the King libraries because they have a uniform syntax across all model types, which reduces the amount of time I have to spend in a manual.

    I have never tinkered with MICE or mi, but since this post went up several people have mentioned their superiority, so I will have to give them a try.

    Thanks!

  • Great list.
    Four others that I use a lot are:
    1. psych (descriptive statistics, scoring tests, reliability analysis, pairs.panels function, good documentation)
    2. Hmisc (describe function, regression functions, and more)
    3. debug (debugging)
    4. car (in particular car has a useful recode function)

    Drew Conway Reply:

    The best part about making these lists is that I end up learning about other great libraries. Thank you!

  • Good list. I assumed Sweave would be in there but maybe that’s more a “view handler” than a package.

    Drew Conway Reply:

    Exactly. Sweave is great, but since it is in a way its own separate interface to LaTeX, I didn’t include it.

  • Kevin Wright

    I would have included “reshape” as essential.

    When I recently tried ggplot2, it didn’t handle missing data very well. It has some other peculiarities such as a fondness for HCL colors (theoretically good for polygons, but in practice poor for scatter plots) and only recently learned to spell “color” the same way that all other R packages do.

    Drew Conway Reply:

    Indeed, reshape is also in the repertoire so I am glad you mention it. This is like trying to choose among children, and I wanted to provide adequate breadth of library functions.

    As for the oddities of ggplot2, I agree. I haven’t had any issues with missing data (care to elaborate?), but the default color scheme is bad. In fact, I was informed that the default is quite problematic for the color blind, especially those with red-green color blindness. Of course, it is straightforward to change the default color scheme with opts().

    As for the spelling, Hadley has shortcuts for “color” for almost all functions in the package, but give him a break, he’s a Kiwi!

  • [...] Conway has a great list of 10 must-have R packages for social scientists. If you’re a social scientist (or really, any kind of scientist) who doesn’t use R, now [...]

    Another useful package is Martin Elff’s memisc package. Martin describes the aims of his package as follows: “One of the aims of this package is to make life easier for useRs who deal with survey data sets. It provides an infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) SPSS and Stata files.”

  • Carlo Cosenza

    great list. thank you.

    I’d add hash and proto to the list of useful functions. hash is a hashed list. proto is an alternative to the nasty S3 and S4 class implementations.

    Carlo Cosenza Reply:

    errata: useful packages.

  • Zach Thomas

    I’m looking for the best packages that can do the following:
    1. MCMC simulations
    2. Bayesian modeling
    3. PLSR
    4. Shapley Value Regression
    5. Variable Importance Assessment in Regression
    6. CHAID
    7. SEM

    Zach

  • @ Zach
    SEM: Check out OpenMx and sem
    variable importance: relaimpo looks good: http://prof.beuth-hochschule.de/groemping/relaimpo/
