Reproducible Data Analysis Examples

I'm a big fan of reproducible workflows, especially for the data analysis in my research projects. A really handy tool is R Markdown, which lets me do all my analyses in R and then format the output (including programmatically generated tables and numbers!) via Markdown into a nice HTML or PDF document.

Many journals now require uploading data and analysis code to an online repository like OSF or ICPSR. One issue is that these repositories are functionally simple: they offer (persistent, time-stamped) storage for data and code, but no interactivity. In addition, I like to use GitHub Pages to render the output HTML files that accompany my projects. This results in a visually appealing (and reproducible) presentation of the main results.

Below I list some examples:

  • [ HTML ] OSF Repository for: Chen et al. (2022) Real-World Effectiveness of a Social-Psychological Intervention Translated from Controlled Trials to Classrooms. npj Science of Learning.
  • [ HTML ] ICPSR Repository [or GitHub] for: Chen, P.*, Ong, D. C.*, Ng, J., & Coppola, B. P. (2021). Explore, Exploit, and Prune in the Classroom: Strategic Resource Management Behaviors Predict Performance. AERA Open.
  • [ HTML ] OSF Repository for: Chen, P., Chavez, O., Ong, D. C., & Gunderson, B. (2017). Strategic Resource Use for Learning: A Self-administered Intervention that Guides Effective Resource Use Enhances Academic Performance. Psychological Science.


  • [ HTML ] GitHub Repository for: Ong, Goodman, & Zaki (2018). Happier than thou? A self-enhancement bias in emotion attribution. Emotion.
  • [ HTML ] GitHub Repository for: Ong, Zaki, & Gruber (2017). Increased cooperative behavior across remitted bipolar I disorder and major depression: Insights utilizing a behavioral economic trust game. Journal of Abnormal Psychology.
  • [ HTML ] GitHub Repository for: Ong, Zaki, & Goodman (2015). Affective Cognition: Exploring lay theories of emotion. Cognition.

Below is a list of useful hacks that I've coded. Most of them are on my GitHub page.

Bonus workers on Amazon Mechanical Turk

I wrote a simple Python script to automate awarding bonuses to workers on Mechanical Turk (mTurk). You can find it here. It requires Amazon Command Line Tools.

Programmatically downloading .csv data from Qualtrics and reading it into R

I wrote an R function to do this here. The readme on that page also has the plain cURL code and links to the Qualtrics API documentation so you can experiment on your own!

Continuous version of the Inclusion of Other in Self (IOS) scale

A simple JavaScript implementation (using Raphael.js) of a continuous version of the Inclusion of Other in Self scale (Aron, Aron, & Smollan, 1992).

One-line redaction of identifiable information in R

Simple one-liner to replace identifiable information (e.g. mTurk worker IDs) with arbitrary integer codes. (Has some cons, but it's fast.)
d0$workerid_random <- match(d0$workerid, unique(sort(d0$workerid)))

If you want to be even cooler and use a one-way hash function, here's a one-liner using the digest() function (from the digest package):

library(digest); d0$workerid_hash <- substr(sapply(as.character(d0$workerid), digest, algo = "md5", serialize = FALSE), 1, 6)

(The code above: (1) converts workerid into a character string, (2) uses sapply() to apply digest() to each element, and (3) takes the first 6 characters of each resulting hash.)
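
As a rough illustration of the same three steps outside R, here is a minimal command-line sketch. It assumes GNU `md5sum` is available (on macOS, `md5 -q` plays a similar role); the `short_hash` helper and the worker IDs are hypothetical, made up for this example.

```shell
# Hypothetical helper: first 6 hex characters of the MD5 hash of a string.
short_hash() {
    # printf '%s' emits the ID without a trailing newline, so only the ID itself is hashed.
    printf '%s' "$1" | md5sum | cut -c1-6
}

# Made-up worker IDs; the repeated ID maps to the same 6-character code.
for workerid in A1EXAMPLE A2EXAMPLE A1EXAMPLE; do
    echo "$workerid -> $(short_hash "$workerid")"
done
```

Because the hash is deterministic, the same worker always gets the same code, which is what lets you link rows across files without keeping the original ID.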

colorMeText

During one of our lab hackathons, Justine Kao, Greg Scontras, and I coded up a little interactive web text-visualization demo: colorMeText, which basically colors input text according to ratings using some dictionary (e.g. useful for sentiment analysis, or any other dimension of interest). It's still a work in progress!


The following are useful hacks that I've collected from various places around the internet.
(Painlessly) updating R

A neat and simple trick to update R and re-install all your packages here.

Data Munging: Concatenate many (e.g. .csv) files in Terminal.

Sometimes you just want to concatenate all the data files in a directory into one big file. If it's in a format like .csv, and you want to skip the header line (the first line) of each file, you can use the following command in Terminal:
awk 'FNR > 1' *.csv > combined_file.csv
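
Note that `FNR > 1` drops the first line of every file, so the combined file has no header row at all. If you would rather keep a single header, a minimal sketch is to also let through the very first line awk reads. The temporary directory and file names below are made up for illustration.

```shell
# Made-up demo files in a temporary directory, just for illustration.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/input"
printf 'id,score\n1,10\n2,20\n' > "$demo_dir/input/part1.csv"
printf 'id,score\n3,30\n' > "$demo_dir/input/part2.csv"

# NR == 1 matches only the first line of the first file (its header);
# FNR > 1 matches every non-header line of every file.
# The output is written outside the input directory, so a rerun's *.csv glob will not pick it up.
awk 'NR == 1 || FNR > 1' "$demo_dir"/input/*.csv > "$demo_dir/combined_file.csv"
cat "$demo_dir/combined_file.csv"
```

This sketch assumes every file shares the same header; it keeps whichever header comes first in the shell's glob order.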

Compressing PDFs on a Mac

Although Preview on a Mac can compress PDFs (Export->Quartz Filter->Reduce File Size), the images become really low quality. The solution I used was pdfcompress.com, which provided reasonable results.

GIF making

I use GIFFun, which is pretty alright if you just need the basic essentials.
There's also a neat guide here that I haven't tried, using ImageMagick on the command line.

Destructively crop pdfs (useful for LaTeX-ing)

If you crop a PDF file in Preview, the crop is not destructive: the parts you cropped out are still hidden in the file (so you can undo the cropping). I've found that this causes trouble with LaTeX when it doesn't recognize the bounding boxes. If you need to destructively crop the file, one way to do it is with Ghostscript. Let's say you want to crop "in.pdf" to "out.pdf" (note that you can't use the same filename, because of the way gs works). At the command line, type:
gs -sDEVICE=pdfwrite -dUseCropBox -sOutputFile=out.pdf - < in.pdf

Convert Files to EPS (useful for LaTeX-ing)

Reference: http://electron.mit.edu/~gsteele/pdf/

[Postscript] Level 1 uses only ascii-coded RGB values, and is very wasteful, producing very large files. Level 2 includes support for JPEG encoded images, which produces much smaller files. Level 3 includes support for Zlib compression, making it well suited for making EPS files from png files.

In general, level 3 will produce the smallest files. Level 2 provides the best compatibility, and works well with jpeg images.

If you decide to use level 2 postscript, I recommend converting first to a jpg file. The "convert" program included with ImageMagick uses a quality factor in "percent" that ranges from 0 to 100:

convert -quality 80 fig.png fig.jpg

I find a quality factor of 80 on high resolution images gives good compression without too much loss in quality. You can then convert the image to eps using "convert" with the eps2 settings:

convert fig.jpg eps2:fig.eps

If you can use level 3 postscript, you can convert directly from png to eps:

convert fig.png eps3:fig.eps

Using level 3 postscript from a png image file for scientific figures will often produce a very small eps file. Ghostscript is compatible with these level 3 eps files, so this is often a good way to go.

Set Default Zoom in MS Word

Here's a simple macro that you can use so that every time Microsoft Word opens a new document, it does so at a specific zoom level. (Personally, I like 100% on my Retina Pro.)
1) In Word, go to Tools->Macro->Macros.
2) In the drop-down menu after "Macros in", click Normal (Global Template).
3) Create a new macro called AutoOpen [this particular name seems to be required for it to run upon opening].
4) Paste the following macro in, where 100 is the desired zoom percentage.
Sub AutoOpen()
    ActiveWindow.ActivePane.View.Zoom.Percentage = 100
End Sub
(refs: various places like this and this.)

Increase Mouse Sensitivity on Mac OS X

I love having really high sensitivity on my mouse/trackpad. Unfortunately, the maximum you can set in System Preferences isn't high enough for me. There is a way to increase this sensitivity further. In Terminal, typing:
defaults read -g com.apple.mouse.scaling
will give you the current value of your mouse scaling. You can modify it by changing read to write. For example, if you want to set your mouse scaling to 3.0 (the maximum in System Preferences), type:
defaults write -g com.apple.mouse.scaling 3.0
In addition, to change the trackpad, use com.apple.trackpad.scaling. You can also use .scrolling to change scrolling speed.