How to import Qualtrics csv files into R

Overview

Have you ever struggled with importing data from Qualtrics into R? It can be a pain and was actually putting me off from starting to learn R. What a shame! Here’s a brief tutorial to save others from the same frustration.

Two approaches to get your data into R

There are two approaches to import Qualtrics data into R:

  1. You use a browser, access your Qualtrics account and the “Data & Analysis” tab; then you download the data in a format of your choice (e.g., .csv = comma separated values; .txt = plain text; .sav = standard format for SPSS). Then you import the data into R. Keep on reading to learn how.

  2. You use Qualtric’s API to access your account from R. The data are directly loaded into R. Note that Qualtric’s API is not available for all institutions / accounts. This approach won’t be covered here (for more details, follow these instructions).

.csv data

Many researchers prefer the .csv format because it is supported (i.e., readable) by various applications. This facilitates sharing the data with others as well as your future self. In other words, a .csv file guarantees that you will be able to access your data in the future, regardless of whether your Qualtrics account still exists or not. Moreover, it will be easy to go back to your raw data at any time. In my opinion, having a simple text file as your starting point makes your research easier to replicate and more transparent.

Qualtrics and R

Unfortunately, importing .csv data from Qualtrics has been rather complicated until recently. In what follows, I briefly explain why importing Qualtrics’s .csv files into R causes problems and how you can make it work very easily using a function from the package qualtRics.

The .csv files that Qualtrics generates are not very user-friendly when you work with R. There are two things that are actually quite annoying for R users:

  • Qualtrics generates .csv files that include more rows than .csv typically include. These extra rows cause errors when importing data into R. More specifically, .csv files conventionally include either a first row that specifies the names of the columns (e.g., gender, age, etc.) or they directly start with the data. Qualtrics adds extra rows at the top of the data file that include more detailed information about the variables. By adding a second (when using the Qualtrics legacy export option) or even a third row (when using Qualtrics’s current standard way of exporting .csv data), Qualtrics’s .csv files cannot be read properly by standard import functions in R (e.g., read.csv() or readr::read_csv()). (R interprets the whole dataframe as text / string because the second row includes characters rather than numbers).

  • Qualtrics adds a lot of information to some types of questions that make it difficult to recognize the actual content of questions. For example, if you have a matrix question (i.e., a tabular display that uses the same answer options for several items) in which you ask participants to indicate how much they like different colours, then the second row in the .csv output will repeat the introductory text as a prefix for every single item that is part of that matrix question. That is, the second row will include very long variable names (e.g., “How much do you like the following colours? Please indicate the extent to which you like them.–green” rather than simply “green”). These long names make it difficult to find the information that you are actually interested in (here: “green”).

A function that makes importing Qualtrics’s .csv files into R easy

Some time ago I contributed to a function that imports .csv from Qualtrics effortlessly into R and at the same time automatically removes the repetitive text in the variable labels (i.e., you get variable labels that only contain the actual content of the items – green, blue, and black when you ask about colour preferences). Fortunately, this function has been integrated in the qualtRics package. To use the function, you first need to install the package:

install.packages("qualtRics") 

Then you have two options to import .csv files from Qualtrics.

If you are importing a current .csv file from Qualtrics that includes three rows at the top with information about the variables (i.e., the headers and two rows with information about the variable labels), then run the following lines of code:

library(qualtRics)
raw_data <- read_survey("qualtrics_survey.csv")

If you are importing an older .csv file from Qualtrics that includes two rows at the top with information about the variables (i.e., the headers and the variable labels) or if you are using Qualtrics’s legacy export option to export your data (apparently this option won’t be available much longer), then run the following lines of code:

library(qualtRics)
raw_data <- read_survey("qualtrics_survey_legacy.csv", legacy = TRUE)

Note that the imported dataframes will have different numbers of variables. I think this is because importing the more current file type removes more irrelevant variables created by Qualtrics.

Importing Qualtrics’s files without the package qualtRics

You can, of course, also import .csv files from Qualtrics without using the qualtRics package. There are basically three ways to achieve this:

Keep it simple and import without problematic rows at the top (but accept that you lose the variable names)

Skip the first two (or three) rows when importing the files into R:

raw_data <- read.csv("qualtrics_survey.csv", skip=3) # more recent version of Qualtrics' .csv files
raw_data <- read.csv("qualtrics_survey_legacy.csv", skip=2) # legacy format 

Of course, you can also open the .csv file in Excel (or in a similar application) and delete rows 1 and 2 (and, if present, row 3) by hand. Then save and import the .csv without problems into R.This is not very elegant but does the job.

Make a detour and keep variable names

Alternatively, if you want to keep the row that includes the names of the variables (i.e., the header), you can separateley import the headers and the data. Here’s a post (https://stackoverflow.com/questions/15860071/read-csv-header-on-first-line-skip-second-line) that explains how to do this. For a solution using dplyr see this post (https://stackoverflow.com/questions/44273235/how-to-skip-second-line-is-csv-file-while-maintaining-first-line-as-column-names).

Again, another option is to use Excel and simply delete rows 2 (and, if necessary, row 3), save, and import the .csv without problems into R.

Yet another alternative is to import the .csv file as it comes and then later remove the second row and transform the variables that should be numeric from character strings into numeric variables (see here: https://blogs.uoregon.edu/rclub/2016/05/18/bringing-in-qualtrics-and-other-data/).

Use a different file format such as SPSS

Yet another option would be to export your data as SPSS files (.sav) and then import the data using the R package haven or foreign.

Related