Quickly create descriptive tables in R for MS Word and Excel
Presentation of R packages to generate descriptive tables and save them as Word files
Overview
Most reports of empirical work require a table that describes the study sample (e.g., people, animals, organizations). Gathering and aggregating the relevant information can be tedious. To illustrate, different types of variables require different statistics: Ratio and interval scale variables (e.g., age, test score) are best summarized by their mean and standard deviation, while nominal and ordinal data (e.g., gender, level of education) are better summarized by frequencies and percentages. Preparing statistics for different subgroups (e.g., experimental conditions or administrative regions) may entail additional work (see here and here for some tips on how to prepare descriptive tables manually; useful functions to prepare multiple summary statistics in one step include rstatix::get_summary_stats
, psych::describe
, Hmisc::describe
, and DescrTab2::descr
for ratio and interval data, and janitor::tabyl
for nominal and ordinal data).
Fortunately, there are R packages
that facilitate creating such tables. The packages differ in terms of portability:
- some only show the results in the console
- others are intended for creating PDF files
- still others can save tables directly as Word or Excel files
Producing tables as Word or Excel files is very convenient when collaborating with others. Another advantage is that the process of creating the tables is automated, transparent, and replicable.
In what follows, I present a couple of packages that require only little coding, automatically prepare the appropriate statistics, and offer great portability (i.e., save tables as Word or Excel files while keeping as much of the formatting as possible). To see how the Word files look, just click on the package’s name:
- crosstable
- gtsummary
- arsenal
- table1
- tableone
- furniture
Preparation
Make sure all required packages are available and loaded:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
"httr", # for downloading data
"sjlabelled", # for working with item labels (retrieving their content)
"labelled", # for with labelled variables
"crosstable", # create table
"gtsummary", # create table
"arsenal", # create table
"table1", # create table
"tableone", # create table
"furniture", # create table
"kableExtra", # for formatting tables
"openxlsx", # to save table as Excel file
"flextable", # for formatting and exporting tables to word
"officer", # for exporting tables to word
"tidyverse") # data handling & cleaning
To be able to replicate the code and create the table, you can download the sample data from a previous study directly from OSF:
url <- 'https://osf.io/62ya5//?action=download'
filename <- 'df_study1.RData'
httr::GET(url, httr::write_disk(filename, overwrite = TRUE))
load("df_study1.RData")
# keep only relevant variables
df <- df %>%
# keep:
select("age","gender","student","condition")
Most of the packages introduced below automatically check the scale level of variables and then compute the appropriate statistic (e.g., frequencies for ordinal data). For this to work properly, it is essential that the “type” of the variables is correctly defined. In this dataset, some of the categorical variables need updating from type numeric
to factor
.
df <- df %>%
# convert to type factor
dplyr::mutate(df, across(c(condition,gender,student), as.factor))
Packages that save tables as Word / Excel files
The following packages can save formatted tables as Word files. One package (gtsummary
) can also directly save the table as an Excel file (there may be indirect ways for the other packages, too).
crosstable
Let’s start with crosstable
, the package that requires the least amount of code and has good default settings. In its simplest form, the code for this package couldn’t be shorter:
crosstable(df)
## # A tibble: 13 × 4
## .id label variable value
## <chr> <chr> <chr> <chr>
## 1 age age Min / Max 13.0 / 71.0
## 2 age age Med [IQR] 26.0 [24.0;29.0]
## 3 age age Mean (std) 28.8 (9.3)
## 4 age age N (NA) 377 (1)
## 5 gender gender 1 269 (71.16%)
## 6 gender gender 2 109 (28.84%)
## 7 student student 1 229 (61.07%)
## 8 student student 2 146 (38.93%)
## 9 student student NA 3
## 10 condition condition 1 94 (24.87%)
## 11 condition condition 2 91 (24.07%)
## 12 condition condition 3 101 (26.72%)
## 13 condition condition 4 92 (24.34%)
By default, crosstable
displays the table in the console. A nicer output (e.g., for websites or R Studio’s viewer) can be obtained via knitr::kable()
. Grouping the data by a column (here the experimental condition) is also easy.
crosstable(df, by = "condition") %>%
select(-".id") %>% # remove first column
# optimize formatting for web display (table looks better without next line)
knitr::kable()
label | variable | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
age | Min / Max | 17.0 / 62.0 | 17.0 / 54.0 | 13.0 / 61.0 | 19.0 / 71.0 |
age | Med [IQR] | 26.0 [24.0;29.0] | 26.0 [24.0;30.5] | 26.0 [24.0;30.0] | 25.0 [23.0;28.2] |
age | Mean (std) | 29.1 (9.3) | 28.8 (8.3) | 29.0 (10.0) | 28.2 (9.6) |
age | N (NA) | 94 (0) | 90 (1) | 101 (0) | 92 (0) |
gender | 1 | 63 (23.42%) | 68 (25.28%) | 72 (26.77%) | 66 (24.54%) |
gender | 2 | 31 (28.44%) | 23 (21.10%) | 29 (26.61%) | 26 (23.85%) |
student | 1 | 57 (24.89%) | 57 (24.89%) | 59 (25.76%) | 56 (24.45%) |
student | 2 | 36 (24.66%) | 34 (23.29%) | 40 (27.40%) | 36 (24.66%) |
student | NA | 1 | 0 | 2 | 0 |
With the following code, it is also possible to save the table as a Word file (see here and here).
# Specify properties for the target document
library(officer)
sect_properties <- prop_section(
page_size(width = 21, height = 29.7, orient = "portrait"),
type = "continuous",
page_margins = page_mar())
# create table
df %>%
crosstable::crosstable(by = "condition") %>%
# needs to be a flex_table
crosstable::as_flextable() %>%
# use flextable package to save table as word
flextable::save_as_docx(path = "descriptives_crosstable.docx", pr_section = sect_properties)
# for more details about last step, see https://davidgohel.github.io/flextable/reference/save_as_docx.html
gtsummary
There is also a very quick way to get a descriptive table with the package gtsummary
:
df %>%
gtsummary::tbl_summary() %>%
# for nicer display on website
gtsummary::as_kable()
Characteristic | N = 378 |
---|---|
age | 26 (24, 29) |
Unknown | 1 |
gender | |
1 | 269 (71%) |
2 | 109 (29%) |
student | |
1 | 229 (61%) |
2 | 146 (39%) |
Unknown | 3 |
condition | |
1 | 94 (25%) |
2 | 91 (24%) |
3 | 101 (27%) |
4 | 92 (24%) |
The information for the categorical / ordinal variables looks fine (frequencies and percentages). However, for interval scale variables such as age I’d rather have the mean and standard deviation than the median and the interquartile range. The “by” command can be used to summarize the data by group.
df %>%
gtsummary::tbl_summary(
by = condition,
statistic = list(age = "{mean} ({sd})")) %>%
# for nicer display on website
gtsummary::as_kable()
Characteristic | 1, N = 94 | 2, N = 91 | 3, N = 101 | 4, N = 92 |
---|---|---|---|---|
age | 29 (9) | 29 (8) | 29 (10) | 28 (10) |
Unknown | 0 | 1 | 0 | 0 |
gender | ||||
1 | 63 (67%) | 68 (75%) | 72 (71%) | 66 (72%) |
2 | 31 (33%) | 23 (25%) | 29 (29%) | 26 (28%) |
student | ||||
1 | 57 (61%) | 57 (63%) | 59 (60%) | 56 (61%) |
2 | 36 (39%) | 34 (37%) | 40 (40%) | 36 (39%) |
Unknown | 1 | 0 | 2 | 0 |
This is how gtsummary
saves tables as Word files:
# set documents properties
library(officer)
sect_properties <- prop_section(
page_size(width = 21, height = 29.7, orient = "portrait"),
type = "continuous",
page_margins = page_mar()
)
df %>%
gtsummary::tbl_summary(
by = condition,
statistic = list(age = "{mean} ({sd})")) %>%
# this changes the table to a different format that can be saved as Word
as_flex_table() %>%
# use flextable package to save table as word
flextable::save_as_docx(path = "descriptives_gtsummary.docx", pr_section = sect_properties)
# for more details about last step, see https://davidgohel.github.io/flextable/reference/save_as_docx.html
To the best of my knowledge, gtsummary
is the only package that can export tables as Excel files:
df %>%
gtsummary::tbl_summary(
by = condition,
statistic = list(age = "{mean} ({sd})"),
digits = all_continuous() ~ 2) %>%
# export to Excel
as_hux_xlsx("descriptives_gtsummary.xlsx")
arsenal
arsenal
also exports tables as Word files but requires a bit more coding. For instance, it is necessary to specify all the variables to include in the table. Note that by default arsenal
shows the total and performs statistical tests when comparing groups (this can be turned off, though).
# create table
arsenal::tableby(condition~ age + gender + student,
data = df,
total = FALSE, # don't show column with total numbers
test = FALSE) %>% # don't perform test for group comparisons
# display table
summary()
1 (N=94) | 2 (N=91) | 3 (N=101) | 4 (N=92) | |
---|---|---|---|---|
age | ||||
N-Miss | 0 | 1 | 0 | 0 |
Mean (SD) | 29.096 (9.275) | 28.756 (8.305) | 28.980 (9.971) | 28.250 (9.587) |
Range | 17.000 - 62.000 | 17.000 - 54.000 | 13.000 - 61.000 | 19.000 - 71.000 |
gender | ||||
1 | 63 (67.0%) | 68 (74.7%) | 72 (71.3%) | 66 (71.7%) |
2 | 31 (33.0%) | 23 (25.3%) | 29 (28.7%) | 26 (28.3%) |
student | ||||
N-Miss | 1 | 0 | 2 | 0 |
1 | 57 (61.3%) | 57 (62.6%) | 59 (59.6%) | 56 (60.9%) |
2 | 36 (38.7%) | 34 (37.4%) | 40 (40.4%) | 36 (39.1%) |
Saving the table as word file should be simple. However, for reasons unknown to me this doesn’t always work on my computer.
# save table as Word document
descriptives <- arsenal::tableby(condition~ age + gender + student, data = df)
arsenal::write2word(descriptives, "descriptives_arsenal.docx", title = "My table",
quiet = TRUE) # to suppress the command line output
table1
table1
is another package that requires only little code to create a descriptive table. Group-comparisons are easy (specify grouping variable with “|
”). Unfortunately, I couldn’t figure out how to render the table on the website.
# not working on server
table1::table1(~age + gender + student | condition, data = df)
table1
has its own function t1flex
to convert the table into a flextable
object, which can then be conveniently be saved as a Word file.
descriptive_table <- table1::table1(~age + gender + student | condition, data = df)
# convert to flextable and save
table1::t1flex(descriptive_table) %>%
flextable::save_as_docx(path="descriptives_table1.docx",pr_section = sect_properties) # see previous sections for how to define "sect_properties"
tableone
tableone
also requires only little code to create a descriptive table. strata
can be used to group the data by a column. By default, tableone
omits one level of categorical variables. If you want to see all levels, you need to change this manually (showAllLevels = TRUE
). Note that when comparing groups, the default is to also conduct a statistical test (this can be turned off, though).
tableone::CreateTableOne(vars = c("age", "gender", "student"),
data = df, strata = "condition",
test = FALSE) %>% # don't perform test for group comparison
print(showAllLevels = TRUE) %>%
# use kableone for nicer display
tableone::kableone()
level | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
n | 94 | 91 | 101 | 92 | |
age (mean (SD)) | 29.10 (9.27) | 28.76 (8.31) | 28.98 (9.97) | 28.25 (9.59) | |
gender (%) | 1 | 63 (67.0) | 68 (74.7) | 72 (71.3) | 66 (71.7) |
2 | 31 (33.0) | 23 (25.3) | 29 (28.7) | 26 (28.3) | |
student (%) | 1 | 57 (61.3) | 57 (62.6) | 59 (59.6) | 56 (60.9) |
2 | 36 (38.7) | 34 (37.4) | 40 (40.4) | 36 (39.1) |
tableone
doesn’t offer a direct way to export tables as Word files. However, with some additional code (partially taken from here) the table can be converted into a flextable
object and then be saved as a Word file.
(Note that tableone
can also prepare the table in a way that makes it easy to copy-and-paste to Excel (look for “quote = TRUE
” here and here.)
# save as word by using the print function and by converting the table to a flextable
descriptives_tableone <- tableone::CreateTableOne(data = df, strata = "condition")
descriptives_tableone <- print(descriptives_tableone,printToggle = FALSE)
descriptives_tableone %>%
as.data.frame() %>%
rownames_to_column("Variables") %>%
flextable() %>%
flextable::save_as_docx(path = "descriptives_tableone.docx", pr_section = sect_properties) # see previous sections for how to define "sect_properties"
furniture
Another package that quickly generates descriptive tables is furniture
.
furniture::table1(df, age, gender, student, splitby = ~condition) %>%
# for some reason this only works if the object is converted
kableExtra::kable()
. | 1 | 2 | 3 | 4 |
---|---|---|---|---|
n = 93 | n = 90 | n = 99 | n = 92 | |
age | ||||
29.1 (9.3) | 28.8 (8.3) | 28.9 (9.9) | 28.2 (9.6) | |
gender | ||||
1 | 62 (66.7%) | 68 (75.6%) | 71 (71.7%) | 66 (71.7%) |
2 | 31 (33.3%) | 22 (24.4%) | 28 (28.3%) | 26 (28.3%) |
student | ||||
1 | 57 (61.3%) | 56 (62.2%) | 59 (59.6%) | 56 (60.9%) |
2 | 36 (38.7%) | 34 (37.8%) | 40 (40.4%) | 36 (39.1%) |
When passed on to flextable
, tables can be saved as Word files.
furniture::table1(df, age, gender, student, splitby = ~condition) %>%
as.data.frame() %>%
flextable() %>%
flextable::save_as_docx(path="descriptives_furniture.docx",pr_section = sect_properties) # see previous sections for how to define "sect_properties"
Other interesting packages
There are a number of other packages that don’t offer the possibility to save tables as Word or Excel files but nonetheless generate nice, descriptive tables (see Table below). Most of them offer more manual ways to import tables into Word and Excel such as copy-and-pasting html tables, which is cool because it keeps the formatting (see here, here, or here). Another possibility is to save tables as text files (see e.g., here or here).
Package | Version | Output | Pipe1 | Comments |
---|---|---|---|---|
DescrTab2 | 2.1.16 | html, R Markdown, Word (if run as .Rmd file) | Yes | Many options to customize table |
summarytools | 1.0.1 | R Markdown, R Studio Viewer, html | Yes | Gives nice, detailed overview; may need installing addiational software |
tab::tabmulti | 5.1.1 | kable | No | Should work well with R Markdown and knitr documents; further processing and formatting should be possible (not tested) |
Gmisc | 3.0.0 | html and LaTeX | Yes | Can be passed on to htmlTable |
modelsummary::datasummary_balance | 1.1.0 | HTML, LaTeX, MS Word & Powerpoint, Text/Markdown, PDF, RTF, Image files. | No | No user-friendly way to quickly combine and nicely display stats for categorical and numeric variables |
vtable::sumtable | 1.4.1 | html, LaTeX | Yes | No user-friendly way to quickly combine and nicely display stats for categorical and numeric variables |
qwraps2 | 0.5.2 | R Markdown, LaTeX | ? | Highly customizable (e.g., ad-hoc merging of levels in categorical variables), requires more coding |
1 Does the package work with pipes (%>%
)?
Final words
There are really nice R packages
to create descriptive tables with little effort. Which one you find most useful may depend on criteria not covered in this post (e.g., how missing values are computed and displayed, how well the default statistics and formatting style corresponds to your disciplinary conventions).
If you find any errors, omissions, or believe additional packages should be included in this post, please let me know.