Quick Overview
readr is an R package that provides a fast and friendly way to read rectangular data (such as CSV, TSV, and fixed-width files). It is designed to flexibly parse many types of data found in the wild, while still being easy to use. readr is part of the tidyverse, an ecosystem of packages designed with common APIs and a shared philosophy.
Pros
- Fast performance, especially for large datasets
- Consistent and intuitive API that works well with other tidyverse packages
- Automatically parses common data types (e.g., dates, numbers)
- Provides informative error messages and warnings for data parsing issues
Cons
- May struggle with very complex or inconsistent data formats
- Some users find the default behavior for handling missing values confusing
- Limited support for writing data compared to its reading capabilities
- Occasional discrepancies between how it parses data vs. base R functions
Code Examples
Reading a CSV file:
library(readr)
data <- read_csv("path/to/file.csv")
Reading a fixed-width file:
fixed_data <- read_fwf(
  "path/to/fixed_width_file.txt",
  col_positions = fwf_widths(c(10, 5, 8), col_names = c("name", "age", "date"))
)
Parsing a string as a specific data type:
parse_number("$1,234.56")
# Returns: 1234.56
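The other parse_*() functions follow the same pattern. A minimal sketch, assuming readr is installed; the input strings are made up for illustration:

```r
library(readr)

# Parse a European-style number by overriding the decimal and grouping marks.
euro <- parse_number(
  "1.234,56",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

# Parse an ISO 8601 date string into a Date.
day <- parse_date("2021-07-20")

# Values that cannot be parsed become NA (readr also emits a warning).
ints <- suppressWarnings(parse_integer(c("1", "2", "x")))
```

Passing a locale() is the usual way to handle non-US number formats without pre-processing the strings.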
Getting Started
To get started with readr, first install and load the package:
install.packages("readr")
library(readr)
# Read a CSV file
data <- read_csv("path/to/your/file.csv")
# View the first few rows
head(data)
# Get a summary of the data (glimpse() comes from tibble, not readr itself)
tibble::glimpse(data)
# Write data to a CSV file
write_csv(data, "path/to/output.csv")
This will install readr, load it, read a CSV file, display the first few rows, provide a summary of the data structure, and then write the data back to a CSV file. Adjust the file paths as needed for your specific use case.
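If no CSV file is at hand, the same read and write steps can be tried as a round trip through a temporary file. This is only a sketch: the `scores` data frame and its columns are invented for illustration.

```r
library(readr)

# Hypothetical sample data; any data frame works here.
scores <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Write to a temporary file so the sketch does not touch real paths.
path <- tempfile(fileext = ".csv")
write_csv(scores, path)

# Read it back with explicit column types instead of relying on guessing.
round_trip <- read_csv(
  path,
  col_types = cols(id = col_integer(), label = col_character())
)
```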
README
readr 
Overview
The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results. If you are new to readr, the best place to start is the data import chapter in R for Data Science.
Installation
# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just readr:
install.packages("readr")
# Or you can install the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/readr")
Cheatsheet
Usage
readr is part of the core tidyverse, so you can load it with:
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4          ✔ readr     2.1.5.9000
#> ✔ forcats   1.0.0          ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1          ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3          ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Of course, you can also load readr as an individual package:
library(readr)
To read a rectangular dataset with readr, you combine two pieces: a function that parses the lines of the file into individual fields and a column specification.
readr supports the following file formats with these read_*() functions:
- read_csv(): comma-separated values (CSV)
- read_tsv(): tab-separated values (TSV)
- read_csv2(): semicolon-separated values with , as the decimal mark
- read_delim(): delimited files (CSV and TSV are important special cases)
- read_fwf(): fixed-width files
- read_table(): whitespace-separated files
- read_log(): web log files
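The delimited readers share the same interface and differ mainly in their defaults. A short sketch, assuming readr >= 2.0.0 (where I() marks a string as literal data rather than a file path); the inline data is invented:

```r
library(readr)

# Tab-separated input.
tsv <- read_tsv(I("name\tcount\nalpha\t1\nbeta\t2\n"), show_col_types = FALSE)

# read_csv2() expects ";" as the delimiter and "," as the decimal mark.
csv2 <- read_csv2(
  I("name;weight\nalpha;1,5\nbeta;2,25\n"),
  show_col_types = FALSE
)

# read_delim() is the general form; the delimiter is given explicitly.
piped <- read_delim(
  I("name|count\nalpha|1\nbeta|2\n"),
  delim = "|",
  show_col_types = FALSE
)
```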
A column specification describes how each column should be converted
from a character vector to a specific data type (e.g. character,
numeric, datetime, etc.). In the absence of a column specification,
readr will guess column types from the data. vignette("column-types")
gives more detail on how readr guesses the column types. Column type
guessing is very handy, especially during data exploration, but it's
important to remember these are just guesses. As any data analysis
project matures past the exploratory phase, the best strategy is to
provide explicit column types.
The following example loads a sample file bundled with readr and guesses the column types:
(chickens <- read_csv(readr_example("chickens.csv")))
#> Rows: 5 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): chicken, sex, motto
#> dbl (1): eggs_laid
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 5 × 4
#>   chicken                 sex     eggs_laid motto
#>   <chr>                   <chr>       <dbl> <chr>
#> 1 Foghorn Leghorn         rooster         0 That's a joke, ah say, that's a jok…
#> 2 Chicken Little          hen             3 The sky is falling!
#> 3 Ginger                  hen            12 Listen. We'll either die free chick…
#> 4 Camilla the Chicken     hen             7 Bawk, buck, ba-gawk.
#> 5 Ernie The Giant Chicken rooster         0 Put Captain Solo in the cargo hold.
Note that readr prints the column types (the guessed column types, in
this case). This is useful because it allows you to check that the
columns have been read in as you expect. If they haven't, that means you
need to provide the column specification. This sounds like a lot of
trouble, but luckily readr affords a nice workflow for this. Use spec()
to retrieve the (guessed) column specification from your initial effort.
spec(chickens)
#> cols(
#> chicken = col_character(),
#> sex = col_character(),
#> eggs_laid = col_double(),
#> motto = col_character()
#> )
Now you can copy, paste, and tweak this, to create a more explicit readr
call that expresses the desired column types. Here we express that sex
should be a factor with levels rooster and hen, in that order, and that
eggs_laid should be integer.
chickens <- read_csv(
  readr_example("chickens.csv"),
  col_types = cols(
    chicken = col_character(),
    sex = col_factor(levels = c("rooster", "hen")),
    eggs_laid = col_integer(),
    motto = col_character()
  )
)
chickens
#> # A tibble: 5 × 4
#>   chicken                 sex     eggs_laid motto
#>   <chr>                   <fct>       <int> <chr>
#> 1 Foghorn Leghorn         rooster         0 That's a joke, ah say, that's a jok…
#> 2 Chicken Little          hen             3 The sky is falling!
#> 3 Ginger                  hen            12 Listen. We'll either die free chick…
#> 4 Camilla the Chicken     hen             7 Bawk, buck, ba-gawk.
#> 5 Ernie The Giant Chicken rooster         0 Put Captain Solo in the cargo hold.
vignette("readr") gives an expanded introduction to readr.
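When the data does not match an explicit column specification, the problem report mentioned in the overview can be inspected with problems(). A minimal sketch, assuming readr >= 2.0.0 for the I() literal-data syntax; the inline data is invented:

```r
library(readr)

# The second value cannot be parsed as an integer, so it becomes NA.
# readr warns about this; the warning is suppressed here to keep output short.
bad <- suppressWarnings(
  read_csv(I("x\n1\nnot_a_number\n3\n"), col_types = cols(x = col_integer()))
)

# problems() returns a tibble describing each parsing failure.
issues <- problems(bad)
issues
```

Checking nrow(problems(...)) after a read is one simple way to fail fast in a script when the input drifts from the expected types.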
Editions
readr got a new parsing engine in version 2.0.0 (released July 2021). In
this so-called second edition, readr calls vroom::vroom(), by default.
The parsing engine in readr versions prior to 2.0.0 is now called the
first edition. If you're using readr >= 2.0.0, you can still access
first edition parsing via the functions with_edition(1, ...) and
local_edition(1). And, obviously, if you're using readr < 2.0.0, you
will get first edition parsing, by definition, because that's all there
is.
We will continue to support the first edition for a number of releases,
but the overall goal is to make the second edition uniformly better than
the first. Therefore the plan is to eventually deprecate and then remove
the first edition code. New code and actively-maintained code should use
the second edition. The workarounds with_edition(1, ...) and
local_edition(1) are offered as a pragmatic way to patch up legacy
code or as a temporary solution for infelicities identified as the
second edition matures.
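The two workarounds might be used as follows. This is a sketch assuming readr >= 2.0.0; read_legacy() is a hypothetical helper, not part of readr.

```r
library(readr)

path <- readr_example("chickens.csv")

# Run a single call with the first edition parser.
legacy <- with_edition(1, read_csv(path, col_types = cols()))

# Switch editions for the duration of a function; the previous edition is
# restored when the function exits.
read_legacy <- function(file) {
  local_edition(1)
  read_csv(file, col_types = cols())
}
legacy_too <- read_legacy(path)

# Outside those scopes the default (second) edition is still active.
edition_get()
```

with_edition() suits one-off calls, while local_edition() is convenient when several reads inside one function or test all need the older parser.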
Alternatives
There are two main alternatives to readr: base R and data.table's
fread(). The most important differences are discussed below.
Base R
Compared to the corresponding base functions, readr functions:
- Use a consistent naming scheme for the parameters (e.g. col_names and col_types, not header and colClasses).
- Are generally much faster (up to 10x-100x) depending on the dataset.
- Leave strings as is by default, and automatically parse common date/time formats.
- Have a helpful progress bar if loading is going to take a while.
- All functions work exactly the same way regardless of the current locale. To override the US-centric defaults, use locale().
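To make the naming and locale differences concrete, here is a small side-by-side sketch. The semicolon-delimited file and its columns are invented for illustration.

```r
library(readr)

# A semicolon-delimited file that uses "," as the decimal mark.
path <- tempfile(fileext = ".csv")
writeLines(c("name;value", "alpha;1,5", "beta;2,25"), path)

# Base R: header / colClasses, with the decimal mark implied by read.csv2().
base_df <- read.csv2(
  path,
  header = TRUE,
  colClasses = c("character", "numeric")
)

# readr: col_names / col_types, with the decimal mark stated via locale().
readr_df <- read_delim(
  path,
  delim = ";",
  col_names = TRUE,
  col_types = cols(name = col_character(), value = col_double()),
  locale = locale(decimal_mark = ",")
)
```

Both calls should yield the same values; the readr version spells out the locale assumption rather than depending on which function variant was chosen.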
data.table and fread()
data.table has a function similar to read_csv() called fread().
Compared to fread(), readr functions:
- Are sometimes slower, particularly on numeric heavy data.
- Can automatically guess some parameters, but basically encourage explicit specification of, e.g., the delimiter, skipped rows, and the header row.
- Follow tidyverse-wide conventions, such as returning a tibble, a standard approach for column name repair, and a common mini-language for column selection.
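The tidyverse conventions in the last point can be seen in a short sketch. It assumes readr >= 2.0.0 (for I() and col_select); the inline data is invented.

```r
library(readr)

# Duplicate column names are repaired to be unique by default
# (readr prints a "New names" message, suppressed here).
repaired <- suppressMessages(
  read_csv(I("id,score,score\n1,0.5,0.7\n2,0.9,0.1\n"), show_col_types = FALSE)
)
names(repaired)

# col_select uses the same column selection mini-language as dplyr::select().
picked <- read_csv(
  I("id,score,note\n1,0.5,ok\n2,0.9,check\n"),
  col_select = c(id, note),
  show_col_types = FALSE
)
names(picked)
```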
Acknowledgements
Thanks to:
- Joe Cheng for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.
- JJ Allaire for helping me come up with a design that makes very few copies, and is easy to extend.
- Dirk Eddelbuettel for coming up with the name!