tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")
tidytable replicates tidyverse syntax but uses data.table in the background. In general, you can simply use library(tidytable) to replace your existing dplyr and tidyr code with the faster tidytable equivalents.
A full list of implemented functions can be found here.
library(tidytable)
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9
You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% fn() %>% ungroup().
Pass a single column as .by = z, or multiple columns as .by = c(y, z).
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3
All functions that can operate by group have a .by argument built in (mutate(), filter(), summarize(), etc.).
The above syntax is equivalent to:
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3
Both options are available for users, so you can use the syntax that you prefer.
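As a further sketch of .by in verbs other than summarize(), the example below uses it with mutate() and filter(). The data frame and the result names (with_avg, top_rows) are illustrative and not part of the package.

```r
library(tidytable)

# Illustrative data: two groups in x
df <- data.table(x = c("a", "a", "b"), z = 1:3)

# mutate() with .by adds a per-group value without collapsing rows
with_avg <- df %>%
  mutate(avg_z = mean(z), .by = x)

# filter() with .by evaluates the condition within each group,
# here keeping the row with the largest z per group
top_rows <- df %>%
  filter(z == max(z), .by = x)
```

Because .by applies only to the call it appears in, no ungroup() step is needed afterwards.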
tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.
Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)
df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9
A full overview of selection options can be found here.
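A few more selection patterns, sketched with the helpers listed above. The data frame mirrors the earlier example, and the result names are illustrative only.

```r
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# Select by column type with where()
num_cols <- df %>%
  select(where(is.numeric))

# Drop columns by negating a helper
no_b <- df %>%
  select(-starts_with("b"))

# any_of() ignores names that are not present instead of erroring
only_a <- df %>%
  select(any_of(c("a", "not_a_column")))
```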
Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))
add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}
df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2
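For comparison, the enquo() and !! form of the same helper might look like the sketch below. The function name add_one_quo is hypothetical, and enquo() is called through rlang explicitly so the example does not depend on how it is re-exported.

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

# Hypothetical variant of add_one() using enquo() and !!
add_one_quo <- function(data, add_col) {
  # Capture the column expression supplied by the caller
  add_col <- rlang::enquo(add_col)
  data %>%
    # Unquote it inside mutate(); the parentheses make the
    # scope of !! explicit
    mutate(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_quo(x)
```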
The .data and .env pronouns also work within tidytable functions:
var <- 10
df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11
A full overview of tidy evaluation can be found here.
dt() helper
The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax.
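A minimal sketch of such a pipeline, assuming dt() passes its arguments through to data.table's i, j, and by slots. The data frame and the name result are illustrative.

```r
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

result <- df %>%
  dt(, .(x, y, z)) %>%                  # j: select columns
  dt(x < 4 & y > 1) %>%                 # i: filter rows
  dt(order(x, y)) %>%                   # i: sort rows
  dt(, double_x := x * 2) %>%           # j: add a column with :=
  dt(, .(avg_x = mean(x)), by = z)      # j + by: grouped summary
```

Each dt() call can be swapped for the matching tidytable verb (select(), filter(), arrange(), mutate(), summarize()) without changing the rest of the pipeline.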
For those interested in performance, speed comparisons can be found here.
verb.() syntax
For backwards compatibility, tidytable exports verb.() versions of its functions. This also makes it easier to combine dplyr and tidytable functions in one script:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  mutate.(double_x = x * 2)
#> # A tidytable: 3 × 4
#>       x     y z     double_x
#>   <int> <int> <chr>    <dbl>
#> 1     1     4 a            2
#> 2     2     5 a            4
#> 3     3     6 b            6
tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.