A few notes:
- summarize() tests were performed on a different dataset from case_when().
- setDTthreads(4) was used for data.table & tidytable timings.
- data.table when being compared to mutate.() & dplyr::mutate()
- fill.() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Testing only included numeric columns due to this constraint.
- dtplyr is missing timings for functions that are not yet implemented in the package.
- pandas comparisons are in the process of being added - more will be added soon.
- tidytable functions faster than their data.table counterpart?
data.table in the background. tidytable runs were slightly shorter on those specific functions in this iteration of the tests. However, one goal of these tests is to show that the “time cost” of translating tidyverse syntax to data.table is negligible to the user (especially on medium-to-large datasets).

#> Date last run: 2022-08-01
#> # A tidytable: 13 × 6
#>    func_tested  data.table tidytable dtplyr tidyverse pandas
#>    <chr>             <dbl>     <dbl>  <dbl>     <dbl>  <dbl>
#>  1 arrange            48.4      29.6   49.1     1644.   716.
#>  2 case_when           8.5      12.1     NA      90.5   64.4
#>  3 distinct           21.7      21.3     23      60.2   309.
#>  4 fill               28.6      40.7   43.6        83    724
#>  5 filter              135      136.   146.      179.   904.
#>  6 inner_join         29.8      29.8   36.5      67.9     NA
#>  7 left_join          47.5      46.8   56.9      75.4     NA
#>  8 mutate             45.5      50.7   226.      56.9   780.
#>  9 nest               16.4      17.8   72.5      50.6     NA
#> 10 pivot_longer       47.5        27   47.7       189     NA
#> 11 pivot_wider        48.5      59.4   136.      105.     NA
#> 12 summarize           387       359   354.      645.  3080.
#> 13 unnest             212.      58.6     NA      67.8     NA
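
One way to read the table against the "time cost" remark in the notes is to divide each tidytable timing by the matching data.table timing. The sketch below does this in base R. The `timings` data frame and its `ratio` column are names chosen here for illustration, and the numbers are copied from the printed (rounded) table, so the ratios are approximate.

```r
# Timings copied from the printed table above (rounded as displayed there).
timings <- data.frame(
  func_tested = c("arrange", "case_when", "distinct", "fill", "filter",
                  "inner_join", "left_join", "mutate", "nest",
                  "pivot_longer", "pivot_wider", "summarize", "unnest"),
  data.table = c(48.4, 8.5, 21.7, 28.6, 135, 29.8, 47.5, 45.5, 16.4,
                 47.5, 48.5, 387, 212),
  tidytable = c(29.6, 12.1, 21.3, 40.7, 136, 29.8, 46.8, 50.7, 17.8,
                27, 59.4, 359, 58.6)
)

# Ratio below 1: the tidytable run was shorter; above 1: the data.table run was shorter.
timings$ratio <- round(timings$tidytable / timings$data.table, 2)

# Show the functions ordered from smallest to largest ratio.
print(timings[order(timings$ratio), ])

# Middle of the 13 ratios, as a rough single-number summary.
print(median(timings$ratio))
```

A ratio close to 1 means the two runs took about the same time on that function. As the notes say, timings from different functions used different datasets, so only the within-row comparison is meaningful.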
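
The notes on setDTthreads(4) and data.table::nafill() can be illustrated with a small sketch. This is not the benchmark code: the `dt` table, its columns, and the values are hypothetical toy data, and the example only shows a forward fill on a numeric column, which is the case the notes say was tested.

```r
library(data.table)

# Thread setting mentioned in the notes for the data.table and tidytable timings.
setDTthreads(4)

# Hypothetical toy data with gaps in a numeric column (not the benchmark data).
dt <- data.table(id = 1:6, x = c(1, NA, NA, 4, NA, 6))

# Forward fill ("last observation carried forward") on the numeric column.
dt[, x_filled := nafill(x, type = "locf")]

print(dt)
```

Per the notes, a character, factor, or logical column would need fill.() or tidyr::fill() instead, which is why the fill comparison was limited to numeric columns.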