A few notes:

- summarize() tests were performed on a different dataset from case_when().
- setDTthreads(4) was used for data.table & tidytable timings.
- Modify-by-reference was used in data.table when being compared to mutate.() & dplyr::mutate().
- fill.() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Testing only included numeric columns due to this constraint.
- dtplyr is missing timings for functions that are not yet implemented in the package.
- pandas comparisons are in the process of being added - more will be added soon.
- Why are some tidytable functions faster than their data.table counterpart?
  - They're not; tidytable just uses data.table in the background.
  - The tidytable runs were slightly shorter on those specific functions on this iteration of the tests. However, one goal of these tests is to show that the “time cost” of translating tidyverse syntax to data.table is negligible to the user (especially on medium-to-large datasets).

#> Date last run: 2024-12-10
#> # A tidytable: 13 × 6
#>    func_tested  data.table tidytable dtplyr tidyverse pandas
#>    <chr>             <dbl>     <dbl>  <dbl>     <dbl>  <dbl>
#>  1 arrange            17.6      11.6   19.3        23   716.
#>  2 case_when           7.5       6.5     NA      23.5   64.4
#>  3 distinct           14.4      16.4   15.6      15.9   309.
#>  4 fill               15.6      20.7   17.6       8.1    724
#>  5 filter             69.3        69   73.9      96.3   904.
#>  6 inner_join         20.6      20.2   23.3      44.7     NA
#>  7 left_join          27.8      26.1   31.7      85.3     NA
#>  8 mutate             38.4      38.5   58.1      37.7   780.
#>  9 nest                8.6       6.8   29.8      18.4     NA
#> 10 pivot_longer       28.7      17.3   26.4      87.6     NA
#> 11 pivot_wider        14.2      15.5   61.9      47.6     NA
#> 12 summarize          196.      189.   192.      394.  3080.
#> 13 unnest             106.      30.8     NA      39.6     NA
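
For readers who want a feel for how a single row such as mutate might be timed, here is a minimal sketch. It is not the script behind the table above. The data size, the column names, the `median_ms()` helper, and the use of base `system.time()` are all assumptions made for illustration, and it assumes the data.table, dplyr, and tidytable packages are installed. It also calls the current `tidytable::mutate()` rather than the dotted `mutate.()` named in the notes.

```r
# Hypothetical sketch only; not the benchmark code used for the table above.
library(data.table)

setDTthreads(4)  # thread count stated in the notes

set.seed(123)
n <- 1e5  # assumed size; the real benchmark data may differ
df <- data.frame(a = sample(100L, n, replace = TRUE), b = runif(n))
dt <- as.data.table(df)

# Hypothetical helper: call `f` several times and return the median
# elapsed time in milliseconds, to smooth out run-to-run variation.
median_ms <- function(f, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed) * 1000
}

timings <- c(
  # data.table: `:=` adds the column by reference, without copying `dt`
  data.table = median_ms(function() dt[, double_b := b * 2]),
  # tidytable: tidyverse-style syntax translated to data.table
  tidytable = median_ms(function() tidytable::mutate(df, double_b = b * 2)),
  # dplyr: mutate() returns a new data frame
  tidyverse = median_ms(function() dplyr::mutate(df, double_b = b * 2))
)

print(round(timings, 1))
```

Taking the median of several runs is one simple way to reduce the effect of the natural variation in execution time; a dedicated benchmarking package would give more careful measurements.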