Below are some speed comparisons of various functions. More functions will get added to the speed comps over time.
A few notes:
- summarize() tests were performed on a different dataset from case_when().
- setDTthreads(4) was used for data.table & tidytable timings.
- … data.table when being compared to mutate.() & dplyr::mutate()
- fill.() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Because of this constraint, testing only included numeric columns.
- dtplyr is missing timings for functions that are not yet implemented in the package.
- pandas comparisons are still being added; more will follow soon.

…tidytable functions faster than their data.table counterpart?
… tidytable function appears to be “faster” than data.table, it’s due to this. However, one goal of these tests is to show that the “time cost” of translating tidyverse syntax to data.table is negligible to the user.

#> # tidytable [13 × 7]
#>    func_tested  data.table tidytable dtplyr tidyverse pandas tidytable_vs_dplyr
#>    <chr>             <dbl>     <dbl>  <dbl>     <dbl>  <dbl> <chr>
#>  1 arrange            43.8      46.6   44.5     1351.    355 3.4%
#>  2 case_when          26.3      25.3     NA      335.   59.2 7.6%
#>  3 distinct           18.5        19     20      53.5    309 35.5%
#>  4 fill               28.6      44.8     NA      66.7    846 67.2%
#>  5 filter             228.      226.    238      261.    707 86.9%
#>  6 inner_join         44.3      47.3    116      84.1     NA 56.2%
#>  7 left_join          70.9      51.2   158.        87     NA 58.9%
#>  8 mutate               69      56.3   538.      77.8   86.4 72.4%
#>  9 nest                9.6      14.5     NA      30.7     NA 47.2%
#> 10 pivot_longer       11.1      13.6     NA      47.7     NA 28.5%
#> 11 pivot_wider        94.8      99.7     NA      78.8     NA 126.5%
#> 12 summarize          176.      177.   178.      236.    834 75.2%
#> 13 unnest             14.7       7.7     NA      30.4     NA 25.3%
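The tidytable_vs_dplyr column appears to be the tidytable time expressed as a percentage of the tidyverse time (for example, 46.6 / 1351 ≈ 3.4% for arrange). A minimal base R sketch that recomputes it for three rows copied from the table, assuming that ratio is how the column was derived:

```r
# Assumption: tidytable_vs_dplyr = 100 * tidytable time / tidyverse time.
# Values copied from three rows of the timing table.
timings <- data.frame(
  func_tested = c("arrange", "case_when", "pivot_wider"),
  tidytable   = c(46.6, 25.3, 99.7),
  tidyverse   = c(1351, 335, 78.8)
)

# Below 100% means tidytable was faster; above 100% (pivot_wider) means slower
timings$tidytable_vs_dplyr <- sprintf("%.1f%%", 100 * timings$tidytable / timings$tidyverse)

print(timings)
```

The printed timings are rounded, so recomputing from them can differ from the table in the last digit (filter gives 86.6% from 226 / 261 versus the reported 86.9%).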
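The fill note explains why the fill benchmark only used numeric columns: data.table::nafill() is limited to numeric input, while tidyr::fill() also fills character columns. A small sketch of that difference on toy data made up for illustration; it only uses tidyr::fill() and data.table::nafill(), not tidytable's fill.():

```r
library(tidyr)
library(data.table)

# Toy data (hypothetical): one numeric and one character column, each with a gap
df <- data.frame(x = c(1, NA, 3), y = c("a", NA, "c"), stringsAsFactors = FALSE)

# tidyr::fill() carries the last value forward in both columns (default .direction = "down")
filled <- tidyr::fill(df, x, y)

# data.table::nafill() only accepts numeric input, so only x can be filled this way
x_locf <- data.table::nafill(df$x, type = "locf")
```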