Speed Comparisons

A few notes:

  • Comparing times across different functions is not very useful. For example, the summarize() tests were run on a different dataset than the case_when() tests.
  • setDTthreads(4) was used for data.table & tidytable timings.
  • Modify-by-reference was used in data.table for the comparisons with mutate.() & dplyr::mutate().
  • fill.() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Because of this constraint, the fill tests only included numeric columns.
  • dtplyr is missing timings for functions that are not yet implemented in the package.
  • pandas comparisons are still being added, with more to come soon.
  • All tests are run 11 times. The times shown are the median of those 11 runs.
  • All timings are in milliseconds.
  • All tests can be found in the source code here.
  • FAQ - Why are some tidytable functions faster than their data.table counterpart?
    • Short answer - they’re not! After all, they’re just using data.table in the background.
    • Long answer - All R functions have some natural variation in their execution time. By chance, the tidytable runs were slightly shorter on those specific functions in this iteration of the tests. However, one goal of these tests is to show that the “time cost” of translating tidyverse syntax to data.table is negligible to the user (especially on medium-to-large datasets).
  • Lastly, these tests were not rigorously designed to cover all angles equally. They are only meant to give general insight into the performance of these packages.
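The "run 11 times, report the median in milliseconds" approach described in the notes can be sketched in base R as follows. This is only an illustration, not the code used to produce the timings: `median_time_ms()` is a hypothetical helper, and the sorting workload is a stand-in for a real test.

```r
# Hypothetical helper (not part of any package): call `fn` `n_runs` times
# and return the median elapsed time in milliseconds.
median_time_ms <- function(fn, n_runs = 11) {
  elapsed_ms <- vapply(
    seq_len(n_runs),
    function(i) {
      # system.time() reports seconds; "elapsed" is wall-clock time
      system.time(fn())[["elapsed"]] * 1000
    },
    numeric(1)
  )
  median(elapsed_ms)
}

# Toy workload standing in for a real benchmark
set.seed(123)
x <- runif(1e6)

sort_time <- median_time_ms(function() sort(x))
sort_time
```

Using the median rather than the mean keeps a single slow run (for example, one interrupted by garbage collection) from skewing the reported time.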
#> Date last run: 2023-04-20
#> # A tidytable: 13 × 6
#>    func_tested  data.table tidytable dtplyr tidyverse pandas
#>    <chr>             <dbl>     <dbl>  <dbl>     <dbl>  <dbl>
#>  1 arrange            61.4      33.5   71.1      79.1  716. 
#>  2 case_when          10.2       9.9   NA        53.6   64.4
#>  3 distinct           29.5      30.7   33.4      68.4  309. 
#>  4 fill               40        74     58.3      37.8  724  
#>  5 filter            189.      190.   205.      218.   904. 
#>  6 inner_join         39.7      40     49.1     102.    NA  
#>  7 left_join          68.1      67.2   79.9     196.    NA  
#>  8 mutate             61.8      67.3  133.       63.3  780. 
#>  9 nest               19.6      20.3   95.8      56.8   NA  
#> 10 pivot_longer       61.3      37.7   70.1     213.    NA  
#> 11 pivot_wider        61.3      73.9  184.      141.    NA  
#> 12 summarize         433.      390.   397.      802.  3080. 
#> 13 unnest            264.       75.4   NA        92.7   NA
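
One way to read the table with the FAQ in mind is to compute the ratio of tidytable time to data.table time for each function. The sketch below copies the values as displayed above (so they carry the table's rounding); the `ratio` column name is my own choice for illustration. A ratio near 1 means the two timings are roughly equal.

```r
# Timings (milliseconds) copied from the table above, as displayed
timings <- data.frame(
  func_tested = c("arrange", "case_when", "distinct", "fill", "filter",
                  "inner_join", "left_join", "mutate", "nest",
                  "pivot_longer", "pivot_wider", "summarize", "unnest"),
  data.table = c(61.4, 10.2, 29.5, 40, 189, 39.7, 68.1, 61.8, 19.6,
                 61.3, 61.3, 433, 264),
  tidytable = c(33.5, 9.9, 30.7, 74, 190, 40, 67.2, 67.3, 20.3,
                37.7, 73.9, 390, 75.4)
)

# Ratio of tidytable time to data.table time for each function
timings$ratio <- timings$tidytable / timings$data.table

# Show the functions ordered by ratio, rounded for readability
ordered <- timings[order(timings$ratio), ]
ordered$ratio <- round(ordered$ratio, 2)
ordered
```

Because each function was tested on its own dataset, these ratios are only comparable within a row, not across rows.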