Speed Comparisons

A few notes:

  • Comparing times across different functions isn’t meaningful. For example, the summarize() tests were run on a different dataset than the case_when() tests.
  • setDTthreads(4) was used for data.table & tidytable timings.
  • Modify-by-reference was used in data.table when comparing against mutate.() & dplyr::mutate().
  • fill.() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Because of this constraint, the fill tests only included numeric columns.
  • dtplyr is missing timings for functions that are not yet implemented in the package.
  • pandas comparisons are still being added, so some pandas timings are missing for now.
  • All tests are run 11 times. The times shown are the median of those 11 runs.
  • All timings are in milliseconds.
  • All tests can be found in the source code here.
  • FAQ - Why are some tidytable functions faster than their data.table counterpart?
    • Short answer - they’re not! After all, they’re just using data.table in the background.
    • Long answer - All R functions have some slight natural variation in their execution time. By chance, the tidytable runs were slightly shorter on those specific functions in this iteration of the tests. However, one goal of these tests is to show that the “time cost” of translating tidyverse syntax to data.table is negligible to the user (especially on medium-to-large datasets).
  • Lastly, these tests were not rigorously designed to cover all angles equally. They are only meant to give general insight into the performance of these packages.
#> Date last run: 2022-08-01
#> # A tidytable: 13 × 6
#>    func_tested  data.table tidytable dtplyr tidyverse pandas
#>    <chr>             <dbl>     <dbl>  <dbl>     <dbl>  <dbl>
#>  1 arrange            48.4      29.6   49.1    1644.   716. 
#>  2 case_when           8.5      12.1   NA        90.5   64.4
#>  3 distinct           21.7      21.3   23        60.2  309. 
#>  4 fill               28.6      40.7   43.6      83    724  
#>  5 filter            135       136.   146.      179.   904. 
#>  6 inner_join         29.8      29.8   36.5      67.9   NA  
#>  7 left_join          47.5      46.8   56.9      75.4   NA  
#>  8 mutate             45.5      50.7  226.       56.9  780. 
#>  9 nest               16.4      17.8   72.5      50.6   NA  
#> 10 pivot_longer       47.5      27     47.7     189     NA  
#> 11 pivot_wider        48.5      59.4  136.      105.    NA  
#> 12 summarize         387       359    354.      645.  3080. 
#> 13 unnest            212.       58.6   NA        67.8   NA
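
To make the notes above more concrete, here is a minimal sketch of how a "median of 11 runs" timing could be taken for a modify-by-reference operation in data.table. It is not the benchmark code used for the table: the helper median_ms(), the column names, and the dataset size are all hypothetical and chosen only for illustration.

```r
library(data.table)

# Same thread setting the notes mention for data.table and tidytable timings
setDTthreads(4)

# Hypothetical helper (not from the benchmark source): call `fn` `times`
# times and return the median elapsed time in milliseconds.
median_ms <- function(fn, times = 11) {
  elapsed_ms <- vapply(
    seq_len(times),
    function(i) system.time(fn())[["elapsed"]] * 1000,
    numeric(1)
  )
  median(elapsed_ms)
}

# Hypothetical test data; the real benchmark datasets are defined elsewhere
set.seed(123)
n <- 1e6
dt <- data.table(x = runif(n), y = runif(n))

# Modify-by-reference: `:=` adds column `z` to `dt` in place, with no copy
by_reference_ms <- median_ms(function() dt[, z := x + y])

# For contrast: copy the table first, then add the column to the copy
with_copy_ms <- median_ms(function() copy(dt)[, z := x + y])

print(c(by_reference = by_reference_ms, with_copy = with_copy_ms))
```

Absolute numbers will differ from machine to machine, so no expected output is shown. The sketch is only meant to illustrate the shape of the procedure: fix the thread count, repeat each test 11 times, and report the median in milliseconds.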