Aggregate data using summary statistics: summarize (tidytable)

Aggregate data using summary statistics such as the mean or median. Aggregations can be calculated by group.

summarize(
  .df,
  ...,
  .by = NULL,
  .sort = TRUE,
  .groups = "drop_last",
  .unpack = FALSE
)

summarise(
  .df,
  ...,
  .by = NULL,
  .sort = TRUE,
  .groups = "drop_last",
  .unpack = FALSE
)
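summarise() takes exactly the same arguments as summarize(). The sketch below assumes summarise() is a plain alias, so both spellings should return the same result for the same call. The toy data is illustrative only.

```r
library(tidytable)

# Small illustrative table (hypothetical data)
df <- tidytable(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

# Same aggregation, spelled both ways
us <- df %>% summarize(avg_a = mean(a), .by = c)
uk <- df %>% summarise(avg_a = mean(a), .by = c)
```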

Arguments

.df

A data.frame or data.table

...

Aggregations to perform, supplied as name-value pairs such as avg_a = mean(a).

.by

Columns to group by.

A single column can be passed with .by = d.
Multiple columns can be passed with .by = c(c, d).
tidyselect can be used:
- Single predicate: .by = where(is.character)
- Multiple predicates: .by = c(where(is.character), where(is.factor))
- A combination of predicates and column names: .by = c(where(is.character), b)
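The tidyselect forms of .by listed above can be sketched as follows. The data mirrors the Examples section, built with tidytable() here, and the column name n_rows is a hypothetical choice. The first call groups by every character column. The second adds the numeric column b by name alongside the predicate.

```r
library(tidytable)

df <- tidytable(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

# Predicate only: groups by all character columns (c and d)
by_chr <- df %>%
  summarize(avg_a = mean(a), .by = where(is.character))

# Predicate plus a bare column name: groups by c, d and b
by_mixed <- df %>%
  summarize(n_rows = n(), .by = c(where(is.character), b))
```

Because every value of b is distinct, each group in by_mixed contains exactly one row.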

.sort

experimental: Default TRUE, which sorts the result by the grouping columns. If FALSE, groups keep the order in which they first appear in the data.
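A minimal sketch of the difference, using a hypothetical table whose group values are deliberately out of order. Both calls pass .sort explicitly, so they do not depend on the default value.

```r
library(tidytable)

df <- tidytable(
  x = c("b", "a", "b", "a"),
  y = 1:4
)

# .sort = TRUE: groups are ordered by x ("a" before "b")
sorted <- df %>% summarize(total = sum(y), .by = x, .sort = TRUE)

# .sort = FALSE: groups follow first appearance ("b" before "a")
unsorted <- df %>% summarize(total = sum(y), .by = x, .sort = FALSE)
```

The per-group totals are the same (a = 2 + 4, b = 1 + 3). Only the row order differs.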

.groups

Grouping structure of the result

"drop_last": Drop the last level of grouping
"drop": Drop all groups
"keep": Keep all groups
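A sketch of the three .groups options. It assumes .df has been grouped with tidytable's group_by(), and it uses group_vars() to inspect the grouping of each result. Both calls are assumptions based on tidytable's dplyr-style API, and the data is hypothetical.

```r
library(tidytable)

df <- tidytable(
  a = 1:4,
  c = c("x", "x", "y", "y"),
  d = c("p", "q", "p", "q")
)

# "drop_last" (default): the last grouping level (d) is dropped, leaving c
res_last <- df %>%
  group_by(c, d) %>%
  summarize(avg_a = mean(a))

# "drop": the result comes back ungrouped
res_drop <- df %>%
  group_by(c, d) %>%
  summarize(avg_a = mean(a), .groups = "drop")

# "keep": the result stays grouped by both c and d
res_keep <- df %>%
  group_by(c, d) %>%
  summarize(avg_a = mean(a), .groups = "keep")
```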

.unpack

experimental: Default FALSE. Whether unnamed data frame inputs should be unpacked into separate columns. This is opt-in because it can reduce performance.
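A sketch of opting in to .unpack. It assumes that an unnamed expression returning a data frame (here a tidytable() call) is evaluated once per group, and that .unpack = TRUE then spreads that frame's columns into the result. The column names avg_a and max_b are illustrative.

```r
library(tidytable)

df <- tidytable(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

# The unnamed tidytable() call yields one small data frame per group;
# .unpack = TRUE turns its columns into ordinary result columns
res <- df %>%
  summarize(
    tidytable(avg_a = mean(a), max_b = max(b)),
    .by = c,
    .unpack = TRUE
  )
```

With the default .unpack = FALSE, you would instead name each aggregation directly, as in the Examples section.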

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  summarize(avg_a = mean(a),
            max_b = max(b),
            .by = c)
#> # A tidytable: 2 × 3
#>   c     avg_a max_b
#>   <chr> <dbl> <int>
#> 1 a       1.5     5
#> 2 b       3       6

df %>%
  summarize(avg_a = mean(a),
            .by = c(c, d))
#> # A tidytable: 2 × 3
#>   c     d     avg_a
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3