Extract a character column into multiple columns using regex

Superseded

extract() has been superseded by separate_wider_regex().
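
As a rough guide to migrating, the sketch below shows how a three-part split might look with separate_wider_regex(). It assumes tidytable's version follows the tidyr convention, where named elements of patterns become output columns and unnamed elements are matched and then discarded. The input here contains no NA values and every row matches, which keeps the sketch away from edge cases this page does not describe.

```r
library(tidytable)

# Illustrative data: every row has the form letter-letter-digit
df <- tidytable(x = c("a-b-1", "a-d-3", "b-c-2", "d-e-7"))

# Named patterns (A, B, C) become columns; the unnamed "-" pieces are
# matched as separators and dropped from the result
wide <- df %>%
  separate_wider_regex(
    x,
    patterns = c(A = "[a-z]+", "-", B = "[a-z]+", "-", C = "\\d+")
  )

wide
```

Unlike extract(), the column names and the pieces of the regular expression are given together in one vector, so there is no separate into argument to keep in sync with the capturing groups.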

Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA. Passing the same name more than once in the into argument merges the corresponding groups into a single column, while passing NA in into drops that group from the resulting tidytable.

extract(
  .df,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

.df: A data.table or data.frame.
col: Column to extract from.
into: New column names to split into. A character vector.
regex: A regular expression to extract the desired values. There should be one group (defined by ()) for each element of into.
remove: If TRUE, remove the input column from the output data.table.
convert: If TRUE, runs type.convert() on the resulting columns. Useful if a resulting column should be of type integer or double.
...: Additional arguments passed on to methods.
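
The remove and convert arguments can be combined. The following is a minimal sketch, assuming the behavior described in the argument list above: remove = FALSE keeps the input column, and convert = TRUE passes each new column through type.convert(). The column name n and the regular expression are chosen purely for illustration.

```r
library(tidytable)

df <- tidytable(x = c(NA, "a-b-1", "a-d-3", "b-c-2", "d-e-7"))

# Two capturing groups: the first letter run and the trailing digits.
# The middle letter run is matched but not captured.
out <- df %>%
  extract(
    x,
    into = c("A", "n"),
    regex = "([a-z]+)-[a-z]+-(\\d+)",
    remove = FALSE,  # keep the original x column
    convert = TRUE   # let type.convert() turn the digit strings into integers
  )

out
```

With convert = FALSE (the default) the n column would stay character, so any arithmetic on it would need an explicit conversion first.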

Examples

df <- data.table(x = c(NA, "a-b-1", "a-d-3", "b-c-2", "d-e-7"))
df %>% extract(x, "A")
#> # A tidytable: 5 × 1
#>   A    
#>   <chr>
#> 1 NA   
#> 2 a    
#> 3 a    
#> 4 b    
#> 5 d    
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
#> # A tidytable: 5 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 a     b    
#> 3 a     d    
#> 4 b     c    
#> 5 d     e    

# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
#> # A tidytable: 5 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 a     b    
#> 3 a     d    
#> 4 b     c    
#> 5 NA    NA   

# Drop a group by passing NA:
df %>% extract(x, c("A", NA, "B"), "([a-d]+)-([a-d]+)-(\\d+)")
#> # A tidytable: 5 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 a     1    
#> 3 a     3    
#> 4 b     2    
#> 5 NA    NA   

# Merge groups by passing the same name more than once:
df %>% extract(x, c("A", "B", "A"), "([a-d]+)-([a-d]+)-(\\d+)")
#> # A tidytable: 5 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NANA  NA   
#> 2 a1    b    
#> 3 a3    d    
#> 4 b2    c    
#> 5 NANA  NA