Does the separate function work on arrow tables in R?

Question

I am wondering whether there is any way to use the separate function on arrow tables. Arrow's columnar data layout should make this kind of data manipulation faster than on a data.frame.

Answer 1

separate itself is not supported on arrow objects, but sometimes we can combine sub with other supported functions to get what we need. For example, tidyr::separate fails on a dataset but works on a data.frame:

library(dplyr)
library(arrow) # 10.0.0
# from ?tidyr::separate
df <- data.frame(x = c(NA, "x.y", "x.z", "y.z"))
write_parquet(df, "quux.parquet")
ds <- open_dataset("quux.parquet")
ds %>%
  tidyr::separate(x, c("A", "B"))
# Error in UseMethod("separate") : 
#   no applicable method for 'separate' applied to an object of class "c('FileSystemDataset', 'Dataset', 'ArrowObject', 'R6')"
df %>%
  tidyr::separate(x, c("A", "B"))
#      A    B
# 1 <NA> <NA>
# 2    x    y
# 3    x    z
# 4    y    z

Similarly, using sub and family, which works on both the data.frame and the dataset:

df %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3  x.z    x    z
# 4  y.z    y    z
ds %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
# FileSystemDataset (query)
# x: string
# A: string (replace_substring_regex(x, {pattern="\..*", replacement="", max_replacements=1}))
# B: string (replace_substring_regex(x, {pattern=".*\.", replacement="", max_replacements=1}))
# See $.data for the source Arrow object
ds %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x)) %>%
  collect()
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3  x.z    x    z
# 4  y.z    y    z
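
One caveat with the sub approach: unlike separate, a value that contains no "." is returned unchanged by both sub calls, so it ends up in A and B alike. Below is a minimal sketch of one way to guard against that, shown on a plain data.frame. The names df2, unguarded and guarded are made up for illustration, and I am assuming (not verified here against arrow 10.0.0) that grepl and if_else translate the same way when the pipeline runs on the dataset.

```r
library(dplyr)

# Hypothetical input: "w" has no "." delimiter
df2 <- data.frame(x = c(NA, "x.y", "w"))

# Unguarded: a value without "." passes through both sub() calls unchanged
unguarded <- df2 %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
unguarded
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3    w    w    w

# Guarded: only fill B when a literal "." is present, otherwise NA
guarded <- df2 %>%
  mutate(
    A = sub("\\..*", "", x),
    B = if_else(grepl(".", x, fixed = TRUE), sub(".*\\.", "", x), NA_character_)
  )
guarded
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3    w    w <NA>
```

If the guard does not translate on your arrow version, the unguarded mutate followed by collect() and a fix-up in plain dplyr should still work.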
