Does the separate function work on arrow tables in R?
I am wondering whether there is any way to use the separate function on arrow tables. The columnar data layout should perform this type of data manipulation faster than a data.frame.
separate itself is not supported, but sometimes we can use sub and other supported functions to get what we need. For example,
library(dplyr)
library(arrow) # 10.0.0
# from ?tidyr::separate
df <- data.frame(x = c(NA, "x.y", "x.z", "y.z"))
write_parquet(df, "quux.parquet")
ds <- open_dataset("quux.parquet")
ds %>%
tidyr::separate(x, c("A", "B"))
# Error in UseMethod("separate") :
# no applicable method for 'separate' applied to an object of class "c('FileSystemDataset', 'Dataset', 'ArrowObject', 'R6')"
df %>%
tidyr::separate(x, c("A", "B"))
# A B
# 1 <NA> <NA>
# 2 x y
# 3 x z
# 4 y z
Similarly, using sub and family:
df %>%
mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
# x A B
# 1 <NA> <NA> <NA>
# 2 x.y x y
# 3 x.z x z
# 4 y.z y z
ds %>%
mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
# FileSystemDataset (query)
# x: string
# A: string (replace_substring_regex(x, {pattern="\..*", replacement="", max_replacements=1}))
# B: string (replace_substring_regex(x, {pattern=".*\.", replacement="", max_replacements=1}))
# See $.data for the source Arrow object
ds %>%
mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x)) %>%
collect()
# x A B
# 1 <NA> <NA> <NA>
# 2 x.y x y
# 3 x.z x z
# 4 y.z y z
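One caveat with the sub approach: when a value contains no ".", both patterns fail to match, so A and B each end up holding the whole string, whereas tidyr::separate would fill B with NA. A minimal sketch of a guard for that case, assuming grepl and if_else are translated by your arrow version (the input df2 and its "w" value are made up for illustration, and an in-memory arrow_table is used instead of a parquet file to keep the example self-contained):

```r
library(dplyr)
library(arrow)

# Hypothetical input: the last value has no "." separator
df2 <- data.frame(x = c(NA, "x.y", "y.z", "w"))
tbl <- arrow_table(df2)

res <- tbl %>%
  mutate(
    # Everything before the first "."; unchanged when there is no "."
    A = sub("\\..*", "", x),
    # Everything after the last ".", but only when a "." is present;
    # otherwise NA, mirroring how separate fills a missing piece
    B = if_else(grepl("\\.", x), sub(".*\\.", "", x), NA_character_)
  ) %>%
  collect()

res
```

The intent is that "w" gets A = "w" and B = NA rather than a copy of itself in both columns. If your data always contains the separator, the plain sub calls above are enough.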