
Does the separate function work on arrow tables in R?

I am wondering whether there is any way to use the separate function on arrow tables. Arrow's columnar data organization should perform this kind of data manipulation faster than a data.frame.

separate itself is not supported, but we can sometimes get what we need by combining sub() with other functions that arrow does support. For example,

library(dplyr)
library(arrow) # arrow version 10.0.0
# from ?tidyr::separate
df <- data.frame(x = c(NA, "x.y", "x.z", "y.z"))
write_parquet(df, "quux.parquet")
ds <- open_dataset("quux.parquet")
ds %>%
  tidyr::separate(x, c("A", "B"))
# Error in UseMethod("separate") : 
#   no applicable method for 'separate' applied to an object of class "c('FileSystemDataset', 'Dataset', 'ArrowObject', 'R6')"
df %>%
  tidyr::separate(x, c("A", "B"))
#      A    B
# 1 <NA> <NA>
# 2    x    y
# 3    x    z
# 4    y    z
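If the (filtered) data fit in memory, another option is to do as much as possible lazily in Arrow, collect() the result into R, and then call separate() on the resulting tibble. This is a minimal sketch: the temporary file and the filter() step are illustrative additions, not part of the original answer. The split itself runs in R rather than in Arrow, so it does not get any Arrow speed-up.

```r
library(dplyr)
library(arrow)

# Same toy data as above, written to a temporary Parquet file
df <- data.frame(x = c(NA, "x.y", "x.z", "y.z"))
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)
ds <- open_dataset(path)

res <- ds %>%
  filter(!is.na(x)) %>%             # evaluated by Arrow before any data reach R
  collect() %>%                     # materialise the result as an in-memory tibble
  tidyr::separate(x, c("A", "B"))   # the split now runs in R, not in Arrow
res
```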

Similarly, using sub() and related functions, which arrow does support:

df %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3  x.z    x    z
# 4  y.z    y    z
ds %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x))
# FileSystemDataset (query)
# x: string
# A: string (replace_substring_regex(x, {pattern="\..*", replacement="", max_replacements=1}))
# B: string (replace_substring_regex(x, {pattern=".*\.", replacement="", max_replacements=1}))
# See $.data for the source Arrow object
ds %>%
  mutate(A = sub("\\..*", "", x), B = sub(".*\\.", "", x)) %>%
  collect()
#      x    A    B
# 1 <NA> <NA> <NA>
# 2  x.y    x    y
# 3  x.z    x    z
# 4  y.z    y    z
