
Transforming columns of unequal arrays to column of single values in R

As a next step following my previous question, suppose there are multiple columns of arrays with different lengths. For example:

Col_A Col_B Col_C
[0.1,0.5,0.7] [1.54E12, 1.54E12, 1.54E12] [1, 3, 4, 5}

How can I take this format and reformat it to the following, filling in NA for Col_A and Col_B where appropriate:

Col_A Col_B Col_C
0.1 1.54E12 1
0.5 1.54E12 3
0.7 1.54E12 4
NA NA 5

This code works when all arrays have the same length, but throws an error when they do not:

library(dplyr)
library(stringr)
library(tidyr)
df  %>% 
   mutate(across(everything(), str_extract_all, "(?<=\\[)[^]]+")) %>% 
   unnest(c(NDVIs, dates)) %>% 
   separate_rows(c(NDVIs, dates), sep=",\\s+", convert = TRUE)

I don't have enough tidyverse experience, so here is my solution using data.table. I have included all the intermediate steps and results to show what is happening...

library( data.table )
#create sample data
DT <- fread("Col_A  Col_B   Col_C
[0.1,0.5,0.7]   [1.54E12, 1.54E12, 1.54E12]     [1, 3, 4, 5]")
#            Col_A                       Col_B        Col_C
# 1: [0.1,0.5,0.7] [1.54E12, 1.54E12, 1.54E12] [1, 3, 4, 5]

#melt to long format
ans <- melt( DT, measure.vars = names(DT), variable.factor = FALSE )
#    variable                       value
# 1:    Col_A               [0.1,0.5,0.7]
# 2:    Col_B [1.54E12, 1.54E12, 1.54E12]
# 3:    Col_C                [1, 3, 4, 5]

#remove [] and split the value column using ',' as separator
ans[, value := gsub( "\\[|\\]", "", value ) ]
ans[, paste0( "v", 1:length( tstrsplit(ans$value, "," ) ) ) := 
      lapply( tstrsplit(value, "," ), as.numeric ) ][]
#    variable                     value       v1       v2       v3 v4
# 1:    Col_A               0.1,0.5,0.7 1.00e-01 5.00e-01 7.00e-01 NA
# 2:    Col_B 1.54E12, 1.54E12, 1.54E12 1.54e+12 1.54e+12 1.54e+12 NA
# 3:    Col_C                1, 3, 4, 5 1.00e+00 3.00e+00 4.00e+00  5

#transpose (without value-columns) to get wide format again
transpose( ans[, -"value"], make.names = "variable" )
#    Col_A    Col_B Col_C
# 1:   0.1 1.54e+12     1
# 2:   0.5 1.54e+12     3
# 3:   0.7 1.54e+12     4
# 4:    NA       NA     5
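
The NA values in v4 above come from how tstrsplit() handles strings that split into different numbers of pieces. A minimal sketch of that behaviour, using a made-up vector `x` that is not part of the original data:

```r
library(data.table)

# Two strings with a different number of comma-separated items
# (hypothetical input, chosen to mirror Col_A and Col_C)
x <- c("0.1,0.5,0.7", "1,3,4,5")

# tstrsplit() splits each string and transposes the result:
# one list element per position, with missing positions set to
# the `fill` argument (NA by default)
pieces <- tstrsplit(x, ",", fixed = TRUE)

length(pieces)  # number of positions, set by the longest string
pieces[[4]]     # fourth position: missing for the first string
```

Because the padding happens during the split, the shorter columns do not need any separate handling before the transpose step.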

We can use cSplit from splitstackshape

library(splitstackshape)
library(data.table)
cSplit(setDT(df)[, lapply(.SD, gsub, pattern = "[][}]", 
    replacement = "")], names(df), sep=",", fixed = FALSE, "long")
#   Col_A    Col_B Col_C
#1:   0.1 1.54e+12     1
#2:   0.5 1.54e+12     3
#3:   0.7 1.54e+12     4
#4:    NA       NA     5

Data

df <- structure(list(Col_A = "[0.1,0.5,0.7]", Col_B = "[1.54E12, 1.54E12, 1.54E12]", 
    Col_C = "[1, 3, 4, 5}"), class = "data.frame", row.names = c(NA, 
-1L))
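
For comparison, the same idea (split each cell, then pad the shorter vectors with NA) can be sketched in base R without extra packages. This is only a sketch and not one of the approaches above: it assumes a single-row data frame like the one in the question, and the names `parts`, `padded`, and `result` are introduced here purely for illustration.

```r
# Sample data from the question (note the stray "}" in Col_C)
df <- data.frame(
  Col_A = "[0.1,0.5,0.7]",
  Col_B = "[1.54E12, 1.54E12, 1.54E12]",
  Col_C = "[1, 3, 4, 5}",
  stringsAsFactors = FALSE
)

# Strip brackets and braces, split on commas, convert to numeric.
# Assumption: df has exactly one row, so only the first split is used.
parts <- lapply(df, function(x) {
  cleaned <- gsub("[][{}]", "", x)
  as.numeric(trimws(strsplit(cleaned, ",")[[1]]))
})

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(parts))
padded <- lapply(parts, function(v) {
  length(v) <- n  # extending a vector's length fills with NA
  v
})

result <- as.data.frame(padded)
result
```

A data frame with several rows would need the split and padding applied per row, which this sketch does not attempt.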
