Transforming columns of unequal-length arrays into columns of single values in R
As a next step after my previous question, suppose the arrays in several columns have different lengths. For example:
Col_A | Col_B | Col_C |
---|---|---|
[0.1,0.5,0.7] | [1.54E12, 1.54E12, 1.54E12] | [1, 3, 4, 5} |
How can I take this format and reshape it into the following, padding Col_A and Col_B with NA where they run out of values:
Col_A | Col_B | Col_C |
---|---|---|
0.1 | 1.54E12 | 1 |
0.5 | 1.54E12 | 3 |
0.7 | 1.54E12 | 4 |
NA | NA | 5 |
This code works when all the arrays have the same length, but throws an error when they don't:
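```r
library(dplyr)
library(stringr)
library(tidyr)

# Works when every array has the same number of elements,
# but fails as soon as the arrays differ in length
df %>%
  # keep only the text after "[" up to the closing "]"
  mutate(across(everything(), ~ str_extract_all(.x, "(?<=\\[)[^]]+"))) %>%
  unnest(c(Col_A, Col_B, Col_C)) %>%
  # split on commas with or without a following space
  separate_rows(c(Col_A, Col_B, Col_C), sep = ",\\s*", convert = TRUE)
```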
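A likely cause of the error is that separate_rows() splits the listed columns in parallel, so each column in a row must yield the same number of pieces. One way around it, sketched here in base R rather than the tidyverse, is to parse each column on its own and pad the shorter vectors with NA up to the longest one. This is a minimal sketch, assuming the arrays contain only comma-separated numbers; parse_array() is a hypothetical helper, not part of any package.

```r
# Example data from the question (note the stray "}" closing Col_C)
df <- data.frame(
  Col_A = "[0.1,0.5,0.7]",
  Col_B = "[1.54E12, 1.54E12, 1.54E12]",
  Col_C = "[1, 3, 4, 5}",
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one "[a, b, c]" string into a numeric vector
parse_array <- function(x) {
  x <- gsub("[][{}]", "", x)             # drop any brackets or braces
  as.numeric(strsplit(x, ",\\s*")[[1]])  # split on commas, optional spaces
}

vals <- lapply(df, parse_array)          # one numeric vector per column
n <- max(lengths(vals))                  # length of the longest array
padded <- lapply(vals, `length<-`, n)    # lengthening a vector fills with NA
out <- as.data.frame(padded)
out
```

Setting a vector's length past its end is what introduces the NA values, so no explicit padding loop is needed.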
I don't have enough tidyverse experience for this, so here is my solution with data.table. I've included the result after each step to show what is happening...
```r
library(data.table)

# create sample data
DT <- data.table(
  Col_A = "[0.1,0.5,0.7]",
  Col_B = "[1.54E12, 1.54E12, 1.54E12]",
  Col_C = "[1, 3, 4, 5]"
)
#            Col_A                       Col_B        Col_C
# 1: [0.1,0.5,0.7] [1.54E12, 1.54E12, 1.54E12] [1, 3, 4, 5]

# melt to long format
ans <- melt(DT, measure.vars = names(DT), variable.factor = FALSE)
#    variable                       value
# 1:    Col_A               [0.1,0.5,0.7]
# 2:    Col_B [1.54E12, 1.54E12, 1.54E12]
# 3:    Col_C                [1, 3, 4, 5]

# remove the [ ] and split the value column on ","
ans[, value := gsub("\\[|\\]", "", value)]
ans[, paste0("v", seq_along(tstrsplit(ans$value, ","))) :=
      lapply(tstrsplit(value, ","), as.numeric)][]
#    variable                     value       v1       v2       v3 v4
# 1:    Col_A               0.1,0.5,0.7 1.00e-01 5.00e-01 7.00e-01 NA
# 2:    Col_B 1.54E12, 1.54E12, 1.54E12 1.54e+12 1.54e+12 1.54e+12 NA
# 3:    Col_C                1, 3, 4, 5 1.00e+00 3.00e+00 4.00e+00  5

# transpose (dropping the value column) to get back to wide format
transpose(ans[, -"value"], make.names = "variable")
#    Col_A    Col_B Col_C
# 1:   0.1 1.54e+12     1
# 2:   0.5 1.54e+12     3
# 3:   0.7 1.54e+12     4
# 4:    NA       NA     5
```
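The NA values in the last row of this answer come from tstrsplit(): when the input strings split into different numbers of pieces, it pads the shorter ones with its fill argument, which defaults to NA. A minimal illustration using two of the example strings, splitting on "," as the answer does:

```r
library(data.table)

# Two comma-separated strings with 3 and 4 elements
x <- c("0.1,0.5,0.7", "1, 3, 4, 5")

# tstrsplit() splits each string and transposes the pieces;
# the shorter string is padded with NA (the default fill)
parts <- tstrsplit(x, ",")

# splitting on "," leaves leading spaces (" 3"), which as.numeric() ignores
nums <- lapply(parts, as.numeric)
nums
```

Because the padding happens before the transpose, every resulting column already has the same length, which is what lets the final wide table contain NA instead of raising an error.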
We can use cSplit from splitstackshape:
```r
library(splitstackshape)
library(data.table)

# data (note the stray "}" closing Col_C)
df <- structure(list(Col_A = "[0.1,0.5,0.7]", Col_B = "[1.54E12, 1.54E12, 1.54E12]",
    Col_C = "[1, 3, 4, 5}"), class = "data.frame", row.names = c(NA, -1L))

# strip "[", "]" and "}" from every column, then split each column on ","
# into long format; shorter columns are padded with NA
cSplit(setDT(df)[, lapply(.SD, gsub, pattern = "[][}]", replacement = "")],
       names(df), sep = ",", fixed = FALSE, direction = "long")
#   Col_A    Col_B Col_C
#1:   0.1 1.54e+12     1
#2:   0.5 1.54e+12     3
#3:   0.7 1.54e+12     4
#4:    NA       NA     5
```
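The pattern [][}] in the cSplit answer is a single bracket expression: a ] placed first inside the brackets is taken literally, so the class matches ], [ and }, the last of which covers the stray brace in Col_C. A quick base R check of that cleaning step, using the same strings as the data:

```r
# Same strings as in the df above
x <- c("[0.1,0.5,0.7]", "[1.54E12, 1.54E12, 1.54E12]", "[1, 3, 4, 5}")

# "[][}]" is one character class: ']' first is literal, then '[' and '}'
cleaned <- gsub("[][}]", "", x)
cleaned

# after cleaning, splitting on "," gives 3, 3 and 4 pieces per column,
# which is why the shorter columns end up padded with NA
lengths(strsplit(cleaned, ","))
```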