R: Convert delimited string into variables
I have a data frame in which one column contains a space-separated list of character codes:
"Ab B C"
""
"X C"
"N Ab F S"
:
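I would like to convert this into multiple columns, one per distinct value, each indicating (with 1 or 0) whether that value was found in the list. For the example above, the desired result is: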
df$Ab = 1,0,0,1
df$B = 1,0,0,0
df$C = 1,0,1,0
df$F = 0,0,0,1
df$N = 0,0,0,1
What is the best way to do this?
Assuming you start with:
df <- data.frame(v1 = c("Ab B C", "", "X C", "N Ab F S"))
You can try cSplit_e from my "splitstackshape" package:
library(splitstackshape)
cSplit_e(df, "v1", sep = " ", type = "character", fill = 0)
#         v1 v1_Ab v1_B v1_C v1_F v1_N v1_S v1_X
# 1   Ab B C     1    1    1    0    0    0    0
# 2              0    0    0    0    0    0    0
# 3      X C     0    0    1    0    0    0    1
# 4 N Ab F S     1    0    0    1    1    1    0
You could try:
library(qdapTools)
lst <- strsplit(df1$Col1, ' ')
cbind(df1, mtabulate(lst))
#      Col1 Ab B C F N S X
#1   Ab B C  1 1 1 0 0 0 0
#2           0 0 0 0 0 0 0
#3      X C  0 0 1 0 0 0 1
#4 N Ab F S  1 0 0 1 1 1 0
Or using base R:
lvls <- sort(unique(unlist(lst)))
cbind(df1, t(vapply(lst, function(x) table(factor(x, levels=lvls)),
numeric(length(lvls)))))
# data used in the examples above
df1 <- data.frame(Col1 = c("Ab B C", "", "X C", "N Ab F S"),
                  stringsAsFactors = FALSE)
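The table()-based approach above counts occurrences, so a code repeated within one string (for example "Ab Ab") would produce 2 rather than the 1 the question asks for. A minimal sketch that clamps the counts to 0/1, assuming a hypothetical input vector x to which such a repeat has been added:

```r
# Hypothetical input: the same strings as df1$Col1, plus "Ab Ab" to show a repeat
x <- c("Ab B C", "", "X C", "N Ab F S", "Ab Ab")
lst <- strsplit(x, " ")            # "" becomes character(0)
lvls <- sort(unique(unlist(lst)))  # distinct codes, sorted

# Per-row counts, built the same way as in the vapply()/table() answer
counts <- t(vapply(lst, function(s) table(factor(s, levels = lvls)),
                   numeric(length(lvls))))

# (counts > 0) is TRUE/FALSE; adding 0 turns it into a numeric 0/1 matrix
ind <- (counts > 0) + 0
ind
```

Here counts[5, "Ab"] is 2 while ind[5, "Ab"] is 1, and the empty second string stays an all-zero row.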
Another approach in base R:
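lst = strsplit(df1$Col1, ' ')   # split each string on spaces
cols = unique(unlist(lst))      # distinct codes, in order of first appearance
m = do.call(rbind, lapply(lst, function(u) cols %in% u + 0))  # one 0/1 row per string
colnames(m) = cols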
#> m
#     Ab B C X N F S
#[1,]  1 1 1 0 0 0 0
#[2,]  0 0 0 0 0 0 0
#[3,]  0 0 1 1 0 0 0
#[4,]  1 0 0 0 1 1 1
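A further base-R variant, not taken from the answers above, builds one row index per code with rep() and lengths() (lengths() needs R 3.2.0 or later), cross-tabulates that index against the codes with table(), and binds the 0/1 result back onto the data. A sketch assuming the same df1 layout as above:

```r
# Sample data with the same layout as df1 above
df1 <- data.frame(Col1 = c("Ab B C", "", "X C", "N Ab F S"),
                  stringsAsFactors = FALSE)
lst <- strsplit(df1$Col1, " ")

# One row index per code; fixing the factor levels keeps row 2,
# which has no codes, as an all-zero row
row_id <- factor(rep(seq_along(lst), lengths(lst)), levels = seq_along(lst))

# Cross-tabulate row index against code, then reduce counts to 0/1
tab <- table(row_id, unlist(lst))
ind <- (unclass(tab) > 0) + 0

res <- cbind(df1, as.data.frame(ind))
res
```

Without the explicit levels, table() would silently drop the empty second row. Unlike m above, the columns come out in sorted order, because table() converts the codes to a factor.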
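Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source. For any questions, contact yoyou2525@163.com.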