R: Add new column by specific patterns in another column of the dataframe
My dataframe A looks like this:
**Group** **Pattern**
One Black & White
Two Black OR Pink
Three Red
Four Pink
Five White & Green
Six Green & Orange
Seven Orange
Eight Pink & Red
Nine Black OR White
Ten Green
. .
. .
. .
Then I have dataframe B, which looks like this:
**Color** **Value**
Orange 12
Pink 2
Red 4
Green 22
Black 84
White 100
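I want to add a new column named Value to dataframe A based on its Pattern column. If a pattern contains an ampersand (&), the values should be summed (for example, Black & White should become 184). If it contains OR, I want the higher of the two values (for the same pair of colors, Black OR White, that would be 100).
I can join them with dplyr's inner_join, but that drops the rows containing & or OR. Is there another way?
Cheers!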
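On the join point: a join can still work if each pattern is first split into one row per color, so that every row has an exact match in B. The following is only a minimal base R sketch of that idea, not a tested general solution. The three-row A, the intermediate names `parts`, `long`, `is_or`, `sums` and `maxs`, and the assumption that a pattern never mixes & and OR are all mine.

```r
# Small stand-ins for the question's data frames (same column names)
A <- data.frame(Group = c("One", "Two", "Three"),
                Pattern = c("Black & White", "Black OR Pink", "Red"),
                stringsAsFactors = FALSE)
B <- data.frame(Color = c("Pink", "Red", "Black", "White"),
                Value = c(2, 4, 84, 100),
                stringsAsFactors = FALSE)

# Split every pattern on " & " or " OR " and stack to one row per color
parts <- strsplit(A$Pattern, " & | OR ")
long <- data.frame(Group = rep(A$Group, lengths(parts)),
                   Color = unlist(parts),
                   stringsAsFactors = FALSE)

# Each row now holds a single color, so an inner join finds a match
long <- merge(long, B, by = "Color")

# Aggregate per group both ways, then pick max for OR rows and sum otherwise
is_or <- grepl(" OR ", A$Pattern, fixed = TRUE)
sums <- tapply(long$Value, long$Group, sum)
maxs <- tapply(long$Value, long$Group, max)
A$Value <- ifelse(is_or, maxs[A$Group], sums[A$Group])
```

For a single-color pattern the sum of one value is the value itself, so no separate branch is needed.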
dfA <- data.frame(group = seq(1, 4), pattern = c("Black & White", "Black OR Pink", "Red", "Pink"), stringsAsFactors = FALSE)
dfB <- data.frame(color = c("Pink", "Red", "Black", "White"), value = c(2, 4, 84, 100), stringsAsFactors = FALSE)
# Value for row i of dfA: sum for "&", max for "OR", plain lookup otherwise
getVal2return <- function(i, dfA, dfB) {
  andv <- unlist(strsplit(dfA$pattern[i], split = " & "))
  orv <- unlist(strsplit(dfA$pattern[i], split = " OR "))
  if (length(andv) > 1) {
    val <- sum(dfB$value[match(andv, dfB$color)])
  } else if (length(orv) > 1) {
    val <- max(dfB$value[match(orv, dfB$color)])
  } else {
    val <- dfB$value[match(dfA$pattern[i], dfB$color)]
  }
  return(val)
}
dfA$newVal <- sapply(seq_len(nrow(dfA)), function(x) getVal2return(x, dfA, dfB))
> dfA
group pattern newVal
1 1 Black & White 184
2 2 Black OR Pink 84
3 3 Red 4
4 4 Pink 2
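The same rule can also be written against a named lookup vector instead of repeated `match()` calls. This is a hedged variant rather than a correction of the answer above: `lookup` and `pattern_value()` are hypothetical names introduced here, and the sketch assumes every color in a pattern appears in dfB.

```r
dfA <- data.frame(group = 1:4,
                  pattern = c("Black & White", "Black OR Pink", "Red", "Pink"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(color = c("Pink", "Red", "Black", "White"),
                  value = c(2, 4, 84, 100),
                  stringsAsFactors = FALSE)

# Named vector: names are colors, elements are their values
lookup <- setNames(dfB$value, dfB$color)

# Hypothetical helper: value of one pattern string
pattern_value <- function(p, lookup) {
  if (grepl(" & ", p, fixed = TRUE)) {
    # "&" means add up all operands
    sum(lookup[strsplit(p, " & ", fixed = TRUE)[[1]]])
  } else if (grepl(" OR ", p, fixed = TRUE)) {
    # "OR" means take the largest operand
    max(lookup[strsplit(p, " OR ", fixed = TRUE)[[1]]])
  } else {
    # Single color: direct lookup
    unname(lookup[p])
  }
}

dfA$newVal <- vapply(dfA$pattern, pattern_value, numeric(1),
                     lookup = lookup, USE.NAMES = FALSE)
```

Because `sum()` and `max()` accept any number of elements, this form also covers patterns with more than two colors joined by the same operator.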
Here is a fairly robust way to do it:
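A$Value <- A$Pattern
# Replace each color name with its number, e.g. "Black & White" becomes "84 & 100"
for(i in seq(nrow(B))) A$Value <- gsub(B$Color[i], B$Value[i], A$Value)
# Turn "&" into "+" (gsub so that patterns with several "&" are also handled)
A$Value <- gsub("&", "+", A$Value, fixed = TRUE)
# Turn "x OR y" into "max(x, y)"; this covers exactly two operands
A$Value <- sub("^(\\d+) OR (\\d+)$", "max(\\1, \\2)", A$Value)
# Evaluate each resulting arithmetic expression
A$Value <- vapply(A$Value, function(x) eval(parse(text = x)), numeric(1))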
A
#> Group Pattern Value
#> 1 One Black & White 184
#> 2 Two Black OR Pink 84
#> 3 Three Red 4
#> 4 Four Pink 2
#> 5 Five White & Green 122
#> 6 Six Green & Orange 34
#> 7 Seven Orange 12
#> 8 Eight Pink & Red 6
#> 9 Nine Black OR White 100
#> 10 Ten Green 22
Created on 2022-02-18 by the reprex package (v2.0.1)
Data
A <- structure(list(Group = c("One", "Two", "Three", "Four", "Five",
"Six", "Seven", "Eight", "Nine", "Ten"), Pattern = c("Black & White",
"Black OR Pink", "Red", "Pink", "White & Green", "Green & Orange",
"Orange", "Pink & Red", "Black OR White", "Green")), class = "data.frame",
row.names = c(NA, -10L))
B <- structure(list(Color = c("Orange", "Pink", "Red", "Green", "Black",
"White"), Value = c(12L, 2L, 4L, 22L, 84L, 100L)), class = "data.frame",
row.names = c(NA, -6L))
I would try something like this in plain R, which I prefer, assuming df2 is the second dataframe:
df['Value'] = apply(df['Pattern'], 1, function(Pattern){
  s = strsplit(Pattern, ' & ')[[1]]
  if (length(s) == 2) {
    return(with(df2, Value[Color == s[1]] + Value[Color == s[2]]))
  }
  s = strsplit(Pattern, ' OR ')[[1]]
  if (length(s) == 2) {
    return(with(df2, max(Value[Color == s[1]], Value[Color == s[2]])))
  }
  return(df2[df2$Color == Pattern,]$Value)
})
df
#> Group Pattern Value
#> 1 One Black & White 184
#> 2 Two Black OR Pink 84
#> 3 Three Red 4
#> 4 Four Pink 2
#> 5 Five White & Green 122
#> 6 Six Green & Orange 34
#> 7 Seven Orange 12
#> 8 Eight Pink & Red 6
#> 9 Nine Black OR White 100
#> 10 Ten Green 22
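All of the approaches above assume that every color named in a pattern has a row in the second dataframe. A quick check before computing can list any that do not. This is a small sketch under that framing: the "Purple" row is made-up data to show a miss, and `tokens` and `missing_colors` are hypothetical names.

```r
# Toy data: "Purple" is deliberately absent from B to illustrate a miss
A <- data.frame(Pattern = c("Black & White", "Black OR Pink", "Purple"),
                stringsAsFactors = FALSE)
B <- data.frame(Color = c("Pink", "Black", "White"),
                Value = c(2, 84, 100),
                stringsAsFactors = FALSE)

# Every distinct color mentioned anywhere in the patterns
tokens <- unique(unlist(strsplit(A$Pattern, " & | OR ")))

# Colors that have no value in B
missing_colors <- setdiff(tokens, B$Color)
missing_colors
```

If `missing_colors` is non-empty, the lookup-based answers would yield NA for those rows, and the `eval(parse())` answer would fail on the leftover color name.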