
R: Add new column by specific patterns in another column of the dataframe

My dataframe A looks like this:

**Group**    **Pattern**
One          Black & White
Two          Black OR Pink
Three        Red
Four         Pink
Five         White & Green
Six          Green & Orange
Seven        Orange
Eight        Pink & Red
Nine         Black OR White
Ten          Green
.            .
.            .
.            .

Then I have dataframe B, which looks like this:

**Color**    **Value**
Orange        12
Pink           2
Red            4
Green         22
Black         84
White        100

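I want to add a new column named Value to dataframe A, based on its Pattern column. If the pattern contains (&), the values should be added together (for example, Black & White should become 184). If it contains (OR), I want the higher of the two numbers (for the same two colors, Black OR White would be 100).

I can join the two dataframes with dplyr inner_join, but that drops the rows containing &/OR. Is there another way to do this?

Cheers!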
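
To make the problem concrete, here is a minimal sketch of why a plain join loses rows. It uses base R `merge()` as a stand-in for `dplyr::inner_join` (an assumption made so the example needs no packages) and a four-row subset of the data. Only patterns that exactly equal a single color find a match:

```r
# Small subset of the question's data (column names as in the question).
A <- data.frame(
  Group = c("One", "Two", "Three", "Four"),
  Pattern = c("Black & White", "Black OR Pink", "Red", "Pink"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Color = c("Pink", "Red", "Black", "White"),
  Value = c(2, 4, 84, 100),
  stringsAsFactors = FALSE
)

# merge() with default arguments is an inner join on Pattern == Color.
joined <- merge(A, B, by.x = "Pattern", by.y = "Color")

# Only the single-color rows ("Red", "Pink") survive; the "&" and "OR"
# rows have no exact match in B$Color and are dropped.
joined
```

So the patterns have to be split into individual colors before any lookup can succeed.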

dfA <- data.frame(group = seq(1, 4), pattern = c("Black & White", "Black OR Pink", "Red", "Pink"), stringsAsFactors = FALSE)
dfB <- data.frame(color = c("Pink", "Red", "Black", "White"), value = c(2, 4, 84, 100), stringsAsFactors = FALSE)

# Value for row i of dfA: sum for "&", max for "OR", plain lookup otherwise
getVal2return <- function(i, dfA, dfB) {
  andv <- unlist(strsplit(dfA$pattern[i], split = " & "))
  orv <- unlist(strsplit(dfA$pattern[i], split = " OR "))
  if (length(andv) > 1) {
    val <- sum(dfB$value[match(andv, dfB$color)])
  } else if (length(orv) > 1) {
    val <- max(dfB$value[match(orv, dfB$color)])
  } else {
    val <- dfB$value[match(dfA$pattern[i], dfB$color)]
  }
  return(val)
}

dfA$newVal <- sapply(seq_len(nrow(dfA)), function(x) getVal2return(x, dfA, dfB))

> dfA
      group       pattern newVal
    1     1 Black & White    184
    2     2 Black OR Pink     84
    3     3           Red      4
    4     4          Pink      2

Here is an approach that works quite well:

A$Value <- A$Pattern
for(i in seq(nrow(B))) A$Value <- gsub(B$Color[i], B$Value[i], A$Value)
A$Value <- sub("&", "+", A$Value)
A$Value <- sub("^(\\d+) OR (\\d+)$", "max(\\1, \\2)", A$Value)
A$Value <- vapply(A$Value, function(x) eval(parse(text = x)), numeric(1))
A
#>    Group        Pattern Value
#> 1    One  Black & White   184
#> 2    Two  Black OR Pink    84
#> 3  Three            Red     4
#> 4   Four           Pink     2
#> 5   Five  White & Green   122
#> 6    Six Green & Orange    34
#> 7  Seven         Orange    12
#> 8  Eight     Pink & Red     6
#> 9   Nine Black OR White   100
#> 10   Ten          Green    22

Created on 2022-02-18 by the reprex package (v2.0.1)


Data

A <- structure(list(Group = c("One", "Two", "Three", "Four", "Five", 
"Six", "Seven", "Eight", "Nine", "Ten"), Pattern = c("Black & White", 
"Black OR Pink", "Red", "Pink", "White & Green", "Green & Orange", 
"Orange", "Pink & Red", "Black OR White", "Green")), class = "data.frame", 
row.names = c(NA, -10L))

B <- structure(list(Color = c("Orange", "Pink", "Red", "Green", "Black", 
"White"), Value = c(12L, 2L, 4L, 22L, 84L, 100L)), class = "data.frame", 
row.names = c(NA, -6L))

I would try something like this (I prefer base R), assuming df2 is the second dataframe:

df['Value'] = apply(df['Pattern'], 1, function(Pattern){
  s = strsplit(Pattern, ' & ')[[1]]
  if (length(s) == 2) {
     return(with(df2, Value[Color == s[1]] + Value[Color == s[2]]))
  }
  s = strsplit(Pattern, ' OR ')[[1]]
  if (length(s) == 2) {
     return(with(df2, max(Value[Color == s[1]], Value[Color == s[2]])))
  }
  return(df2[df2$Color == Pattern,]$Value)
})

df
#>    Group        Pattern Value
#> 1    One  Black & White   184
#> 2    Two  Black OR Pink    84
#> 3  Three            Red     4
#> 4   Four           Pink     2
#> 5   Five  White & Green   122
#> 6    Six Green & Orange    34
#> 7  Seven         Orange    12
#> 8  Eight     Pink & Red     6
#> 9   Nine Black OR White   100
#> 10   Ten          Green    22
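
Some of the approaches above assume at most two colors per pattern. As a hedged alternative, the sketch below handles any number of colors through a named lookup vector. It assumes a pattern never mixes & and OR, and that every color appears in the second dataframe (an unknown color would yield NA). The helper name `pattern_value` is hypothetical and not part of any answer:

```r
# Small subset of the question's data (column names as in the question).
A <- data.frame(
  Group = c("One", "Two", "Three"),
  Pattern = c("Black & White", "Black OR Pink", "Red"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Color = c("Pink", "Red", "Black", "White"),
  Value = c(2, 4, 84, 100),
  stringsAsFactors = FALSE
)

# Named vector so a value can be looked up by color name.
lookup <- setNames(B$Value, B$Color)

# Hypothetical helper: split one pattern into its colors, look each one up,
# then take the max for "OR" patterns and the sum otherwise
# (a single color is just a sum over one value).
pattern_value <- function(pattern, lookup) {
  colors <- trimws(strsplit(pattern, " & | OR ")[[1]])
  values <- unname(lookup[colors])
  if (grepl(" OR ", pattern, fixed = TRUE)) max(values) else sum(values)
}

A$Value <- vapply(A$Pattern, pattern_value, numeric(1),
                  lookup = lookup, USE.NAMES = FALSE)
A
```

Because the operands are looked up by exact name after splitting, this variant does not depend on regular-expression substitution or on evaluating generated code.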
