R: Create a new column in a dataframe, using column name, condition and value from another dataframe

Consider a base data frame:

data <-  data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
                   risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"))

and grouping information in another data frame:

group_info <- data.frame(variable = c("amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group",
                                 "risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group"),
                    bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
                            "0-700", "700-750", "750-800", "800-850", "850-900"),
                    group = c("1", "1", "2", "2", "3",
                              "a", "a", "a", "b", "b"))

I want to add two columns to the base data frame (data), called "amount_bin_group" and "risk_score_group", which take their values from group_info$group wherever group_info$bin matches the corresponding column in data. For simplicity, let's assume that the base column is always the group_info$variable name minus the "_group" suffix. That is, when we create the column amount_bin_group, the base column in the base data frame is always amount_bin.
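The naming rule above can be sketched in a couple of lines. This is only an illustration: the names `targets` and `base_cols` are hypothetical and are not part of the question.

```r
# Target column names, as they appear in group_info$variable
targets <- c("amount_bin_group", "risk_score_group")

# Strip the trailing "_group" to get the matching column in `data`
base_cols <- sub("_group$", "", targets)
base_cols
```

Anchoring the pattern with `$` removes only a trailing "_group", so a column name that happens to contain "_group" elsewhere would be left alone.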

The expected result data frame is:

final_data <-  data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
                   risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"),
                   amount_bin_group = c("1", "1", "2", "2", "3", "1", "1", "2", "2", "3", "1", "1", "2", "2", "3"),
                   risk_score_group = c("a", "a", "a", "b", "b", "a", "a", "a", "b", "b", "a", "a", "a", "b", "b"))

One solution I thought of is to merge the data frames iteratively, i.e.:

final_data <- merge(data, group_info[, c("bin", "group")], by.x = "amount_bin", by.y = "bin")

final_data$amount_bin_group <- final_data$group
final_data$group <- NULL

But I am sure there is a more efficient solution. Note that there are many such columns, not just two, so perhaps a loop would help.

Your group_info is just way over-tidy. I can't believe I'm actually saying that. By breaking it into two data frames, or by giving each half its own column, you can do a simple left join to get the answer.

library(dplyr)

final_data_calc <- data %>%
  # Look up the group for each amount_bin
  left_join(
    group_info %>%
      filter(variable == "amount_bin_group") %>%
      rename(amount_bin_group = group, amount_bin = bin) %>%
      select(-variable),
    by = "amount_bin"
  ) %>%
  # Look up the group for each risk_score
  left_join(
    group_info %>%
      filter(variable == "risk_score_group") %>%
      rename(risk_score_group = group, risk_score = bin) %>%
      select(-variable),
    by = "risk_score"
  )

#   amount_bin risk_score amount_bin_group risk_score_group
#1     10K-25K      0-700                1                a
#2     25K-35K    700-750                1                a
#3     35K-45K    750-800                2                a
#4     45K-50K    800-850                2                b
#5        50K+    850-900                3                b
#6     10K-25K      0-700                1                a
#7     25K-35K    700-750                1                a
#8     35K-45K    750-800                2                a
#9     45K-50K    800-850                2                b
#10       50K+    850-900                3                b
#11    10K-25K      0-700                1                a
#12    25K-35K    700-750                1                a
#13    35K-45K    750-800                2                a
#14    45K-50K    800-850                2                b
#15       50K+    850-900                3                b
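Since the question mentions more than two such columns, the same left-join idea can be generalized. The following is a minimal sketch, assuming dplyr is installed and that every target column follows the "_group" naming rule from the question. The names `lookups`, `add_group_column` and `base_col` are hypothetical helpers introduced here for illustration.

```r
library(dplyr)

data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# One lookup table per target column (the "less tidy" form)
lookups <- split(group_info, group_info$variable)

# Join one lookup table onto the accumulated result
add_group_column <- function(acc, target) {
  base_col <- sub("_group$", "", target)
  lookup <- lookups[[target]][, c("bin", "group")]
  names(lookup) <- c(base_col, target)   # e.g. amount_bin, amount_bin_group
  left_join(acc, lookup, by = base_col)
}

# Fold over all target names, starting from the base data frame
final_data_calc <- Reduce(add_group_column, names(lookups), data)
```

Because `left_join()` keeps the rows of its left-hand table in their original order, the result lines up row for row with `data`.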

You could use a for loop to keep merging on the different sets:

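for (i in unique(group_info$variable)) {
  # Merge in the bin/group pairs for this variable; the join column in
  # `data` is the variable name without the "_group" suffix.
  # Note: merge() sorts rows by the join column by default.
  data <- merge(
    data, group_info[group_info$variable == i, c("bin", "group")],
    by.x = sub("_group$", "", i), by.y = "bin"
  )
  # Rename the merged "group" column to the target name
  names(data)[names(data) == "group"] <- i
}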
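By default `merge()` sorts the result on the join column, so the loop above may return rows in a different order than the input. If the original row order matters, one alternative is a lookup with `match()` instead of a merge. This is a sketch of a swapped-in technique, not the answerer's method. The names `target`, `base_col` and `lookup` are hypothetical.

```r
data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

for (target in unique(group_info$variable)) {
  base_col <- sub("_group$", "", target)               # e.g. "amount_bin"
  lookup <- group_info[group_info$variable == target, ]
  # match() gives, for each row of data, the position of its bin in lookup
  data[[target]] <- lookup$group[match(data[[base_col]], lookup$bin)]
}
```

Rows whose bin has no entry in `group_info` get `NA` in the new column, and the rows of `data` are never reordered.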
