How to fill columns in R based on matching parts of one column to values in another data frame

I have two data frames, one with my data (data) and one with a lookup table (lookup). The data includes a column called claims; its cells are filled with one or more codes identifying the types of legal claims brought in a particular case (each row represents one case). Multiple claim codes are separated by semicolons.

The lookup data frame has three columns: code, category, and so_category. The code column lists each unique claim code used in the claims column of data. The category column contains a category I assigned to that kind of claim, and so_category assigns a higher-level category into which that particular category fits.

What I'm trying to do is add a column to data for each category and each so_category, filled with 0 or 1 depending on whether the case has any claims that correspond to that category or so_category.

Below is an example of what my data frames look like:

data
Case      claims
1         wiretap;fdcpa
2         ca_ucl;comlaw
3         tort;comlaw;wiretap;ca_ucl
lookup
code     category     so_category
wiretap  f_wiretap    f_statute
fdcpa    f_con_prot   f_statute
ca_ucl   st_con_prot  st_statute
comlaw   com_law      common_law
tort     com_law      common_law

So what I would like to generate programmatically is something like:

data
Case      claims                      f_stat   st_stat   common_law
1         wiretap;fdcpa               1        0         0
2         ca_ucl;comlaw               0        1         1
3         tort;comlaw;wiretap;ca_ucl  1        1         1

I'm quite new to R and pretty much at a loss as to how to do this. Any guidance would be highly appreciated!

In base R, we can first find all the unique so_category values (all_category) that we need to match against. Then split claims on ;, match each code against the code column in lookup to get the corresponding so_category, and assign 1 or 0 to each entry of all_category depending on whether it is present among the matched values.

all_category <- unique(lookup$so_category)

data[all_category] <- t(sapply(strsplit(data$claims, ";"), function(x)
          as.integer(all_category %in% lookup$so_category[match(x, lookup$code)])))

data
#  Case                     claims f_statute st_statute common_law
#1    1              wiretap;fdcpa         1          0          0
#2    2              ca_ucl;comlaw         0          1          1
#3    3 tort;comlaw;wiretap;ca_ucl         1          1          1
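
The question also asks for indicator columns per category, not only per so_category. The same split-and-match idea can be wrapped in a small helper and applied to either lookup column. This is only a sketch: flag_columns is a hypothetical name introduced here, and it assumes every code in claims appears in lookup$code (an unmatched code simply contributes no 1s).

```r
data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 column per unique value of lookup[[col]]
flag_columns <- function(claims, lookup, col) {
  lv <- unique(lookup[[col]])
  rows <- lapply(strsplit(claims, ";", fixed = TRUE), function(x) {
    # Map each claim code to its group, then flag which groups occur
    as.integer(lv %in% lookup[[col]][match(x, lookup$code)])
  })
  out <- do.call(rbind, rows)  # one row per case
  colnames(out) <- lv
  out
}

result <- cbind(
  data,
  flag_columns(data$claims, lookup, "category"),
  flag_columns(data$claims, lookup, "so_category")
)
```

With the sample data, result should contain the four category columns followed by the three so_category columns. If the two lookup columns ever share a value, the resulting column names would collide, so a prefix might be needed in that case.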

data

data <- structure(list(Case = 1:3, claims = c("wiretap;fdcpa", 
"ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl")), 
row.names = c(NA, -3L), class = "data.frame")

lookup <- structure(list(code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", 
"tort"), category = c("f_wiretap", "f_con_prot", "st_con_prot", 
"com_law", "com_law"), so_category = c("f_statute", "f_statute", 
"st_statute", "common_law", "common_law")), row.names = c(NA, 
-5L), class = "data.frame")

Here is an option with tidyverse: split the claims column at the ; delimiter with separate_rows, join (left_join) with the lookup dataset, keep the distinct rows, spread the result to wide format, and join the output back to the original dataset.

library(tidyverse)
data %>% 
  separate_rows(claims, sep=";") %>%
  left_join(lookup, by = c("claims" = "code")) %>%
  select(-claims, -category) %>%
  distinct(Case, so_category) %>% 
  mutate(val = 1) %>% 
  spread(so_category, val, fill = 0) %>% 
  right_join(data) %>% 
  select(names(data), everything())
#   Case                     claims common_law f_statute st_statute
#1    1              wiretap;fdcpa          0         1          0
#2    2              ca_ucl;comlaw          1         0          1
#3    3 tort;comlaw;wiretap;ca_ucl          1         1          1
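
In current tidyr releases, spread is superseded by pivot_wider, so the same pipeline could be written as below. This is a sketch rather than a drop-in replacement: it assumes a tidyr version where values_fill accepts a scalar (1.1.0 or later, as far as I know), and the names flags and result are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

flags <- data %>%
  separate_rows(claims, sep = ";") %>%            # one row per claim code
  left_join(lookup, by = c("claims" = "code")) %>%
  distinct(Case, so_category) %>%                 # one row per case/group pair
  mutate(val = 1L) %>%
  pivot_wider(names_from = so_category,
              values_from = val,
              values_fill = 0L)                   # absent groups become 0

# Join explicitly on Case so the original claims column is kept
result <- left_join(data, flags, by = "Case")
```

Passing by = "Case" explicitly avoids the message dplyr prints when it has to guess the join key.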

