How to populate columns in R based on matching part of one column against values in another data frame

Question

I have two data frames, one with my data (data) and one with a lookup table (lookup). The data includes a column called claims; its cells are filled with one or more codes identifying the types of legal claims brought in a particular case (each row represents one case). Multiple claim codes are separated by a semicolon.

The lookup data frame has three columns: code, category, and so_category. The code column lists each unique claim code used in the claims column of data. category contains a category I assigned to that kind of claim, and so_category assigns a higher-level category into which that particular category fits.

What I'm trying to do is add a column to data for each category and each so_category, filled with 0 or 1 depending on whether the case contains a claim that corresponds to that category or so_category.

Below is an example of what my data frames look like:

data
Case      claims
1         wiretap;fdcpa
2         ca_ucl;comlaw
3         tort;comlaw;wiretap;ca_ucl

lookup
code     category     so_category
wiretap  f_wiretap    f_statute
fdcpa    f_con_prot   f_statute
ca_ucl   st_con_prot  st_statute
comlaw   com_law      common_law
tort     com_law      common_law

So what I would like to generate programmatically is something like:

data
Case      claims                      f_statute   st_statute   common_law
1         wiretap;fdcpa               1           0            0
2         ca_ucl;comlaw               0           1            1
3         tort;comlaw;wiretap;ca_ucl  1           1            1

I'm quite new to R and am pretty much at a loss to figure out how to do this. Any guidance would be highly appreciated!

Answer 1

In base R, first collect all the unique so_category values (all_category) that need a column. Then split claims on ;, match each code against lookup$code to get the corresponding so_category, and assign 1 or 0 depending on whether each value in all_category is present for that case.

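# All higher-level categories that need an indicator column
all_category <- unique(lookup$so_category)

# For each case: split the codes, look up their so_category,
# and flag which categories are present (1) or absent (0)
data[all_category] <- t(sapply(strsplit(data$claims, ";"), function(x)
          as.integer(all_category %in% lookup$so_category[match(x, lookup$code)])))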

data
#  Case                     claims f_statute st_statute common_law
#1    1              wiretap;fdcpa         1          0          0
#2    2              ca_ucl;comlaw         0          1          1
#3    3 tort;comlaw;wiretap;ca_ucl         1          1          1
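
The question asks for indicator columns for both category and so_category, while the code above covers only so_category. As a minimal sketch of how the same split-and-match idea might be generalised, the helper below (add_indicators is a hypothetical name, not part of the original answer) takes the lookup column to use as an argument. It assumes the claims and code column names from the question and that no category shares a name with a so_category.

```r
# Example data, same values as the reproducible data in this answer
data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds one 0/1 column per unique value of lookup[[level]]
add_indicators <- function(data, lookup, level) {
  levels_all <- unique(lookup[[level]])
  # One character vector of codes per case
  codes <- strsplit(data$claims, ";", fixed = TRUE)
  # For each case, flag which levels are reached by at least one of its codes
  flags <- vapply(codes, function(x) {
    as.integer(levels_all %in% lookup[[level]][match(x, lookup$code)])
  }, integer(length(levels_all)))
  # vapply stores each case's flags contiguously, so filling by row
  # yields one row per case and one column per level
  data[levels_all] <- matrix(flags, ncol = length(levels_all), byrow = TRUE)
  data
}

out <- add_indicators(data, lookup, "so_category")
out <- add_indicators(out, lookup, "category")
out
```

A code in claims that does not appear in lookup$code simply matches nothing, so it contributes 0 to every column rather than raising an error.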

Data

data <- structure(list(Case = 1:3, claims = c("wiretap;fdcpa", 
"ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl")), 
row.names = c(NA, -3L), class = "data.frame")

lookup <- structure(list(code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", 
"tort"), category = c("f_wiretap", "f_con_prot", "st_con_prot", 
"com_law", "com_law"), so_category = c("f_statute", "f_statute", 
"st_statute", "common_law", "common_law")), row.names = c(NA, 
-5L), class = "data.frame")

Answer 2

Here is an option with tidyverse: split the claims column at the ; delimiter with separate_rows, join with the lookup dataset (left_join), keep the distinct rows, spread the result to wide format, and join the output back to the original dataset.

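library(tidyverse)
data %>% 
  # one row per (Case, code) pair
  separate_rows(claims, sep=";") %>%
  # attach category and so_category to each code
  left_join(lookup, by = c("claims" = "code")) %>%
  select(-claims, -category) %>%
  # keep each so_category at most once per case
  distinct(Case, so_category) %>% 
  mutate(val = 1) %>% 
  # one column per so_category, 0 where the case has no such claim
  spread(so_category, val, fill = 0) %>% 
  # bring back the original claims column
  right_join(data, by = "Case") %>% 
  select(names(data), everything())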
#   Case                     claims common_law f_statute st_statute
#1    1              wiretap;fdcpa          0         1          0
#2    2              ca_ucl;comlaw          1         0          1
#3    3 tort;comlaw;wiretap;ca_ucl          1         1          1
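
In current tidyr releases, spread is superseded by pivot_wider. The following is a sketch of the same pipeline using pivot_wider, assuming dplyr and tidyr (1.0.0 or later) are installed; the intermediate name wide and the final name result are illustrative choices, not from the original answer.

```r
library(dplyr)
library(tidyr)

# Example data, same values as in the question
data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # one row per (Case, code) pair
  separate_rows(claims, sep = ";") %>%
  # attach the lookup columns to each code
  left_join(lookup, by = c("claims" = "code")) %>%
  # keep each so_category at most once per case
  distinct(Case, so_category) %>%
  mutate(val = 1L) %>%
  # one column per so_category, filled with 0 where absent
  pivot_wider(names_from = so_category, values_from = val,
              values_fill = list(val = 0L))

# Join the indicator columns back onto the original data
result <- left_join(data, wide, by = "Case")
result
```

Joining from data on the left keeps the original row order and the claims column without the final select step.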
