Compare unordered sets across multiple rows in R

Question

I would like to compare unordered sets of data across multiple rows and put the non-matching values from each row into new columns. For instance, my data is structured like this:

org_id <- c("1234", "1234", "1234", "1234", "2345","2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- c("2018-01-31", "2018-01-31", "2018-01-31", "2018-01-31","2018-01-31", "2018-01-31", "2018-01-31")
df <- data.frame(org_id, original_value,new_value, date_change)

where each row represents a change to an organization's services, and "date_change" is the date the change occurred. If you look at the changes for the first organization, some of them only reflect a change in the order of the listed services, not a change in the services themselves (e.g. "dental care" for organization "1234"). I would like an output that puts the values that were actually removed and the values that were actually added into new columns, like the example below:

org_id2 <- c("1234", "1234")
removed_value <- c("food", " ")
added_value <- c("emergency food","housing")
date_change2 <- c("2018-01-31","2018-01-31")
df2 <- data.frame(org_id2, removed_value, added_value, date_change2)

Any thoughts on how to approach this problem? Thanks!

Answer 1

Perhaps something like this:

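library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(2:3) %>%                       # stack original_value and new_value
  group_by(org_id, date_change, value) %>% 
  filter(n() == 1 & trimws(value) != "") %>%  # keep values seen once; drop blanks
  pivot_wider(id_cols = org_id:date_change, names_from = name, values_from = value)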

Output:

  org_id date_change original_value new_value     
  <chr>  <chr>       <chr>          <chr>         
1 1234   2018-01-31  food           emergency food

If you want long format and want to retain the empty string, etc., you can do this:

df %>% 
  pivot_longer(2:3,names_to="action") %>% 
  group_by(org_id,date_change,value) %>% 
  filter(n()==1) %>% 
  mutate(action=if_else(action=="original_value", "removed", "added"))

Output:

  org_id date_change action  value           
  <chr>  <chr>       <chr>   <chr>           
1 1234   2018-01-31  removed "food"          
2 1234   2018-01-31  added   "emergency food"
3 1234   2018-01-31  removed " "
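
The pipelines above keep a value when it occurs exactly once within an organization and date, which assumes a value is never repeated inside a single column. As a cross-check, here is a minimal base R sketch (not part of the original answer) that compares the two columns as sets with setdiff(), so repeats do not matter. The helper name diff_one_group is hypothetical, and grouping by org_id and date_change is an assumption carried over from the pipelines above.

```r
# Example data from the question; keep the columns as character vectors
df <- data.frame(
  org_id = c("1234", "1234", "1234", "1234", "2345", "2345", "2345"),
  original_value = c("food", "dental care", "diapers", " ", "care", "housing", "utilities"),
  new_value = c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care"),
  date_change = rep("2018-01-31", 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: set differences for one organization/date group
diff_one_group <- function(g) {
  removed <- setdiff(g$original_value, g$new_value)  # only in the old set
  added   <- setdiff(g$new_value, g$original_value)  # only in the new set
  if (length(removed) + length(added) == 0) return(NULL)  # nothing really changed
  data.frame(
    org_id = g$org_id[1],
    date_change = g$date_change[1],
    action = rep(c("removed", "added"), c(length(removed), length(added))),
    value = c(removed, added),
    stringsAsFactors = FALSE
  )
}

# Split by organization and date, diff each group, and stack the results
groups <- split(df, list(df$org_id, df$date_change), drop = TRUE)
result <- do.call(rbind, lapply(groups, diff_one_group))
rownames(result) <- NULL
result
```

Like the long-format pipeline, this keeps the blank " " entry as a removed value; filter on trimws(value) != "" afterwards if blanks should be dropped.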
