Comparing unordered sets across multiple rows in R

I would like to compare unordered sets of data across multiple rows and put the non-matching values from each row into new columns. For instance, my data is structured like this:

org_id <- c("1234", "1234", "1234", "1234", "2345","2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- c("2018-01-31", "2018-01-31", "2018-01-31", "2018-01-31","2018-01-31", "2018-01-31", "2018-01-31")
df <- data.frame(org_id, original_value,new_value, date_change)

where each row represents a change to an organization's services, and "date_change" is the date on which the change occurred. If you look at the changes associated with the first organization, you will notice that some of them only reflect a change in the order of the listed services, not a change in the services themselves (e.g. "dental care" for organization "1234"). I would like an output that identifies the values actually removed and the values actually added in new columns, like the example below:

org_id2 <- c("1234", "1234")
removed_value <- c("food", " ")
added_value <- c("emergency food","housing")
date_change2 <- c("2018-01-31","2018-01-31")
df2 <- data.frame(org_id2, removed_value, added_value, date_change2)

Any thoughts on how to approach this problem? Thanks!

Perhaps something like this:

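library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(2:3) %>%                      # stack original_value and new_value
  group_by(org_id, date_change, value) %>% 
  filter(n() == 1 & trimws(value) != "") %>% # keep values seen once; drop blanks
  pivot_wider(id_cols = org_id:date_change, names_from = name, values_from = value)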

Output:

  org_id date_change original_value new_value     
  <chr>  <chr>       <chr>          <chr>         
1 1234   2018-01-31  food           emergency food

If you want the result in long format and want to retain empty strings, etc., you can do this:

df %>% 
  pivot_longer(2:3, names_to = "action") %>% 
  group_by(org_id, date_change, value) %>% 
  filter(n() == 1) %>%   # keep values that appear only once per org and date
  mutate(action = if_else(action == "original_value", "removed", "added"))

Output:

  org_id date_change action  value           
  <chr>  <chr>       <chr>   <chr>           
1 1234   2018-01-31  removed "food"          
2 1234   2018-01-31  added   "emergency food"
3 1234   2018-01-31  removed " "             
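
As a possible alternative, the same comparison can be sketched in base R with setdiff(), which treats each column as a set within a group and so does not depend on how often a value is repeated. This is only a sketch under assumptions: the helper name diff_one_group and the output column names removed_value and added_value are made up for illustration, and when a group has more removed than added values (or vice versa) the shorter column is padded with NA.

```r
# Same example data as in the question.
df <- data.frame(
  org_id = c("1234", "1234", "1234", "1234", "2345", "2345", "2345"),
  original_value = c("food", "dental care", "diapers", " ", "care", "housing", "utilities"),
  new_value = c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care"),
  date_change = "2018-01-31",
  stringsAsFactors = FALSE
)

# Hypothetical helper: set differences for one org_id/date_change group.
diff_one_group <- function(g) {
  removed <- setdiff(g$original_value, g$new_value)  # in original only
  added   <- setdiff(g$new_value, g$original_value)  # in new only
  n <- max(length(removed), length(added))
  if (n == 0) return(NULL)                           # nothing really changed
  data.frame(
    org_id = g$org_id[1],
    date_change = g$date_change[1],
    # Pad the shorter side with NA so both columns have n entries.
    removed_value = c(removed, rep(NA_character_, n - length(removed))),
    added_value = c(added, rep(NA_character_, n - length(added))),
    stringsAsFactors = FALSE
  )
}

# Split by organization and date, diff each group, drop unchanged groups.
groups <- split(df, list(df$org_id, df$date_change), drop = TRUE)
pieces <- Filter(Negate(is.null), lapply(groups, diff_one_group))
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

On the example data this keeps the blank " " as a removed value, just like the long-format version above; filtering with trimws() before calling setdiff() would drop it.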
