Comparing unordered sets across multiple rows in R
I would like to compare unordered sets of data across multiple rows, writing the non-matching values from each row into new columns. For instance, my data is structured like this:
org_id <- c("1234", "1234", "1234", "1234", "2345","2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- c("2018-01-31", "2018-01-31", "2018-01-31", "2018-01-31","2018-01-31", "2018-01-31", "2018-01-31")
df <- data.frame(org_id, original_value,new_value, date_change)
where each row represents a change to an organization's services, and "date_change" refers to the date that the change occurred. You will notice that when you look at the changes associated with the first organization, some of them just represent changes in the order of the listed services and not a change in the services themselves (e.g. "dental care" for organization "1234").
I would like an output that identifies the values that were actually removed and the values that were actually added, in new columns, like the example below:
org_id2 <- c("1234", "1234")
removed_value <- c("food", " ")
added_value <- c("emergency food","housing")
date_change2 <- c("2018-01-31","2018-01-31")
df2 <- data.frame(org_id2, removed_value, added_value, date_change2)
Any thoughts on how to approach this problem? Thanks!
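Perhaps something like this:
library(dplyr)
library(tidyr)

df %>%
  # stack original_value and new_value into one "value" column
  pivot_longer(2:3) %>%
  group_by(org_id, date_change, value) %>%
  # keep values that occur only once per organization and date, dropping blanks
  filter(n() == 1 & trimws(value) != "") %>%
  pivot_wider(id_cols = org_id:date_change, names_from = name, values_from = value)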
Output:
  org_id date_change original_value new_value
  <chr>  <chr>       <chr>          <chr>
1 1234   2018-01-31  food           emergency food
If you want the result in long format, and want to retain the empty string, etc., you can do this:
df %>%
  pivot_longer(2:3, names_to = "action") %>%
  group_by(org_id, date_change, value) %>%
  filter(n() == 1) %>%
  mutate(action = if_else(action == "original_value", "removed", "added"))
Output:
  org_id date_change action  value
  <chr>  <chr>       <chr>   <chr>
1 1234   2018-01-31  removed "food"
2 1234   2018-01-31  added   "emergency food"
3 1234   2018-01-31  removed " "
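The filter on n() == 1 treats a value as changed when it shows up exactly once within an organization and date, so a value that happened to be listed twice on the same side would be dropped. If that could occur in your data, a base R sketch built on setdiff() may be safer. This is only one possible alternative, not the approach used above, and the helper name set_changes is made up for illustration:

```r
df <- data.frame(
  org_id = c("1234", "1234", "1234", "1234", "2345", "2345", "2345"),
  original_value = c("food", "dental care", "diapers", " ", "care", "housing", "utilities"),
  new_value = c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care"),
  date_change = "2018-01-31",
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare the two columns of one group as unordered sets.
set_changes <- function(d) {
  removed <- setdiff(d$original_value, d$new_value)  # in the old set only
  added   <- setdiff(d$new_value, d$original_value)  # in the new set only
  n <- length(removed) + length(added)
  data.frame(
    org_id      = rep(d$org_id[1], n),
    date_change = rep(d$date_change[1], n),
    action      = c(rep("removed", length(removed)), rep("added", length(added))),
    value       = c(removed, added),
    stringsAsFactors = FALSE
  )
}

# One group per organization and change date; groups without changes yield zero rows.
groups <- split(df, list(df$org_id, df$date_change), drop = TRUE)
result <- do.call(rbind, lapply(groups, set_changes))
rownames(result) <- NULL
result
```

Because setdiff() de-duplicates its inputs, repeated entries such as the second "dental care" in new_value do not affect the result. To drop the blank entry as in the first snippet, filter with trimws(result$value) != "".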