Move unique variables into sub data sets in R

I have a data set in R that looks like this:

Member ID  Listing ID ...
1          111
1          111
1          112
2          113
2          114
3          115
...

My goal is to split the original data into sub data sets in which no "Member ID" has more than one "Listing ID". However, duplicate rows (such as Member ID 1 with Listing ID 111) should not be deleted; they should be kept.

In this example:

Data set 1:

Member ID  Listing ID
1          111
1          111
2          113
3          115

Data set 2:

Member ID  Listing ID
1          112
2          114

My data set is much larger, and the final output would likely be around 100 sub data sets.

Can you please help me with that?

Many thanks!

We can create a run-length ID for the Listing IDs within each Member ID, and then split the data frame by that run-length ID. In the following example, the final output is the list dt_list2, with one data frame per run-length ID.

# Load packages
library(dplyr)
library(data.table)

# Create example data frame
dt <- read.table(text = "'Member ID'  'Listing ID'
                 1          111
                 1          111
                 1          112
                 2          113
                 2          114
                 3          115", 
                 header = TRUE, stringsAsFactors = FALSE)

# Add run length ID
dt2 <- dt %>%
  setNames(nm = c("Member ID", "Listing ID")) %>%
  group_by(`Member ID`) %>%
  mutate(RL = rleid(`Listing ID`))

# Split the data frame by run length ID
dt_list <- split(dt2, f = dt2$RL)

# Remove the run length ID for each data frame
dt_list2 <- lapply(dt_list, function(dt){
  dt$RL <- NULL
  return(dt)
})
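
Note that rleid() numbers consecutive runs, so the approach above relies on each member's rows for the same listing being adjacent. If that cannot be guaranteed, one option is to number each member's distinct listings by first appearance instead. The following is only a sketch in base R under that assumption; the names listing_index and sub_sets are made up for illustration, and the columns are called Member.ID and Listing.ID for convenience.

```r
# Example data from the question (column names without spaces for convenience)
dt <- data.frame(
  Member.ID  = c(1, 1, 1, 2, 2, 3),
  Listing.ID = c(111, 111, 112, 113, 114, 115)
)

# For each member, number the distinct listings in order of first appearance:
# 1 for the member's first listing, 2 for the second, and so on.
# Exact duplicates get the same number regardless of row order.
listing_index <- ave(
  dt$Listing.ID, dt$Member.ID,
  FUN = function(x) match(x, unique(x))
)

# One sub data set per index value
sub_sets <- split(dt, listing_index)
```

With this indexing, a member whose rows arrive as 111, 112, 111 would get indices 1, 2, 1, whereas a run-length ID would give 1, 2, 3 and separate the two 111 rows.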

I think this will do it:

split(dt, (duplicated(dt) | duplicated(dt,fromLast=TRUE)) | (!duplicated(dt$Member.ID)))

#$`FALSE`
#  Member.ID Listing.ID
#3         1        112
#5         2        114
#
#$`TRUE`
#  Member.ID Listing.ID
#1         1        111
#2         1        111
#4         2        113
#6         3        115
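
Because the split above is on a logical vector, it can produce at most two pieces, so on larger data it may be worth checking each piece against the requirement. The following is a rough sketch of such a check; the helper has_single_listing and the variable names parts, checks, dt3 and parts3 are hypothetical and not part of the original answer.

```r
# Example data from the question
dt <- data.frame(
  Member.ID  = c(1, 1, 1, 2, 2, 3),
  Listing.ID = c(111, 111, 112, 113, 114, 115)
)

# Same logical split as above, wrapped in a function so it can be reused
split_two_way <- function(d) {
  split(d, (duplicated(d) | duplicated(d, fromLast = TRUE)) |
             (!duplicated(d$Member.ID)))
}

# TRUE when every member in a sub data set has exactly one distinct listing
has_single_listing <- function(d) {
  all(tapply(d$Listing.ID, d$Member.ID, function(x) length(unique(x))) == 1)
}

parts  <- split_two_way(dt)
checks <- vapply(parts, has_single_listing, logical(1))

# Hypothetical extra row: member 1 gets a third distinct listing (116)
dt3     <- rbind(dt, data.frame(Member.ID = 1, Listing.ID = 116))
parts3  <- split_two_way(dt3)
checks3 <- vapply(parts3, has_single_listing, logical(1))
```

For the example data both pieces pass the check. Once a member has three or more distinct listings, as in dt3, the FALSE piece holds several listings for that member, so more than two sub data sets are needed.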
