
Move unique variables into sub data sets in R

I have a data set in R that looks like this:

Member ID  Listing ID ...
1          111
1          111
1          112
2          113
2          114
3          115
...

My goal is to split the original data into sub data sets in which no "Member ID" has more than one "Listing ID". However, duplicate rows (such as Member ID 1 with Listing ID 111) should not be deleted; they should remain together in the same sub data set.

In this example:

Data set 1:

Member ID  Listing ID
1          111
1          111
2          113
3          115

Data set 2:

Member ID  Listing ID
1          112
2          114

My data set is much larger, and the final output would likely be around 100 sub data sets.

Can you please help me with that?

Many thanks!

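We can create a run-length ID of Listing ID within each Member ID, then split the data frame by that run-length ID. Because rleid numbers consecutive runs, this assumes that rows with the same Listing ID sit next to each other within a Member ID, as they do in the example. In the following example, the final outputs are all in the list dt_list2.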

# Load packages
library(dplyr)
library(data.table)

# Create example data frame
dt <- read.table(text = "'Member ID'  'Listing ID'
                 1          111
                 1          111
                 1          112
                 2          113
                 2          114
                 3          115", 
                 header = TRUE, stringsAsFactors = FALSE)

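# Add a run-length ID of Listing ID within each Member ID
# (read.table converts the quoted headers to Member.ID / Listing.ID,
# so setNames restores the names with spaces)
dt2 <- dt %>%
  setNames(nm = c("Member ID", "Listing ID")) %>%
  group_by(`Member ID`) %>%
  mutate(RL = rleid(`Listing ID`)) %>%
  ungroup()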

# Split the data frame by run length ID
dt_list <- split(dt2, f = dt2$RL)

# Remove the run length ID for each data frame
dt_list2 <- lapply(dt_list, function(dt){
  dt$RL <- NULL
  return(dt)
})
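If the rows are not sorted so that equal Listing IDs are adjacent within a member, a run-length ID would start a new group each time a listing reappears. The following is a minimal base R sketch of the same idea that does not depend on row order: it numbers each member's distinct listings in order of first appearance and splits on that number. The function name split_by_listing_rank and its arguments are made up for this illustration, and the column names assume the Member.ID / Listing.ID form that read.table produces.

```r
# Hypothetical helper (not from the answer above): split df so that each piece
# holds at most one distinct listing per member, keeping duplicate rows together.
split_by_listing_rank <- function(df, member_col, listing_col) {
  rows <- seq_len(nrow(df))
  # For each member, give the 1st distinct listing the index 1, the 2nd the index 2, ...
  rank_in_member <- ave(rows, df[[member_col]], FUN = function(i) {
    listings <- df[[listing_col]][i]
    match(listings, unique(listings))
  })
  split(df, rank_in_member)
}

# Example data from the question
dt <- data.frame(
  Member.ID  = c(1, 1, 1, 2, 2, 3),
  Listing.ID = c(111, 111, 112, 113, 114, 115)
)

sub_sets <- split_by_listing_rank(dt, "Member.ID", "Listing.ID")
print(sub_sets)
```

The number of pieces equals the largest number of distinct listings that any single member has, so a member with 100 listings would yield 100 sub data sets.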

I think this will do it:

split(dt, (duplicated(dt) | duplicated(dt,fromLast=TRUE)) | (!duplicated(dt$Member.ID)))

#$`FALSE`
#  Member.ID Listing.ID
#3         1        112
#5         2        114
#
#$`TRUE`
#  Member.ID Listing.ID
#1         1        111
#2         1        111
#4         2        113
#6         3        115
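Since a logical split can only ever produce two groups, it may be worth checking the result when some member has more than two distinct listings. The sketch below uses a hypothetical helper, one_listing_per_member (a name invented here, not part of any package), to test whether each piece satisfies the requirement from the question. The second data frame, dt3, is made-up data added only to show a case that fails.

```r
# Hypothetical helper: TRUE if no member in df has more than one distinct listing
one_listing_per_member <- function(df, member_col = "Member.ID",
                                   listing_col = "Listing.ID") {
  n_listings <- tapply(df[[listing_col]], df[[member_col]],
                       FUN = function(x) length(unique(x)))
  all(n_listings <= 1)
}

# The two-way split from the answer above, wrapped in a function for reuse
two_way_split <- function(df) {
  split(df, (duplicated(df) | duplicated(df, fromLast = TRUE)) |
          !duplicated(df$Member.ID))
}

# Example data from the question: both pieces pass the check
dt <- data.frame(
  Member.ID  = c(1, 1, 1, 2, 2, 3),
  Listing.ID = c(111, 111, 112, 113, 114, 115)
)
check_dt <- vapply(two_way_split(dt), one_listing_per_member, logical(1))

# Made-up data: member 1 has three distinct listings, so two groups are not enough
dt3 <- data.frame(
  Member.ID  = c(1, 1, 1),
  Listing.ID = c(111, 112, 116)
)
check_dt3 <- vapply(two_way_split(dt3), one_listing_per_member, logical(1))

print(check_dt)
print(check_dt3)
```

If any entry comes back FALSE, the data needs a split into more than two groups.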

