Selecting subset of columns in a data frame based on two columns from another data frame

I have a large data set of patient encounters (~6 million). Each patient may have multiple entries per year over multiple years. I would like to arrange the patients by year and then number them so that I can filter out all but one year for each patient, which lets me look at each patient's first year in a particular health plan.

I am able to rank and filter out the first entry for each patient. However, I thought I would have to create a new df and then subset the original data frame based on the two columns generated in the new data frame using %in%. This is where I am having trouble.

While I use Stack Overflow frequently to find solutions to my questions, I do not typically post, so bear with me if I am not doing it properly.

enrolid <- c(223801, 223801, 223801, 223801, 223801, 223803, 223803, 223804)
year <- c(2008, 2008, 2009, 2010, 2011, 2008, 2011, 2008)
service <- c("CT", "Colonoscopy", "labs", "office_visit", "med", "office_vist", "hospitalization", "CT")

# But for 6 million encounters. I want to extract the enrolid and first
# year for each individual in my data set.
df1 <- data.frame(enrolid, year, service)

library(dplyr)

# Keep the first-ranked row (earliest year) for each patient
df2 <- df1 %>%
  group_by(enrolid) %>%
  filter(rank(year, ties.method = "first") == 1) %>%
  mutate(enrollment_year_num = 1) %>%
  select(enrolid, year)

df1 %>%
  filter_all(any_vars(. %in% df2)) # also tried with df2$enrolid and df2$year

Thanks!

You can do it all in one step with the filter statement (make sure year is a numeric variable for this to work).

df1 %>%
 group_by(enrolid) %>%
 filter(year == min(year))
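
For comparison, the two-step route described in the question can probably be made to work with a join instead of %in%, since %in% tests values one column at a time and cannot match an (enrolid, year) pair as a unit. The following is only a sketch on the sample data from the question; the names first_year and first_year_rows are made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data copied from the question
enrolid <- c(223801, 223801, 223801, 223801, 223801, 223803, 223803, 223804)
year <- c(2008, 2008, 2009, 2010, 2011, 2008, 2011, 2008)
service <- c("CT", "Colonoscopy", "labs", "office_visit", "med",
             "office_vist", "hospitalization", "CT")
df1 <- data.frame(enrolid, year, service)

# Step 1 (hypothetical lookup table): one row per patient with the first year
first_year <- df1 %>%
  group_by(enrolid) %>%
  summarise(year = min(year), .groups = "drop")

# Step 2: keep only rows of df1 whose (enrolid, year) pair appears in first_year
first_year_rows <- semi_join(df1, first_year, by = c("enrolid", "year"))
```

On this sample, first_year_rows should hold the same four rows as the one-step filter above. With 6 million rows the one-step version is likely the simpler choice; the join is mainly useful when the lookup table already exists.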

You can also use slice:

df1 %>% group_by(enrolid) %>% slice(which.min(year))
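
The two answers are not quite interchangeable, which matters if the goal is every encounter in a patient's first year rather than a single row per patient. A small sketch on the question's sample data (the result names all_first_year and one_per_patient are illustrative only) shows the difference:

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  enrolid = c(223801, 223801, 223801, 223801, 223801, 223803, 223803, 223804),
  year    = c(2008, 2008, 2009, 2010, 2011, 2008, 2011, 2008),
  service = c("CT", "Colonoscopy", "labs", "office_visit", "med",
              "office_vist", "hospitalization", "CT")
)

# filter keeps every row that falls in each patient's earliest year
all_first_year <- df1 %>%
  group_by(enrolid) %>%
  filter(year == min(year)) %>%
  ungroup()

# slice(which.min(...)) keeps exactly one row per patient (the first minimum)
one_per_patient <- df1 %>%
  group_by(enrolid) %>%
  slice(which.min(year)) %>%
  ungroup()

nrow(all_first_year)   # 4: patient 223801 has two encounters in 2008
nrow(one_per_patient)  # 3: one row for each of the three patients
```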
