
Filtering a dataframe by a list in R

I want to filter a data frame with 1212 rows so that it only contains the samples listed in a separate list of 1097 samples. For example:

         RNASeq2Norm_samples            Substrng_RNASeq2Norm
    1    TCGA-3C-AAAU-01A-11R-A41B-07   TCGA.3C.AAAU
    2    TCGA-3C-AALI-01A-11R-A41B-07   TCGA.3C.AALI
    3    TCGA-3C-AALJ-01A-31R-A41B-07   TCGA.3C.AALJ
    4    TCGA-3C-AALK-01A-11R-A41B-07   TCGA.3C.AALK
    5    TCGA-4H-AAAK-01A-12R-A41B-07   TCGA.4H.AAAK
    6    TCGA-5L-AAT0-01A-12R-A41B-07   TCGA.5L.AAT0
    7    TCGA-5L-AAT1-01A-12R-A41B-07   TCGA.5L.AAT1
    8    TCGA-5T-A9QA-01A-11R-A41B-07   TCGA.5T.A9QA
    ...
    1212
intersect_samples (1097 samples): "TCGA.3C.AAAU" "TCGA.3C.AALI" "TCGA.3C.AALJ" "TCGA.3C.AALK" "TCGA.4H.AAAK" "TCGA.5L.AAT0" "TCGA.5L.AAT1" "TCGA.5T.A9QA" "TCGA.A1.A0SB" "TCGA.A1.A0SD" "TCGA.A1.A0SE" "TCGA.A1.A0SF" "TCGA.A1.A0SG" "TCGA.A1.A0SH" "TCGA.A1.A0SI" ...

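Let's call your first data frame df and your filter list fl (your intersect_samples). Filter on Substrng_RNASeq2Norm, because that column uses the same dotted format as the list (e.g. TCGA.3C.AAAU); the full barcodes in RNASeq2Norm_samples never match those IDs.

cleaned_df = df[df$Substrng_RNASeq2Norm %in% fl, ]

Breaking the command down:

  • df$Substrng_RNASeq2Norm %in% fl returns a logical vector (TRUE / FALSE values) with one element per row. Each element answers the question "can this row's Substrng_RNASeq2Norm be found in fl?"

  • df[<condition>, ] keeps all rows of df where <condition> is TRUE. Leaving the column index after the comma empty means "keep all columns".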
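A minimal, self-contained sketch of the two steps above. It uses only the first five rows shown in the question as a stand-in for the full 1212-row data frame, and fl here is a hypothetical three-element list, not the real intersect_samples. It also shows why the match has to be done on the dotted short-ID column, and uses setdiff() to list filter entries that found no row, which helps when the row count after filtering is not the expected 1097.

```r
# Stand-in for the question's data: only the first five rows shown above
# (the real data frame has 1212 rows).
df <- data.frame(
  RNASeq2Norm_samples = c("TCGA-3C-AAAU-01A-11R-A41B-07",
                          "TCGA-3C-AALI-01A-11R-A41B-07",
                          "TCGA-3C-AALJ-01A-31R-A41B-07",
                          "TCGA-3C-AALK-01A-11R-A41B-07",
                          "TCGA-4H-AAAK-01A-12R-A41B-07"),
  Substrng_RNASeq2Norm = c("TCGA.3C.AAAU", "TCGA.3C.AALI", "TCGA.3C.AALJ",
                           "TCGA.3C.AALK", "TCGA.4H.AAAK"),
  stringsAsFactors = FALSE
)

# Hypothetical filter list: two IDs present in df, one that is not.
fl <- c("TCGA.3C.AAAU", "TCGA.3C.AALK", "TCGA.A1.A0SB")

# Step 1: logical vector, one element per row of df.
# For the rows above this is TRUE FALSE FALSE TRUE FALSE.
keep <- df$Substrng_RNASeq2Norm %in% fl

# Step 2: keep the rows where keep is TRUE; the empty column index keeps all columns.
cleaned_df <- df[keep, ]

# The full barcodes never equal the dotted short IDs, so this count is 0.
n_full_barcode_matches <- sum(df$RNASeq2Norm_samples %in% fl)

# Entries of fl with no matching row in df (here "TCGA.A1.A0SB").
unmatched <- setdiff(fl, df$Substrng_RNASeq2Norm)
```

On the real data, assuming each sample appears exactly once in df, nrow(cleaned_df) should equal the number of list entries that appear in df, i.e. length(fl) minus length(unmatched).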

Further details on filtering dataframes (and matrices) in R

Finally, I would also point out the subset function, which expresses the same filter without repeating df$.
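A short sketch of the same filter written with base R's subset(). As before, the three-row data frame and the two-element fl are hypothetical stand-ins for the question's df and intersect_samples.

```r
# Stand-in data: first three rows from the question.
df <- data.frame(
  RNASeq2Norm_samples = c("TCGA-3C-AAAU-01A-11R-A41B-07",
                          "TCGA-3C-AALI-01A-11R-A41B-07",
                          "TCGA-3C-AALJ-01A-31R-A41B-07"),
  Substrng_RNASeq2Norm = c("TCGA.3C.AAAU", "TCGA.3C.AALI", "TCGA.3C.AALJ"),
  stringsAsFactors = FALSE
)

# Hypothetical filter list standing in for intersect_samples.
fl <- c("TCGA.3C.AAAU", "TCGA.3C.AALJ")

# subset() evaluates the condition with df's columns in scope,
# so the column is named without the df$ prefix.
cleaned_df <- subset(df, Substrng_RNASeq2Norm %in% fl)

# The optional select argument keeps only the listed columns
# (the result is still a data frame).
ids_only <- subset(df, Substrng_RNASeq2Norm %in% fl, select = Substrng_RNASeq2Norm)
```

R's own help page (?subset) describes subset() as a convenience function intended for interactive use and recommends the standard [ indexing when programming, because its non-standard evaluation can behave unexpectedly inside functions.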


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM