I want to filter a data frame with 1212 rows so that it only contains the samples listed in a separate vector, which holds 1097 samples. For example:
  RNASeq2Norm_samples          Substrng_RNASeq2Norm
1 TCGA-3C-AAAU-01A-11R-A41B-07 TCGA.3C.AAAU
2 TCGA-3C-AALI-01A-11R-A41B-07 TCGA.3C.AALI
3 TCGA-3C-AALJ-01A-31R-A41B-07 TCGA.3C.AALJ
4 TCGA-3C-AALK-01A-11R-A41B-07 TCGA.3C.AALK
5 TCGA-4H-AAAK-01A-12R-A41B-07 TCGA.4H.AAAK
6 TCGA-5L-AAT0-01A-12R-A41B-07 TCGA.5L.AAT0
7 TCGA-5L-AAT1-01A-12R-A41B-07 TCGA.5L.AAT1
8 TCGA-5T-A9QA-01A-11R-A41B-07 TCGA.5T.A9QA
...
1212
intersect_samples: "TCGA.3C.AAAU" "TCGA.3C.AALI" "TCGA.3C.AALJ" "TCGA.3C.AALK" "TCGA.4H.AAAK" "TCGA.5L.AAT0" "TCGA.5L.AAT1" "TCGA.5T.A9QA" "TCGA.A1.A0SB" "TCGA.A1.A0SD" "TCGA.A1.A0SE" "TCGA.A1.A0SF" "TCGA.A1.A0SG" "TCGA.A1.A0SH" "TCGA.A1.A0SI" ... 1097
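Let's call your first data frame df and your filter list fl (here, intersect_samples). The IDs in fl use the dotted short form (TCGA.3C.AAAU), so they have to be compared against the Substrng_RNASeq2Norm column rather than the full barcodes in RNASeq2Norm_samples:
cleaned_df = df[df$Substrng_RNASeq2Norm %in% fl, ]
Breaking the command down:
df$Substrng_RNASeq2Norm %in% fl
returns a logical vector (TRUE / FALSE values), one element per row. Each element answers the question "can this Substrng_RNASeq2Norm value be found in fl?"
df[<condition>, ]
keeps all rows of df where <condition> is TRUE. The empty slot after the comma means "keep all columns".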
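To make the two steps concrete, here is a minimal sketch on a three-row toy data frame. The rows are copied from the sample in the question; the shortened fl vector is made up for illustration, and the sketch assumes the filter IDs match the Substrng_RNASeq2Norm column, as the sample data suggests.

```r
# Toy data modelled on the question (three rows copied from the sample shown there)
df <- data.frame(
  RNASeq2Norm_samples  = c("TCGA-3C-AAAU-01A-11R-A41B-07",
                           "TCGA-3C-AALI-01A-11R-A41B-07",
                           "TCGA-3C-AALJ-01A-31R-A41B-07"),
  Substrng_RNASeq2Norm = c("TCGA.3C.AAAU", "TCGA.3C.AALI", "TCGA.3C.AALJ"),
  stringsAsFactors = FALSE
)

# Hypothetical, shortened stand-in for intersect_samples
fl <- c("TCGA.3C.AAAU", "TCGA.3C.AALJ")

# Step 1: logical vector, one element per row of df
keep <- df$Substrng_RNASeq2Norm %in% fl
keep
# [1]  TRUE FALSE  TRUE

# Step 2: keep the rows where keep is TRUE, and all columns
cleaned_df <- df[keep, ]
nrow(cleaned_df)
# [1] 2
```

Note that %in% keeps every matching row, so if the same short ID appears on several rows of df (for example, several barcodes for one patient), the result can have more rows than fl has entries.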
Further details on filtering dataframes (and matrices) in R
Finally, I would point out the subset function, which performs the same kind of row filtering.
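A minimal sketch of what a subset call could look like here, using the same toy data as an assumption (the fl vector is a made-up, shortened stand-in for intersect_samples, and the comparison again assumes the IDs match the Substrng_RNASeq2Norm column):

```r
# Toy data modelled on the question
df <- data.frame(
  RNASeq2Norm_samples  = c("TCGA-3C-AAAU-01A-11R-A41B-07",
                           "TCGA-3C-AALI-01A-11R-A41B-07",
                           "TCGA-3C-AALJ-01A-31R-A41B-07"),
  Substrng_RNASeq2Norm = c("TCGA.3C.AAAU", "TCGA.3C.AALI", "TCGA.3C.AALJ"),
  stringsAsFactors = FALSE
)
fl <- c("TCGA.3C.AAAU", "TCGA.3C.AALJ")  # hypothetical filter list

# Column names can be used directly inside subset(), without the df$ prefix
cleaned_df <- subset(df, Substrng_RNASeq2Norm %in% fl)
nrow(cleaned_df)
# [1] 2
```

The R documentation describes subset as a convenience function intended for interactive use; inside functions or scripts, the df[<condition>, ] form is usually the safer choice.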