How to subset a pandas dataframe based on column names of another dataframe that may be in random order?
I want to subset the rows of the raw_clin dataframe by the column names of the common dataframe.
Example common dataframe:
common = pd.DataFrame([["PPP1R15A", -0.5880, 1.3980, -0.9402, -0.3741], ["AVPR1A", 1.5472, -0.8588, -0.1703, -0.5198], ["RGR", -0.3225, 0.8372, 0.2006, -0.0271]], columns=['Hugo_Symbol', 'TCGA-02-0010-01', 'TCGA-41-2571-01', 'TCGA-14-1821-01', 'TCGA-32-2632-01'])
Example raw_clin dataframe:
raw_clin = pd.DataFrame([["TCGA-02-0010-01", "I", "want", "to", "subset"], ["TCGA-14-1821-01", "clin_var", "rownames", "by", "common"], ["TCGA-41-2571-01", "colnames", "where", "the", "latter"], ["TCGA-32-2632-01", "may", "be", "random", "order"]], columns=['PATIENT_ID', 'Something1', 'something2', 'something3', 'something4'])
Desired output:
raw_clin = pd.DataFrame([["TCGA-02-0010-01", "I", "want", "to", "subset"], ["TCGA-41-2571-01", "colnames", "where", "the", "latter"], ["TCGA-14-1821-01", "clin_var", "rownames", "by", "common"], ["TCGA-32-2632-01", "may", "be", "random", "order"]], columns=['PATIENT_ID', 'Something1', 'something2', 'something3', 'something4'])
My attempt, which matches nothing:
raw_clin = raw_clin[raw_clin.index.isin(common.columns)]
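If I understand correctly, the row names you mention are the index, so you need to call set_index on the dataframe. Without it, raw_clin has the default integer index (0, 1, 2, 3), which is why none of the patient IDs match.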
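Your code, raw_clin = raw_clin[raw_clin.index.isin(common.columns)], will then select the matching rows. Note that isin only filters: the rows stay in raw_clin's original order rather than following the column order of common.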
raw_clin = pd.DataFrame([["TCGA-02-0010-01", "I", "want", "to", "subset"], ["TCGA-14-1821-01", "clin_var", "rownames", "by", "common"], ["TCGA-41-2571-01", "colnames", "where", "the", "latter"], ["TCGA-32-2632-01", "may", "be", "random", "order"]], columns=['PATIENT_ID', 'Something1', 'something2', 'something3', 'something4']).set_index('PATIENT_ID')