How to iterate over a list of strings?
The symbols variable is a list of strings.
symbols = [pathway_genes['geneSymbols'].loc[i] for i in pathway_genes.index if i in M.index]
["['ABCA1', 'ABCA10', 'ABCA12', 'ABCA13']",
 "['AKT1', 'AKT2', 'AKT3']",
 "['APC', 'APC2', 'AXIN1', 'AXIN2']"]
I want to iterate over each string in symbols and compare it to common_mrna.index. If it matches, I want to record this string and all the subsequent strings in this list. I then want to subset the common_mrna dataframe using this list.
pathway_genes = pd.DataFrame(data = [['KEGG_ABC_TRANSPORTERS', ['AKT1', 'AKT2', 'AKT3']], ['KEGG_ACUTE_MYELOID_LEUKEMIA', ['ABCA1', 'ABCA10', 'ABCA12']],['KEGG_ADHERENS_JUNCTION', ['ACP1', 'ACTB', 'PLLP']]]
, columns=['pathways', 'geneSymbols']).set_index("pathways")
common_mrna = pd.DataFrame([['AKT3', 0.0045, -1.1018, 0.123], ["PLLP", -0.3716, 0.1846, 0.345], ["AKT2", -0.5576, 0.3558, 0.678]], columns=['Hugo_Symbol', 'TCGA-02-0033-01','TCGA-02-2470-01','TCGA-02-2483-01']).set_index("Hugo_Symbol")
Desired output:
subset = pd.DataFrame([['AKT2', -0.5576, 0.3558], ['AKT3', 0.0045, -1.1018]], columns=['TCGA-02-0033-01','TCGA-02-2470-01','TCGA-02-2483-01'])
I'm unable to iterate over the strings in symbols. The following code prints every individual character instead of each string.
for i in symbols:
    for j in i:
        print(j)
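Judging from the printed value of symbols above, each element looks like the text of a list (for example "['AKT1', 'AKT2', 'AKT3']") rather than a real list, which would explain why the inner loop yields single characters. A minimal sketch of one way to recover the gene names, assuming the elements really are stringified Python lists; the names parsed, gene_list and gene are illustrative and not from the original post:

```python
import ast

# Assumed shape of `symbols`, copied from the question:
# each element is the *text* of a list, not a list object.
symbols = [
    "['ABCA1', 'ABCA10', 'ABCA12', 'ABCA13']",
    "['AKT1', 'AKT2', 'AKT3']",
    "['APC', 'APC2', 'AXIN1', 'AXIN2']",
]

# ast.literal_eval safely evaluates a string containing a Python literal,
# turning each element back into a real list of gene symbols.
parsed = [ast.literal_eval(s) for s in symbols]

for gene_list in parsed:
    for gene in gene_list:
        print(gene)  # whole symbols, not single characters
```

If geneSymbols already holds real lists, as in the sample pathway_genes frame, this parsing step is unnecessary and the nested loop can run over the lists directly.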
How pathway_genes is generated:
G_path = (G.index).to_list()
M_path = (M.index).to_list()
C_path = (C.index).to_list()
combined_omics_path = pd.DataFrame(G_path + M_path + C_path)
combined_omics_path = combined_omics_path.values.tolist()
combined_omics_path = [i for sublist in combined_omics_path for i in sublist]
pathway_genes = pd.DataFrame(gsea_msigdb.loc["geneSymbols"][gsea_msigdb.columns.intersection(combined_omics_path)])
gsea_msigdb
A header | KEGG_ABC_TRANSPORTERS | KEGG_ACUTE_MYELOID_LEUKEMIA | KEGG_ADHERENS_JUNCTION |
---|---|---|---|
systematicName | 1 | 2 | 3 |
geneSymbols | [ABCA1, ABCA10, ABCA12] | [AKT1, AKT2, AKT3] | [ACP1, ACTB, ACTG1] |
If you need the data shown in 'subset': using the 'np.in1d' function from numpy, I check each row of the 'pathway_genes' data frame for gene symbols that also appear in the index of 'common_mrna'. For every symbol that matches, I append a row of its values to the arr list. Note that under your condition 'PLLP' also qualifies.
import numpy as np
import pandas as pd

pathway_genes = pd.DataFrame(
    data=[['KEGG_ABC_TRANSPORTERS', ['AKT1', 'AKT2', 'AKT3']],
          ['KEGG_ACUTE_MYELOID_LEUKEMIA', ['ABCA1', 'ABCA10', 'ABCA12']],
          ['KEGG_ADHERENS_JUNCTION', ['ACP1', 'ACTB', 'PLLP']]],
    columns=['pathways', 'geneSymbols']).set_index("pathways")

common_mrna = pd.DataFrame(
    [['AKT3', 0.0045, -1.1018, 0.123],
     ["PLLP", -0.3716, 0.1846, 0.345],
     ["AKT2", -0.5576, 0.3558, 0.678]],
    columns=['Hugo_Symbol', 'TCGA-02-0033-01', 'TCGA-02-2470-01', 'TCGA-02-2483-01']
).set_index("Hugo_Symbol")

arr = []
# Walk over every pathway row of pathway_genes.
for i in range(len(pathway_genes)):
    genes = np.array(pathway_genes.iat[i, 0])
    # Keep only the genes of this pathway that appear in common_mrna's index.
    matched = genes[np.in1d(genes, common_mrna.index)]
    for gene in matched:
        arr.append([gene,
                    common_mrna.at[gene, 'TCGA-02-0033-01'],
                    common_mrna.at[gene, 'TCGA-02-2470-01']])

# Labels follow the question's desired output; the first column actually
# holds the gene symbol, so the labels are shifted by one position.
subset = pd.DataFrame(arr, columns=['TCGA-02-0033-01', 'TCGA-02-2470-01', 'TCGA-02-2483-01'])
print(subset)
Output
  TCGA-02-0033-01  TCGA-02-2470-01  TCGA-02-2483-01
0            AKT2          -0.5576           0.3558
1            AKT3           0.0045          -1.1018
2            PLLP          -0.3716           0.1846
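As an alternative to the explicit loop, the same matching can be sketched with pandas alone. This is only a sketch under two assumptions: geneSymbols holds real Python lists (as in the sample data), and keeping the gene symbol as the index with all three sample columns is acceptable. The names genes and matched are illustrative and not from the original post.

```python
import pandas as pd

pathway_genes = pd.DataFrame(
    data=[['KEGG_ABC_TRANSPORTERS', ['AKT1', 'AKT2', 'AKT3']],
          ['KEGG_ACUTE_MYELOID_LEUKEMIA', ['ABCA1', 'ABCA10', 'ABCA12']],
          ['KEGG_ADHERENS_JUNCTION', ['ACP1', 'ACTB', 'PLLP']]],
    columns=['pathways', 'geneSymbols']).set_index("pathways")

common_mrna = pd.DataFrame(
    [['AKT3', 0.0045, -1.1018, 0.123],
     ['PLLP', -0.3716, 0.1846, 0.345],
     ['AKT2', -0.5576, 0.3558, 0.678]],
    columns=['Hugo_Symbol', 'TCGA-02-0033-01', 'TCGA-02-2470-01', 'TCGA-02-2483-01']
).set_index("Hugo_Symbol")

# One row per (pathway, gene) pair instead of one list per pathway.
genes = pathway_genes['geneSymbols'].explode()

# Keep only the genes that are present in common_mrna's index.
matched = genes[genes.isin(common_mrna.index)]

# Select the matching rows, in pathway order, with every sample column.
subset = common_mrna.loc[matched.to_list()]
print(subset)
```

A gene that belongs to several pathways would appear once per pathway in this result; calling matched.drop_duplicates() before the lookup is one way to avoid that.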