How to iterate over a list of strings?

Question

The symbols variable is a list of strings.

symbols = [pathway_genes['geneSymbols'].loc[i] for i in pathway_genes.index if i in M.index]

["['ABCA1', 'ABCA10', 'ABCA12', 'ABCA13']",
"['AKT1', 'AKT2', 'AKT3']",
"['APC', 'APC2', 'AXIN1', 'AXIN2']"]

I want to iterate over each string in symbols and compare it to common_mrna.index. If it matches, I want to record this string and all the subsequent strings in this list.

I then want to subset the common_mrna dataframe using this list.

pathway_genes = pd.DataFrame(data = [['KEGG_ABC_TRANSPORTERS', ['AKT1', 'AKT2', 'AKT3']], ['KEGG_ACUTE_MYELOID_LEUKEMIA', ['ABCA1', 'ABCA10', 'ABCA12']],['KEGG_ADHERENS_JUNCTION', ['ACP1', 'ACTB', 'PLLP']]]
, columns=['pathways', 'geneSymbols']).set_index("pathways")

common_mrna = pd.DataFrame([['AKT3', 0.0045, -1.1018, 0.123], ["PLLP", -0.3716, 0.1846, 0.345], ["AKT2", -0.5576, 0.3558, 0.678]], columns=['Hugo_Symbol', 'TCGA-02-0033-01','TCGA-02-2470-01','TCGA-02-2483-01']).set_index("Hugo_Symbol")

Desired output:

subset = pd.DataFrame([['AKT2', -0.5576, 0.3558], ['AKT3', 0.0045, -1.1018]], columns=['TCGA-02-0033-01','TCGA-02-2470-01','TCGA-02-2483-01'])

I'm unable to iterate over the strings in symbols. The following code prints every individual character instead of each string.

for i in symbols:
  for j in i:
    print(j)
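
The nested loop prints single characters because each element of symbols is itself a string (for example "['AKT1', 'AKT2', 'AKT3']"), and iterating over a string yields its characters. A minimal sketch, assuming every element is a valid Python list literal as in the sample above, is to parse each string with ast.literal_eval from the standard library before looping. The name parsed is introduced here only for illustration.

```python
import ast

# Sample values copied from the question: each element is a string that looks like a list.
symbols = [
    "['ABCA1', 'ABCA10', 'ABCA12', 'ABCA13']",
    "['AKT1', 'AKT2', 'AKT3']",
    "['APC', 'APC2', 'AXIN1', 'AXIN2']",
]

# Assumption: every element is a valid Python list literal.
# ast.literal_eval converts the text into a real list without executing arbitrary code.
parsed = [ast.literal_eval(s) for s in symbols]

for genes in parsed:
    for gene in genes:
        print(gene)  # prints whole gene symbols such as ABCA1, not single characters
```

If the real data contains entries that are not valid literals, ast.literal_eval raises an exception, so those entries would need separate handling.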

How pathway_genes is generated:

G_path = (G.index).to_list()
M_path = (M.index).to_list()
C_path = (C.index).to_list()
combined_omics_path = pd.DataFrame(G_path + M_path + C_path)
combined_omics_path = combined_omics_path.values.tolist()
combined_omics_path = [i for sublist in combined_omics_path for i in sublist]

pathway_genes = pd.DataFrame(gsea_msigdb.loc["geneSymbols"][gsea_msigdb.columns.intersection(combined_omics_path)])

gsea_msigdb

A header	KEGG_ABC_TRANSPORTERS	KEGG_ACUTE_MYELOID_LEUKEMIA	KEGG_ADHERENS_JUNCTION
systematicName	1	2	3
geneSymbols	`[ABCA1, ABCA10, ABCA12]`	`[AKT1, AKT2, AKT3]`	`[ACP1, ACTB, ACTG1]`
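
To make the relationship between this table and pathway_genes concrete, here is a small sketch that rebuilds a frame shaped like gsea_msigdb from the values shown above and selects the geneSymbols row for a set of pathway names. The wanted list is a hypothetical stand-in for combined_omics_path, and storing the cells as Python lists is an assumption; in the real data they may well be strings.

```python
import pandas as pd

# Toy frame shaped like the gsea_msigdb table above (pathways as columns).
# Assumption: the geneSymbols cells are Python lists; in the real data they may be strings.
gsea_msigdb = pd.DataFrame(
    {
        "KEGG_ABC_TRANSPORTERS": [1, ["ABCA1", "ABCA10", "ABCA12"]],
        "KEGG_ACUTE_MYELOID_LEUKEMIA": [2, ["AKT1", "AKT2", "AKT3"]],
        "KEGG_ADHERENS_JUNCTION": [3, ["ACP1", "ACTB", "ACTG1"]],
    },
    index=["systematicName", "geneSymbols"],
)

# Hypothetical stand-in for combined_omics_path: the pathway names to keep.
wanted = ["KEGG_ACUTE_MYELOID_LEUKEMIA", "KEGG_ADHERENS_JUNCTION", "NOT_IN_TABLE"]

# Select the geneSymbols row, then keep only the pathways present in both.
kept = gsea_msigdb.columns.intersection(wanted)
pathway_genes = pd.DataFrame(gsea_msigdb.loc["geneSymbols"][kept])

print(pathway_genes)
```

The result has one row per retained pathway and a single geneSymbols column, which is the shape the rest of the question works with.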

Answer 1

If you need the data shown in 'subset': using the 'np.in1d' function from numpy, I look for gene symbols in each row of the 'pathway_genes' data frame that match the index of 'common_mrna'. The values for the matching indexes are then appended to the arr list. Note that under your condition 'PLLP' also qualifies.

import numpy as np
import pandas as pd

pathway_genes = pd.DataFrame(data=[['KEGG_ABC_TRANSPORTERS', ['AKT1', 'AKT2', 'AKT3']],
                                   ['KEGG_ACUTE_MYELOID_LEUKEMIA', ['ABCA1', 'ABCA10', 'ABCA12']],
                                   ['KEGG_ADHERENS_JUNCTION', ['ACP1', 'ACTB', 'PLLP']]]
                             , columns=['pathways', 'geneSymbols']).set_index("pathways")

common_mrna = pd.DataFrame([['AKT3', 0.0045, -1.1018, 0.123], ["PLLP", -0.3716, 0.1846, 0.345],
                            ["AKT2", -0.5576, 0.3558, 0.678]],
                           columns=['Hugo_Symbol', 'TCGA-02-0033-01', 'TCGA-02-2470-01', 'TCGA-02-2483-01']
                           ).set_index("Hugo_Symbol")

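arr = []

# Loop over the pathways (rows of pathway_genes), not over the rows of common_mrna.
for row in range(len(pathway_genes)):
    genes = np.array(pathway_genes.iat[row, 0])
    # Keep only the genes that also appear in the common_mrna index.
    matched = genes[np.in1d(genes, common_mrna.index)]
    for gene in matched:
        arr.append([gene, common_mrna.at[gene, 'TCGA-02-0033-01'], common_mrna.at[gene, 'TCGA-02-2470-01']])

# Column labels follow the desired output in the question: the first column holds the gene symbol.
subset = pd.DataFrame(arr, columns=['TCGA-02-0033-01', 'TCGA-02-2470-01', 'TCGA-02-2483-01'])
print(subset)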

Output

  TCGA-02-0033-01  TCGA-02-2470-01  TCGA-02-2483-01
0            AKT2          -0.5576           0.3558
1            AKT3           0.0045          -1.1018
2            PLLP          -0.3716           0.1846
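
As an alternative to the explicit loop, the same matching can be sketched with pandas alone: Series.explode flattens the gene lists, Index.isin builds a boolean mask, and the mask selects rows from common_mrna so that every sample column keeps its own label. This is an illustrative variant rather than the answer's original method, and the names all_genes and subset_all are hypothetical. Note that the rows come back in the order of common_mrna, not in the order of the pathways.

```python
import pandas as pd

pathway_genes = pd.DataFrame(
    data=[
        ["KEGG_ABC_TRANSPORTERS", ["AKT1", "AKT2", "AKT3"]],
        ["KEGG_ACUTE_MYELOID_LEUKEMIA", ["ABCA1", "ABCA10", "ABCA12"]],
        ["KEGG_ADHERENS_JUNCTION", ["ACP1", "ACTB", "PLLP"]],
    ],
    columns=["pathways", "geneSymbols"],
).set_index("pathways")

common_mrna = pd.DataFrame(
    [
        ["AKT3", 0.0045, -1.1018, 0.123],
        ["PLLP", -0.3716, 0.1846, 0.345],
        ["AKT2", -0.5576, 0.3558, 0.678],
    ],
    columns=["Hugo_Symbol", "TCGA-02-0033-01", "TCGA-02-2470-01", "TCGA-02-2483-01"],
).set_index("Hugo_Symbol")

# Flatten the per-pathway lists into one Series with one gene symbol per row.
all_genes = pathway_genes["geneSymbols"].explode()

# Keep the rows of common_mrna whose index appears in any pathway.
subset_all = common_mrna[common_mrna.index.isin(all_genes)]

print(subset_all)
```

To restrict the result to a single pathway, pass only that pathway's list to isin, for example pathway_genes.loc["KEGG_ABC_TRANSPORTERS", "geneSymbols"].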

Solution 1
0 Accepted 2022-05-23 16:12:22
