Python: remove outer quotes in a list of lists made from a data frame column
I have a pandas data frame called positive_samples that has a column called Gene Class, which is basically a pair of genes stored as a list. It looks like below.
The entire data frame looks like this.
So the Gene Class column is just the other two columns in the data frame combined. I made a list from the Gene Class column as shown below. This takes all the gene pair lists and puts them into a single list.
# convert the column to a list
positive_gene_pairs = positive_samples["Gene Class"].tolist()
This is the output.
Each pair is now wrapped in double quotes, which I don't want, because I loop through this list and use the .loc method to locate these pairs in another data frame called new_expression, which has them as an index, like this:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair], "GSM144819"])
This throws a KeyError.
And it is definitely because of the extra quotes wrapped around each pair, because when I instantiate a list like the one below, without quotes, it works just fine.
So my question is: how do I remove the extra quotes to make this work with .loc? That is, how do I make a list just like the one below, but from a data frame column?
pairs = [['YAL013W','YBR103W'],['YAL011W','YMR263W']]
I tried many workarounds such as replace and strip, but none of them worked for me: they are meant for strings, and I was trying to apply them to a list. Is there an easy solution? I just want a list of lists like this pairs list, without the extra single or double quotes.
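Define a function:
def listup(ini_list):
    # Convert a string such as "['YAL013W', 'YBR103W']" to a list,
    # stripping the spaces and quotes left around each gene name
    res = [item.strip(" '\"") for item in ini_list.strip('][').split(',')]
    return res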
Change from
positive_gene_pairs = positive_samples["Gene Class"].tolist()
to
positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
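The string-splitting idea in this answer can be tried on a small sample without pandas. This is only a sketch under assumptions: raw_pairs is a hypothetical stand-in for what .tolist() returned, assuming each element is a string that looks like "['YAL013W', 'YBR103W']", and split_pair is an illustrative helper name, not something from the question. Note that it also strips the quote characters left around each gene name after splitting.

```python
# Hypothetical stand-in for positive_samples["Gene Class"].tolist(),
# assuming every element is a string that merely looks like a list.
raw_pairs = ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]

def split_pair(text):
    # Drop the surrounding brackets, split on commas, then strip the
    # spaces and quote characters left around each gene name.
    return [item.strip(" '\"") for item in text.strip("][").split(",")]

positive_gene_pairs = [split_pair(text) for text in raw_pairs]
print(positive_gene_pairs)
```

The same helper could be passed to .apply on the column. Manual splitting like this only suits simple values; it would break if a gene name itself contained a comma.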
Convert the strings to lists first:
import ast
positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()
And then remove the [] around positive_gene_pair in the .loc call. Change:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair], "GSM144819"])
to:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[positive_gene_pair, "GSM144819"])
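The effect of ast.literal_eval in this answer can be checked on a small sample. This is a minimal sketch, assuming the column holds strings that look like Python lists; raw_pairs is a hypothetical stand-in for what .tolist() returned and is not taken from the question's data.

```python
import ast

# Hypothetical stand-in for positive_samples["Gene Class"].tolist(),
# assuming every element is a string, not a list.
raw_pairs = ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]

# Before conversion each element is a str, which is why it prints with
# outer quotes.
print(type(raw_pairs[0]))

# literal_eval parses each string as a Python literal and returns a real list.
positive_gene_pairs = [ast.literal_eval(text) for text in raw_pairs]
print(type(positive_gene_pairs[0]))
print(positive_gene_pairs)
```

ast.literal_eval only accepts Python literals and raises an error on malformed input, so it will fail loudly if a cell does not actually contain a list-like string.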