
Python: remove outer quotes in a list of lists made from a data frame column

I have a pandas data frame called positive_samples with a column called Gene Class, which holds a pair of genes stored as a list. It looks like this:

[image: the Gene Class column]

The entire data frame looks like this:

[image: the full data frame]

So the Gene Class column is just the other two columns of the data frame combined. I made a list from the Gene Class column as shown below. This takes all the gene pair lists and puts them into a single list.

# Convert the column to a list
positive_gene_pairs = positive_samples["Gene Class"].tolist()

This is the output:

[image: the resulting list, with each pair wrapped in quotes]

Each pair is now wrapped in double quotes, which I don't want: I loop through this list and use the .loc method to locate the pairs in another data frame called new_expression (new_expression_df in the code), which has them as its index, like this:

[image: the index of the new_expression data frame]

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

This throws a KeyError.

[image: the KeyError traceback]

It is definitely because of the extra quotes wrapped around each pair: when I instantiate a list like the one below, without quotes, it works fine.

[image: a list of pairs created by hand, without outer quotes]

So my question is: how do I remove the extra quotes so that this works with .loc? In other words, how do I build a list like the one below, but from a data frame column?

pairs = [['YAL013W','YBR103W'],['YAL011W','YMR263W']]

I tried several workarounds such as replace and strip, but none of them worked: they operate on strings, and I was applying them to a list. Is there an easy solution? I just want a list of lists like the pairs list above, without the extra single or double quotes.

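Define a function:

def listup(ini_list):
    # Convert a string such as "['YAL013W', 'YBR103W']" to a real list
    items = ini_list.strip('][').split(',')
    # Drop surrounding spaces and quote characters from each item
    return [item.strip().strip('\'"') for item in items]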

Then change

positive_gene_pairs = positive_samples["Gene Class"].tolist()

to

positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
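The strip-and-split idea can be checked without pandas. The following is a minimal sketch, assuming each cell of the column is a string that only looks like a list; the sample strings and the helper name parse_pair are hypothetical and stand in for what .tolist() would return.

```python
# Hypothetical stand-in for positive_samples["Gene Class"].tolist():
# every element is a str, not a list.
raw_pairs = ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]

def parse_pair(text):
    """Split a string such as "['A', 'B']" into the list ['A', 'B']."""
    items = text.strip("][").split(",")
    # Remove surrounding spaces and quote characters from every item.
    return [item.strip().strip("'\"") for item in items]

positive_gene_pairs = [parse_pair(text) for text in raw_pairs]
print(positive_gene_pairs)
```

This manual parsing only suits simple, flat lists of labels; a gene name that itself contained a comma or a quote would be split or trimmed incorrectly.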

First convert the strings in the column to lists:

import ast

positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()
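To see what this step changes, here is a small sketch on made-up data. It assumes the Gene Class cells are strings (which would explain the outer quotes in the printed list); the two sample rows are hypothetical.

```python
import ast
import pandas as pd

# Hypothetical sample data standing in for positive_samples.
positive_samples = pd.DataFrame(
    {"Gene Class": ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]}
)

# Before parsing, each element is a str, so the list prints with outer quotes.
raw_pairs = positive_samples["Gene Class"].tolist()
print(type(raw_pairs[0]))

# ast.literal_eval parses a Python literal safely, turning each string into a list.
positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()
print(type(positive_gene_pairs[0]))
print(positive_gene_pairs)
```

Unlike eval, ast.literal_eval only accepts literals (strings, numbers, lists, tuples, dicts and similar), so it will not execute arbitrary code found in a cell.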

Then remove the extra [] around the label in the .loc call, changing:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

to:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[positive_gene_pair,"GSM144819"])
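The lookup can be sketched on a toy table. This assumes the index of new_expression_df holds individual gene names, which the question does not show explicitly; the expression values and the results list are made up for illustration.

```python
import pandas as pd

# Hypothetical expression table indexed by gene name; the values are invented.
new_expression_df = pd.DataFrame(
    {"GSM144819": [0.12, 0.34, 0.56, 0.78]},
    index=["YAL013W", "YBR103W", "YAL011W", "YMR263W"],
)

positive_gene_pairs = [["YAL013W", "YBR103W"], ["YAL011W", "YMR263W"]]

results = []
for positive_gene_pair in positive_gene_pairs:
    # A flat list of labels selects one row per gene in the pair.
    selected = new_expression_df.loc[positive_gene_pair, "GSM144819"]
    results.append(selected)
    print(selected)
```

Each pair is already a list of labels, so wrapping it in another pair of brackets would hand .loc a nested list instead of the labels themselves.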
