
Python remove outer quotes in a list of lists made from a data frame column

I have a pandas data frame called positive_samples with a column called Gene Class, which holds a pair of genes stored as a list. It looks like this:

[image]

The entire data frame looks like this.

[image]

So the Gene Class column is just the other two columns of the data frame combined. I made a list from the Gene Class column as shown below. This takes all the gene-pair lists and puts them into a single list.

   # convert the column to a list
   positive_gene_pairs = positive_samples["Gene Class"].tolist()

This is the output.

[image]

Each pair is now wrapped in double quotes, which I don't want, because I loop through this list and use the .loc method to locate these pairs in another data frame called new_expression_df, which has them as its index, like this:

[image]

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

This throws a KeyError.

[image]

And it is definitely because of the extra quotes around each pair, because when I instantiate a list like the one below, without quotes, it works just fine.

[image]

So my question is: how do I remove the extra quotes to make this work with .loc? That is, how do I make a list just like the one below, but from a data frame column?

pairs = [['YAL013W','YBR103W'],['YAL011W','YMR263W']]

I tried many workarounds such as replace and strip, but none of them worked for me: they are meant for strings, and I was trying to apply them to a list. Is there an easy solution? I just want a list of lists like this pairs list, without extra single or double quotes.
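The quotes in the printed output suggest that each element of the column is a string that merely looks like a list, not an actual list. One common way to end up in that state (an assumption here, since the question does not say how positive_samples was loaded) is saving a list column to CSV and reading it back. A minimal sketch with a hypothetical two-row frame:

```python
import io
import pandas as pd

# Hypothetical frame; the gene names are the ones used in the question.
original = pd.DataFrame(
    {"Gene Class": [["YAL013W", "YBR103W"], ["YAL011W", "YMR263W"]]}
)

# Round-trip through CSV text (assumed scenario: the data was saved and reloaded).
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

first_before = original["Gene Class"].tolist()[0]
first_after = reloaded["Gene Class"].tolist()[0]

# Real lists print without outer quotes; the reloaded values are plain strings.
print(type(first_before).__name__, repr(first_before))
print(type(first_after).__name__, repr(first_after))
```

Checking type() on one element of the list is a quick way to confirm whether replace or strip should be applied to each element or whether the values need to be parsed.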

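Define a function:

def listup(initlist):
    # Convert a string such as "['A', 'B']" into the list ['A', 'B']:
    # drop the outer brackets, split on commas, then strip the spaces
    # and quotes left around each item.
    return [item.strip(" '\"") for item in initlist.strip('][').split(',')]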

Then change

positive_gene_pairs = positive_samples["Gene Class"].tolist()

to

positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
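As a rough check of this approach, the sketch below applies a string-splitting helper to a small hypothetical frame whose Gene Class values are strings. It assumes the strings have the form "['A', 'B']". Splitting on commas alone leaves a quote around every gene name, so the helper also strips spaces and quotes from each piece.

```python
import pandas as pd

def listup(text):
    # Drop the outer brackets, split on commas, then strip the
    # spaces and quote characters left around each gene name.
    inner = text.strip("][")
    return [item.strip(" '\"") for item in inner.split(",")]

# Hypothetical stand-in for positive_samples, with string-valued cells.
positive_samples = pd.DataFrame(
    {"Gene Class": ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]}
)

positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
print(positive_gene_pairs)
```

This only works for flat lists of simple names; anything containing commas or brackets inside an item would need a real parser.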

Convert the strings to lists first:

import ast

positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()

Then drop the extra [] around positive_gene_pair in the .loc call, changing:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

to:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[positive_gene_pair,"GSM144819"])
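The ast.literal_eval conversion can be tried on its own with a small hypothetical frame. In the sketch below, the to_list helper and the sample data are illustrative assumptions. The helper parses only string values, so it also tolerates cells that already hold real lists.

```python
import ast
import pandas as pd

# Hypothetical stand-in for positive_samples, with string-valued cells.
positive_samples = pd.DataFrame(
    {"Gene Class": ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]}
)

def to_list(value):
    # Parse strings that hold a Python list literal; pass real lists through.
    return ast.literal_eval(value) if isinstance(value, str) else value

positive_gene_pairs = positive_samples["Gene Class"].apply(to_list).tolist()

for positive_gene_pair in positive_gene_pairs:
    # Each item is now a list of two gene names, with no outer quotes.
    print(type(positive_gene_pair).__name__, positive_gene_pair)
```

ast.literal_eval evaluates only Python literals, so it handles the quoting and spacing of the stored text without executing arbitrary code.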
