简体   繁体   中英

python pandas: list of integers as individual values of DataFrame

Question: How to 'pd.read_csv' so that the values in a given column are of type list (a list in each row of a column )?

When creating a DataFrame (from a dict, see below), individual values are of type list. The problem: After writing the DataFrame to a file and reading from the file back to a DataFrame, I get a string instead of a list.

creating the DataFrame
type(df.loc[0, 'neg']) == list # --> True
type(df.loc[0, 'neg']) == str # --> False
df.loc[1, 'neg'][-1] == 782 # --> True

value is a list

df = pd.read_csv('DataFrame.txt', sep='\t')
write to file
 df.to_csv('DataFrame.txt', sep='\\t', header=True, index=False) 
read from file
 df = pd.read_csv('DataFrame.txt', sep='\\t') 

value is a string not a list

 type(df.loc[0, 'neg']) == list # --> False type(df.loc[0, 'neg']) == str # --> True df.loc[1, 'neg'][-1] == 782 # --> False 

Of course, it's possible to convert between the two data types, but it's computationally expensive and needs extra work (see below)

 def convert_StringList2ListOfInt(string2convert): return [int(ele) for ele in string2convert[1:-1].split(',')] def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list): for column in cols2convert_list: column_temp = column + "_temp" df[column_temp] = df[column].apply(convert_StringList2ListOfInt, 1) df[column] = df[column_temp] df = df.drop(column_temp, axis=1) return df df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos']) 

What would be a better (more pythonic) solution? It would be very convenient to iterate over the Integers in the list without having to convert them back and forth. Thank you for your support!!

You can use ast.literal_eval() to convert the strings to lists.

A simple example of ast.literal_eval() -

>>> import ast
>>> l = ast.literal_eval('[10,20,30]')
>>> type(l)
<class 'list'>

For your case, you can pass it to Series.apply , so that each element in the series is evaluated (safely). Example -

df = pd.read_csv('DataFrame.txt', sep='\t')
import ast
df['neg_list'] = df['neg'].apply(ast.literal_eval)
df = df.drop('neg',axis=1)
df['pos_list'] = df['pos'].apply(ast.literal_eval)
df = df.drop('pos',axis=1)

Demo -

In [15]: import pandas as pd

In [16]: dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
   ....:            "neg": [[58], [1332, 753, 716, 782], [187]],
   ....:            "pos": [[96], [659, 661, 705, 1228], [1414]]}

In [17]: df = pd.DataFrame(dict2df)

In [18]: df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)

In [19]: newdf = pd.read_csv('DataFrame.txt', sep='\t')

In [20]: newdf['neg']
Out[20]:
0                     [58]
1    [1332, 753, 716, 782]
2                    [187]
Name: neg, dtype: object

In [21]: newdf['neg'][0]
Out[21]: '[58]'

In [22]: import ast

In [23]: newdf['neg_list'] = newdf['neg'].apply(ast.literal_eval)

In [24]: newdf = newdf.drop('neg',axis=1)

In [25]: newdf['pos_list'] = newdf['pos'].apply(ast.literal_eval)

In [26]: newdf = newdf.drop('pos',axis=1)

In [27]: newdf
Out[27]:
         euNOG               neg_list               pos_list
0  ENOG410IF52                   [58]                   [96]
1      KOG2956  [1332, 753, 716, 782]  [659, 661, 705, 1228]
2      KOG1997                  [187]                 [1414]

In [28]: newdf['neg_list'][0]
Out[28]: [58]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM