Reading back tuples from a csv file with pandas

Question

Using pandas, I have exported a dataframe whose cells contain tuples of strings to a csv file. The resulting file has the following structure:

index,colA
1,"('a','b')"
2,"('c','d')"

Now I want to read it back using read_csv. However, whatever I try, pandas interprets the values as strings rather than tuples. For instance:

In []: import pandas as pd
       df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
       df.loc[1,'colA']
Out[]: "('a','b')"

Is there a way of telling pandas to do the right thing? Preferably without heavy post-processing of the dataframe: the actual table has 5000 rows and 2500 columns.

Answer 1

Storing tuples in a column isn't usually a good idea; a lot of the advantages of using Series and DataFrames are lost. That said, you could use the converters argument of read_csv to parse each string as it is read:

>>> import ast
>>> import pandas as pd
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)

[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<type 'tuple'>
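
If you want to try this without a file on disk, here is a minimal sketch. The in-memory text mirrors the sample from the question, and io.StringIO standing in for the real file is my assumption for illustration, not part of the original setup.

```python
import ast
import io

import pandas as pd

# In-memory stand-in for the CSV from the question (assumed layout).
csv_text = '''index,colA
1,"('a','b')"
2,"('c','d')"
'''

# ast.literal_eval safely parses each cell's text into a Python literal,
# so "('a','b')" becomes the tuple ('a', 'b').
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="index",
    converters={"colA": ast.literal_eval},
)

value = df.loc[1, "colA"]
print(type(value), value)
```

For a wide table you would probably build the converters dict from the header rather than typing it out, for example by reading the column names with pd.read_csv(path, nrows=0).columns and mapping each one to ast.literal_eval. Note that the converter runs once per cell in Python, so on a large table it may be noticeably slower than a plain read.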

But I'd probably change things at source to avoid storing tuples in the first place.
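
One way to change things at source, sketched under the assumption that every tuple has exactly two elements: split each tuple into its own plain columns before writing, so read_csv needs no converters at all. The column names colA_0 and colA_1 are hypothetical, and the StringIO buffer again stands in for a real file.

```python
import io

import pandas as pd

# Dataframe like the one in the question: one column of 2-tuples of strings.
df = pd.DataFrame(
    {"colA": [("a", "b"), ("c", "d")]},
    index=pd.Index([1, 2], name="index"),
)

# Expand each tuple into two ordinary string columns (hypothetical names).
flat = pd.DataFrame(
    df["colA"].tolist(),
    index=df.index,
    columns=["colA_0", "colA_1"],
)

# Round-trip through CSV using an in-memory buffer instead of a file.
buf = io.StringIO()
flat.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col="index")

# Rebuild the tuples only where they are really needed.
pairs = list(zip(restored["colA_0"], restored["colA_1"]))
print(pairs)
```

The flat layout keeps every cell a scalar, which is what lets the usual Series and DataFrame operations work on the data directly.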

Answer 1
11 · accepted · 2014-05-14 18:02:29