
Pandas dataframe reading numpy array column as str

I have two Python scripts: one creates a .csv file and the other reads it.

This is how I save the dataframe in the first script:

df['matrix'] = df['matrix'].apply(lambda x: np.array(x))
df.to_csv("Matrices.csv", sep=",", index=False)

The type and shape of df['matrix'].iloc[0] are <class 'numpy.ndarray'> and (24, 60) respectively.

In the second script, when I try

print ("type of df['matrix'].iloc[0]", type(df['matrix'].iloc[0]))

The output is type of df['matrix'].iloc[0] <class 'str'>

How can I make sure that df['matrix'] doesn't lose its array type?

If you only want to save and read a numpy array, use savetxt and genfromtxt.
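A minimal sketch of that round trip, assuming a single (24, 60) float matrix like the one in the question. The array contents and the file name matrix.csv are made up for illustration, and the file is written to a temporary directory:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one (24, 60) matrix from the question.
matrix = np.arange(24 * 60, dtype=float).reshape(24, 60)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.csv")  # illustrative file name
    # Write one CSV row per matrix row.
    np.savetxt(path, matrix, delimiter=",")
    # Read the file back as a 2-D float array.
    restored = np.genfromtxt(path, delimiter=",")

print(type(restored), restored.shape)
```

This only stores the array itself, so any other dataframe columns would have to be saved separately.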


If there are multiple columns, use one of the following:

Use pickle:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')
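A small sketch showing that the array cells come back as arrays after a pickle round trip. The dataframe contents are invented for illustration and the file is written to a temporary directory. Pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: two rows, each holding a (24, 60) matrix as nested lists.
nested = np.arange(24 * 60).reshape(24, 60).tolist()
df_out = pd.DataFrame({"matrix": [nested, nested], "b": [0, 1]})
# Same conversion as in the question: each cell becomes a numpy array.
df_out["matrix"] = df_out["matrix"].apply(lambda x: np.array(x))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.pkl")
    df_out.to_pickle(path)
    df_in = pd.read_pickle(path)

first = df_in["matrix"].iloc[0]
print(type(first), first.shape)
```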

Convert arrays to multiple columns and then write to file:

a = np.array(
[[219,220,221],
 [154,152,14],
 [205,202,192]])

df = pd.DataFrame({'matrix':a.tolist(), 'b':np.arange(len(a))})
print (df)
            matrix  b
0  [219, 220, 221]  0
1   [154, 152, 14]  1
2  [205, 202, 192]  2

df1 = pd.DataFrame(df.pop('matrix').values.tolist(), index=df.index).add_prefix('mat_')
print (df1)
   mat_0  mat_1  mat_2
0    219    220    221
1    154    152     14
2    205    202    192

df = df.join(df1)
print (df)
   b  mat_0  mat_1  mat_2
0  0    219    220    221
1  1    154    152     14
2  2    205    202    192
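When such a file is read back, the mat_ columns can be gathered into one 2-D array again. A minimal sketch, assuming the column prefix mat_ from the example above and using an in-memory buffer in place of a real file:

```python
import io

import numpy as np
import pandas as pd

a = np.array([[219, 220, 221],
              [154, 152, 14],
              [205, 202, 192]])

# Wide layout as in the example above: column b plus mat_0, mat_1, mat_2.
wide = pd.DataFrame({"b": np.arange(len(a))}).join(
    pd.DataFrame(a).add_prefix("mat_")
)

# Round trip through CSV text (a buffer stands in for the file on disk).
buffer = io.StringIO()
wide.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Select the mat_ columns and convert them to a single 2-D array.
restored = loaded.filter(like="mat_").to_numpy()
print(restored.shape)
```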

But if you really need to convert the values back to arrays, use a converter with ast.literal_eval (starting again from a dataframe that still has the matrix column):

import ast
import numpy as np
import pandas as pd

df.to_csv('testing.csv', index=False)

df = pd.read_csv('testing.csv', converters={'matrix':lambda x: np.array(ast.literal_eval(x))})
print (type(df.loc[0, 'matrix']))

<class 'numpy.ndarray'>
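One caveat, as far as I can tell: the converter relies on the cell text being a valid Python literal such as [219, 220, 221], which is what a column of lists produces. A cell holding a numpy array is written using the array's string form, which has no commas and, by default, is abbreviated with ... for large arrays, so ast.literal_eval cannot parse it. A minimal sketch, assuming the arrays are converted with tolist() before writing; the (24, 60) matrix and the in-memory buffer are stand-ins for the real data and file:

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical (24, 60) matrix like the one in the question.
matrix = np.arange(24 * 60).reshape(24, 60)

# Store nested lists, not arrays, so the CSV cell is a valid Python literal.
df_out = pd.DataFrame({"matrix": [matrix.tolist()], "b": [0]})

buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
buffer.seek(0)

# Parse the cell text back into a list, then into a numpy array.
df_in = pd.read_csv(
    buffer,
    converters={"matrix": lambda s: np.array(ast.literal_eval(s))},
)
restored = df_in.loc[0, "matrix"]
print(type(restored), restored.shape)
```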

To save arrays directly to csv as multiple columns, use:

np.savetxt(r'C:\path\file.csv',a,delimiter=',')

If you need to read the values back as Python objects, ast.literal_eval() is your saviour, as pointed out by @jezrael.
