Read Pandas dataframe from csv file and convert to Python types
I want to read a Pandas dataframe whose elements have particular Python types, such as lists, dictionaries, and numpy arrays. I want to read it such that I can immediately work with them (currently they are read back as strings). How do I do that?
I want functionality similar to ast.literal_eval, but hopefully there is a way to do it without looping over the whole dataframe.
Edit: as requested, a minimal reproducible example.
import pandas as pd
import numpy as np
data = {'integer': 1, 'list': [1, 2, 3], 'dictionary': {}, 'np_array': np.array([1, 2, 3])}
# Build a one-row frame from the dict (DataFrame.append was removed in pandas 2.0)
output = pd.DataFrame([data])
filename = 'data.csv'
output.to_csv(filename)
input_data = pd.read_csv(filename, ???) # What to do here?
Ideally, I want a way where I don't have to specify the data types manually (not sure if such an approach exists).
For people of the future: for simple data types it is possible to use the dtype parameter, like so:
input_data = pd.read_csv(filename, dtype = {'integer':'int'})
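To make the limits of dtype concrete, here is a minimal sketch. The in-memory CSV text and the variable names are my own stand-ins for data.csv, not part of the original post. It shows the integer column getting the requested type while the list column still comes back as text.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for data.csv
csv_text = 'integer,list\n1,"[1, 2, 3]"\n'

df = pd.read_csv(io.StringIO(csv_text), dtype={'integer': 'int64'})

integer_value = df.loc[0, 'integer']
list_value = df.loc[0, 'list']

print(df.dtypes)
print(type(list_value))  # the list column is still a plain string
```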
However, for objects, this does not work properly. Then you can use the converters parameter instead. This is a dictionary of functions, each of which converts a certain column in your data. One can use the function ast.literal_eval from the ast module:
import ast
input_data = pd.read_csv(filename, converters={'integer': ast.literal_eval, 'dictionary': ast.literal_eval, 'list': ast.literal_eval})
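The converters approach can be checked end to end with a small round trip. This is only a sketch: it writes to an in-memory buffer instead of data.csv, and the sample values and variable names are assumptions of mine for illustration.

```python
import ast
import io

import pandas as pd

# Hypothetical sample row; the values are illustrative only
original = pd.DataFrame([{'integer': 1, 'list': [1, 2, 3], 'dictionary': {'a': 1}}])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# Apply ast.literal_eval to every cell of the named columns
converters = {name: ast.literal_eval for name in ('integer', 'list', 'dictionary')}
restored = pd.read_csv(buffer, converters=converters)

restored_list = restored.loc[0, 'list']
restored_dict = restored.loc[0, 'dictionary']
print(type(restored_list), type(restored_dict))
```

Note that pandas still calls the converter once per cell, so this hides the loop rather than removing it.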
Be careful though, this does not work with numpy arrays: you will encounter SyntaxError: invalid syntax, because numpy arrays are written to the file without commas (for example [1 2 3]), which is not valid Python syntax. Instead you can define your own function:
def string_to_numpyArray(x):
    # Drop the surrounding brackets, then parse the space-separated numbers
    return np.fromstring(x[1:-1], dtype=float, sep=' ')
and then use this as follows:
input_data = pd.read_csv(filename, converters={'integer': ast.literal_eval, 'dictionary': ast.literal_eval, 'list': ast.literal_eval, 'np_array': string_to_numpyArray})
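A self-contained round trip for the numpy column might look like the sketch below. The helper parse_numpy_repr is a hypothetical variant of mine that uses str.split instead of np.fromstring. The in-memory buffer and sample values are likewise assumptions.

```python
import ast
import io

import numpy as np
import pandas as pd


def parse_numpy_repr(text):
    # str(np.array([1, 2, 3])) is '[1 2 3]': drop the brackets, split on whitespace
    return np.array(text.strip('[]').split(), dtype=float)


# Hypothetical sample row with a list and a 1-D numpy array
original = pd.DataFrame([{'list': [1, 2, 3], 'np_array': np.array([1, 2, 3])}])

buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(
    buffer,
    converters={'list': ast.literal_eval, 'np_array': parse_numpy_repr},
)

restored_array = restored.loc[0, 'np_array']
print(type(restored_array), restored_array.dtype)
```

This only handles short one-dimensional arrays. With default print options, numpy abbreviates arrays of more than 1000 elements with "..." when converting them to text, so such arrays cannot be recovered from the CSV this way.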
Hope this is helpful for someone.
Cheers