How to convert the elements of a pandas DataFrame to np.float while reading a .csv file with pandas.read_csv?
I have a .csv file that was exported from a piece of software. This .csv file contains a lot of NaNs. I need to analyze the data by reading it into a dataframe and using
dataframe.fillna(0)
to replace all the NaNs with 0. However, when I use pandas.read_csv()
to import this .csv file, the elements in the dataframe are of type 'str', so dataframe.fillna(0)
has no effect on them. So my question is: how can I convert the elements to np.float while reading the .csv file?
pandas.read_csv has a dtype argument. Here is its explanation from the documentation:
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use str
or object to preserve and not interpret dtype. If converters are specified,
they will be applied INSTEAD of dtype conversion.
Are there any examples of how to use it?
Thank you very much!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~
UPDATE:
Here are several solutions proposed by the answerers:
(1) from @Jakub. Set na_values=NaN in pandas.read_csv(), so that all of the elements in the .csv file are converted to np.float as they are read into the dataframe.
(2) from @André Christoffer Andersen. After reading the .csv file into a dataframe, use pandas.to_numeric to convert a column of the dataframe to np.float. Use a for loop to convert all of the columns to numeric.
(3) from @ThisGuyCantEven. Use numpy.loadtxt to read the .csv file into a numpy.ndarray. Use the skiprows argument to skip the rows that have a different number of elements. Then use numpy.nan_to_num() to convert nan to zeros.
Hopefully this will help future readers!
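As a rough illustration of solution (1) combined with the dtype argument quoted above, here is a minimal sketch. The CSV content, the column names, and the "--" missing-value marker are all made up for the example; the real export may mark missing values differently, in which case na_values would need to list those strings instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for the exported file.
# "--" is an assumed placeholder for missing values, not something
# taken from the original question.
csv_text = """a,b,c
1.5,--,3
--,2.0,4
5.0,6.5,--
"""

# na_values lists the strings that should be parsed as NaN;
# dtype asks for float64 in every column.
df = pd.read_csv(io.StringIO(csv_text), na_values=["--"], dtype=np.float64)

# Every column is now float64, so fillna(0) replaces the NaNs.
filled = df.fillna(0)
print(filled.dtypes)
print(filled)
```

If only some columns are numeric, dtype can instead be a dict that maps column names to types, as in the documentation excerpt.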
If you have a recent enough pandas version (0.17 or later), you can also use pd.to_numeric(...) for this:
df['mycol'] = pd.to_numeric(df['mycol'], errors='coerce')
And here is a trick to convert the whole dataframe:
for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')
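The same conversion can be sketched end to end, followed by the fillna(0) step from the question. The frame below is hypothetical sample data, and DataFrame.apply is used here as an alternative to the explicit loop; with errors='coerce', any value that cannot be parsed as a number becomes NaN.

```python
import pandas as pd

# Hypothetical frame whose columns were all read in as strings.
df = pd.DataFrame({"x": ["1.5", "oops", "3"], "y": ["4", "n/a", "6.25"]})

# Apply pd.to_numeric to each column; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# With real NaN values in float columns, fillna(0) now works.
cleaned = numeric.fillna(0)
print(cleaned)
```

Note that errors='coerce' silently discards any text that is not a number, so it is worth checking how many NaNs appear before filling them.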
Why not just use numpy.loadtxt? If you want to use pandas because, say, you have mixed data and you want a numeric column as a numpy array, you can always use df['column'].as_matrix() (removed in pandas 1.0; use df['column'].to_numpy() or df['column'].values in current versions), or you can convert the whole data frame if you want.
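A minimal sketch of the numpy route, assuming the file is purely numeric, has one header row, and spells missing values literally as "nan" (all of these are assumptions about the export, and the data below is invented):

```python
import io

import numpy as np

# Hypothetical CSV text: one header row, missing values written as "nan".
csv_text = "a,b\n1.0,nan\nnan,2.5\n3.0,4.0\n"

# skiprows=1 skips the header line; delimiter="," splits on commas.
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# nan_to_num replaces NaN with 0.0 (and infinities with large finite values).
clean = np.nan_to_num(arr)
print(clean)
```

numpy.loadtxt expects every row to have the same number of values, which is why the summary above mentions skipping irregular rows with skiprows.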