How to convert the elements of a pandas.DataFrame to np.float when using pandas.read_csv to read a CSV file?

I have a .csv file that was exported from a piece of software. This .csv file contains a lot of NaNs. I need to analyze the data by reading it into a dataframe and using dataframe.fillna(0) to replace all the NaNs with 0. However, when I use pandas.read_csv() to import this .csv file, the elements in the dataframe are of type 'str', so dataframe.fillna(0) does not work. So my question is: how do I convert the elements to np.float while reading the .csv file?

pandas.read_csv has a dtype argument; here is the explanation from its documentation:

dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} Use str 
or object to preserve and not interpret dtype. If converters are specified, 
they will be applied INSTEAD of dtype conversion.

Are there any examples of how to use it?

Thank you very much!
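A minimal sketch of the dtype argument quoted above, using a small hypothetical CSV held in a string (io.StringIO stands in for the exported file). The dict maps column names to types. This only works if every non-missing entry in those columns can actually be parsed as a number; otherwise read_csv raises an error instead of converting.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for the exported file;
# the empty fields are read as NaN.
csv_text = "a,b\n1,\n2,3.5\n,4\n"

# Ask read_csv to parse both columns as 64-bit floats.
df = pd.read_csv(io.StringIO(csv_text), dtype={"a": np.float64, "b": np.float64})

# The gaps are now real float NaNs, so fillna(0) replaces them.
df = df.fillna(0)
print(df.dtypes)
print(df)
```

Passing a single type (dtype=np.float64) instead of a dict applies it to every column.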

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~

UPDATE:

Here are several solutions proposed by the answerers:

(1) from @Jakub. Set na_values=NaN in pandas.read_csv() so that all of the elements in the .csv file are converted to np.float as they are read into the dataframe.

(2) from @André Christoffer Andersen. After reading the .csv file into a dataframe, use pandas.to_numeric to convert a column of the dataframe to np.float. Use a for loop to convert all columns to numeric.

(3) from @ThisGuyCantEven. Use numpy.loadtxt to read the .csv file into a numpy.ndarray, using the skiprows argument to skip the leading rows that do not have the same number of elements as the data rows. Then use numpy.nan_to_num() to convert the NaNs to zeros.
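A sketch of approach (1), assuming the software writes missing values with a marker that pandas does not recognize as NA by default; the marker "missing" below is hypothetical. na_values takes the extra string(s) to treat as NaN, which lets the columns be parsed as floats so that fillna(0) then works.

```python
import io

import pandas as pd

# Hypothetical export in which missing values are written as "missing".
# That string is not a default NA marker, so without help neither column
# is parsed as numeric.
csv_text = "x,y\n1.5,missing\nmissing,2\n3,4\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)  # neither column is numeric

# Declare the extra missing-value marker; the columns are then parsed as
# float64 and each "missing" becomes NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])
df = df.fillna(0)
print(df.dtypes)
print(df)
```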

Hopefully this helps future readers!

If you have a recent enough pandas version, you can also use pd.to_numeric(...) for this:

df['mycol'] = pd.to_numeric(df['mycol'], errors='coerce')
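A small illustration of what errors='coerce' does, on a hypothetical string column containing one non-numeric entry (the column contents are made up for the example):

```python
import pandas as pd

# Hypothetical object (string) column, as read_csv might return it.
df = pd.DataFrame(
    {"mycol": pd.Series(["1.5", "2", "n/a (sensor off)", "0.25"], dtype=object)}
)

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
df["mycol"] = pd.to_numeric(df["mycol"], errors="coerce")

# The column is numeric now, so fillna(0) works as intended.
df["mycol"] = df["mycol"].fillna(0)
print(df["mycol"].tolist())  # [1.5, 2.0, 0.0, 0.25]
```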

And here is a trick to convert the whole dataframe:

for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')
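The same loop can also be written with DataFrame.apply, which forwards extra keyword arguments such as errors='coerce' to pd.to_numeric. A sketch on a hypothetical all-string DataFrame, finishing with the fillna(0) from the question:

```python
import pandas as pd

# Hypothetical all-string DataFrame, as read_csv might return it.
df = pd.DataFrame(
    {"a": ["1", "x", "3"], "b": ["4.5", "", "6"]},
    dtype=object,
)

# Equivalent to the column loop: apply pd.to_numeric to every column,
# passing errors="coerce" through as a keyword argument.
df = df.apply(pd.to_numeric, errors="coerce")

# Unparseable entries became NaN; replace them with 0.
df = df.fillna(0)
print(df.dtypes)
print(df)
```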

Why not just use numpy.loadtxt? If you want to use pandas because, say, you have mixed data and you want a numeric column as a numpy array, you can always use df['column'].to_numpy() (as_matrix() in older pandas versions), or you can convert the whole data frame if you want.
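A sketch of both routes on a hypothetical CSV string: numpy.loadtxt with skiprows plus numpy.nan_to_num (approach (3) from the update), and the pandas route with .to_numpy() (available since pandas 0.24). It assumes the missing values are spelled nan, which both numpy.loadtxt and read_csv parse as NaN.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical export: one header line, then numeric data with "nan" gaps.
csv_text = "a,b,c\n1,nan,3\n4,5,nan\n"

# numpy route: skiprows=1 skips the header line; "nan" parses as float NaN.
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Replace NaN with 0.0 (nan_to_num would also map +/-inf to large finite values).
arr = np.nan_to_num(arr)
print(arr)

# pandas route: one numeric column as a NumPy array ...
df = pd.read_csv(io.StringIO(csv_text))
col = df["a"].to_numpy()

# ... or the whole frame, after filling the gaps.
whole = df.fillna(0).to_numpy()
print(col)
print(whole)
```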

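Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM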