data.dropna() doesn't work for my data.csv file and I still get data with NaN elements
I'm studying Pandas in Python.
I'm trying to remove NaN elements from my data.csv file with data.dropna(), but they aren't being removed.
import pandas as pd
data = pd.read_csv('data.csv')
new_data = data.dropna()
print(new_data)
This is the content of data.csv:
Duration Date Pulse Maxpulse Calories
60 '2020/12/01' 110 130 409.1
60 '2020/12/02' 117 145 479.0
60 '2020/12/03' 103 135 340.0
45 '2020/12/04' 109 175 282.4
45 '2020/12/05' 117 148 406.0
60 '2020/12/06' 102 127 300.0
60 '2020/12/07' 110 136 374.0
450 '2020/12/08' 104 134 253.3
30 '2020/12/09' 109 133 195.1
60 '2020/12/10' 98 124 269.0
60 '2020/12/11' 103 147 329.3
60 '2020/12/12' 100 120 250.7
60 '2020/12/12' 100 120 250.7
60 '2020/12/13' 106 128 345.3
60 '2020/12/14' 104 132 379.3
60 '2020/12/15' 98 123 275.0
60 '2020/12/16' 98 120 215.2
60 '2020/12/17' 100 120 300.0
45 '2020/12/18' 90 112 NaN
60 '2020/12/19' 103 123 323.0
45 '2020/12/20' 97 125 243.0
60 '2020/12/21' 108 131 364.2
45 NaN 100 119 282.0
60 '2020/12/23' 130 101 300.0
45 '2020/12/24' 105 132 246.0
60 '2020/12/25' 102 126 334.5
60 2020/12/26 100 120 250.0
60 '2020/12/27' 92 118 241.0
60 '2020/12/28' 103 132 NaN
60 '2020/12/29' 100 132 280.0
60 '2020/12/30' 102 129 380.3
60 '2020/12/31' 92 115 243.0
My guess is that data.csv is written incorrectly?
The data.csv file is formatted incorrectly; to fix it, separate the fields with commas.
Corrected format of data.csv:
Duration,Date,Pulse,Maxpulse,Calories
60,'2020/12/01',110,130,409.1
60,'2020/12/02',117,145,479.0
60,'2020/12/03',103,135,340.0
45,'2020/12/04',109,175,282.4
45,'2020/12/05',117,148,406.0
60,'2020/12/06',102,127,300.0
60,'2020/12/07',110,136,374.0
450,'2020/12/08',104,134,253.3
30,'2020/12/09',109,133,195.1
60,'2020/12/10',98,124,269.0
60,'2020/12/11',103,147,329.3
60,'2020/12/12',100,120,250.7
60,'2020/12/12',100,120,250.7
60,'2020/12/13',106,128,345.3
60,'2020/12/14',104,132,379.3
60,'2020/12/15',98,123,275.0
60,'2020/12/16',98,120,215.2
60,'2020/12/17',100,120,300.0
45,'2020/12/18',90,112,
60,'2020/12/19',103,123,323.0
45,'2020/12/20',97,125,243.0
60,'2020/12/21',108,131,364.2
45,,100,119,282.0
60,'2020/12/23',130,101,300.0
45,'2020/12/24',105,132,246.0
60,'2020/12/25',102,126,334.5
60,20201226,100,120,250.0
60,'2020/12/27',92,118,241.0
60,'2020/12/28',103,132,
60,'2020/12/29',100,132,280.0
60,'2020/12/30',102,129,380.3
60,'2020/12/31',92,115,243.0
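To see why the separator matters, here is a minimal sketch that reads a shortened copy of the data in both layouts. The in-memory strings and the variable names (space_separated, comma_separated, bad, good) are assumptions made for illustration; they stand in for the real data.csv. With the default comma separator, each space-separated line is read as one long string, so the text "NaN" inside it is not a missing value and dropna() has nothing to drop.

```python
import io

import pandas as pd

# A few rows in the original, space-separated layout (shortened for illustration).
space_separated = """Duration Date Pulse Maxpulse Calories
60 '2020/12/01' 110 130 409.1
45 '2020/12/18' 90 112 NaN
45 NaN 100 119 282.0
"""

# The same rows with commas; missing values are simply left empty.
comma_separated = """Duration,Date,Pulse,Maxpulse,Calories
60,'2020/12/01',110,130,409.1
45,'2020/12/18',90,112,
45,,100,119,282.0
"""

# read_csv splits on commas by default, so the first file becomes one column of strings.
bad = pd.read_csv(io.StringIO(space_separated))
good = pd.read_csv(io.StringIO(comma_separated))

print(bad.shape)           # a single column: every line is one string
print(good.shape)          # five separate columns
print(len(bad.dropna()))   # nothing is dropped
print(len(good.dropna()))  # only the complete row remains
```

Checking data.shape or data.columns right after read_csv is a quick way to spot this kind of parsing problem before calling dropna().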
TL;DR: Try this:
new_data = df.fillna(pd.NA).dropna()
or
import numpy as np
new_data = df.fillna(np.nan).dropna()
Is that the real CSV file? I don't think so.
The CSV specification [1] does not define how missing values are represented. In my experience, a missing value in a CSV file is represented by nothing between two separators (if the separator is a comma, it looks like ,,).
According to the pandas documentation [2], pandas.read_csv accepts an argument "na_values":
na_values: scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
If your CSV file contains 'NaN', pandas can infer it and read it as NaN, but you can also pass this parameter if you need to recognize other markers.
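As a rough sketch of how na_values behaves, the snippet below reads a small in-memory CSV twice. The sample data and the custom marker "missing" are hypothetical and are not part of the original data.csv; they only illustrate that default markers such as NaN are recognized automatically, while other markers have to be listed explicitly.

```python
import io

import pandas as pd

# Hypothetical sample: one default marker ("NaN") and one custom marker ("missing").
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,90,NaN
45,missing,282.0
"""

# Default behaviour: only the built-in markers are treated as missing.
default = pd.read_csv(io.StringIO(csv_text))

# With na_values, "missing" is treated as NaN in addition to the defaults.
custom = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

print(default.isna().sum().sum())  # counts only the "NaN" cell
print(custom.isna().sum().sum())   # counts the "NaN" and "missing" cells
print(len(custom.dropna()))        # rows left after dropping incomplete ones
```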
Also, you can check the type of an individual cell (where i is the row index and j is the column index):
type(df.iloc[i,j])
Compare with:
type(np.nan) # NumPy NaN
float
type(pd.NA) # pandas NA
pandas._libs.missing.NAType
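The check above can be sketched on a tiny frame as follows. The two-row CSV and the name cell are assumptions for illustration only. pd.isna() is used alongside type() because it answers the practical question directly: whether pandas considers the value missing.

```python
import io

import numpy as np
import pandas as pd

# Tiny illustrative frame: the second row has an empty Calories field.
df = pd.read_csv(io.StringIO("Duration,Calories\n60,409.1\n45,\n"))

cell = df.iloc[1, 1]  # row 1, column 1: the empty field
print(type(cell))     # a float type, because the empty field was parsed as NaN
print(pd.isna(cell))  # True when pandas treats the value as missing

# If the cell were a str instead (for example the text "NaN" inside a longer
# string), dropna() would not treat it as missing.
print(type(np.nan))   # NumPy's NaN is a plain Python float
print(type(pd.NA))    # pandas' NA singleton has its own type
```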
[1] https://datatracker.ietf.org/doc/html/rfc4180
[2] https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html