Drop rows with a 'question mark' value in any column in a pandas dataframe
I want to remove all rows that contain a question mark symbol in any column (equivalently, keep only the rows without one). I also want to convert the elements to float type.
Input:
X Y Z
0 1 ?
1 2 3
? ? 4
4 4 4
? 2 5
Output:
X Y Z
1 2 3
4 4 4
Preferably using pandas dataframe operations.
You can first find the string ? in the columns, build a boolean mask, and then filter the rows with boolean indexing. If you need to convert the columns to float, use astype:
print(~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?')))
0 False
1 True
2 False
3 True
4 False
dtype: bool
df1 = df[~((df['X'] == '?' ) | (df['Y'] == '?' ) | (df['Z'] == '?' ))].astype(float)
print(df1)
X Y Z
1 1 2 3
3 4 4 4
print(df1.dtypes)
X float64
Y float64
Z float64
dtype: object
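For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The construction of `df` is an assumption based on the Input table in the question (every cell stored as a string), and it uses Python 3 `print()` syntax.

```python
import pandas as pd

# Assumed reconstruction of the question's input; every cell is a string.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# True for rows where none of the three columns holds '?'.
mask = ~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?'))

# Keep the clean rows, then convert the remaining strings to float.
df1 = df[mask].astype(float)
print(df1)
print(df1.dtypes)
```

Only the rows with index labels 1 and 3 survive, since every other row has a `?` in at least one column.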
Or you can try:
df['X'] = pd.to_numeric(df['X'], errors='coerce')
df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
df['Z'] = pd.to_numeric(df['Z'], errors='coerce')
print(df)
X Y Z
0 0 1 NaN
1 1 2 3
2 NaN NaN 4
3 4 4 4
4 NaN 2 5
print(df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull())
0 False
1 True
2 False
3 True
4 False
dtype: bool
print(df[df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull()].astype(float))
X Y Z
1 1 2 3
3 4 4 4
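The per-column calls above can probably be condensed. A sketch, assuming the same string-valued `df` as in the question (rebuilt here so the snippet runs on its own): `DataFrame.apply` forwards `errors='coerce'` to `pd.to_numeric` for each column, and `dropna()` then removes every row that contains a NaN.

```python
import pandas as pd

# Assumed reconstruction of the question's input; every cell is a string.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Convert every column; anything non-numeric (such as '?') becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Drop rows with at least one NaN, then make sure every column is float.
result = numeric.dropna().astype(float)
print(result)
```

Note that `errors='coerce'` turns any non-numeric value into NaN, not just `?`, so rows with other stray strings or genuinely missing values would be dropped as well.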
A better approach is:
df = df[(df != '?').all(axis=1)]
Or:
df = df[~(df == '?').any(axis=1)]
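The two one-liners select the same rows: "every cell differs from ?" is the negation of "some cell equals ?". A short check, assuming the string-valued `df` from the question (rebuilt here as an assumption), which also adds the float conversion the question asks for:

```python
import pandas as pd

# Assumed reconstruction of the question's input; every cell is a string.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Row-wise masks built from element-wise comparisons on the whole frame.
keep_all = (df != '?').all(axis=1)    # every cell differs from '?'
keep_any = ~(df == '?').any(axis=1)   # no cell equals '?'

# Both masks pick exactly the same rows.
assert keep_all.equals(keep_any)

# Filter, then convert the remaining strings to float.
result = df[keep_all].astype(float)
print(result)
```

Comparing the whole frame at once avoids naming each column, so it keeps working if columns are added later.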
You can try replacing ? with null values:
import numpy as np
data = df.replace("?", np.nan)
If you want to replace it in a particular column only, try this:
data = df["column name"].replace("?", np.nan)
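Replacing ? only marks the cells as missing; the rows still have to be dropped afterwards. A sketch of the full sequence, assuming the string-valued `df` from the question (rebuilt here as an assumption) and using `np.nan`, the actual NaN value rather than a string:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's input; every cell is a string.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Turn each '?' into a real missing value.
data = df.replace('?', np.nan)

# Drop rows containing any missing value, then convert strings to float.
result = data.dropna().astype(float)
print(result)
```

Unlike the `pd.to_numeric(..., errors='coerce')` route, this only treats the literal ? as missing, so `astype(float)` would raise an error if some other non-numeric string were present.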