How to loop over numeric columns in a Pandas DataFrame and filter values?
df:
Org_Name Emp_Name Age Salary
0 Axempl Rick 29 1000
1 Lastik John 34 2000
2 Xenon sidd 47 9000
3 Foxtrix Ammy thirty 2000
4 Hensaui giny 33 ten
5 menuia rony fifty 7000
6 lopex nick 23 Ninety
I want to loop over the numeric columns (Age, Salary) and check whether each value is numeric. If a string value is present in a numeric column, I want to filter out that record and create a new data frame without the offending rows.
Output:
Org_Name Emp_Name Age Salary
0 Axempl Rick 29 1000
1 Lastik John 34 2000
2 Xenon sidd 47 9000
You could extend this answer to filter on multiple columns for numerical data types:
import pandas as pd
from io import StringIO
data = """
Org_Name,Emp_Name,Age,Salary
Axempl,Rick,29,1000
Lastik,John,34,2000
Xenon,sidd,47,9000
Foxtrix,Ammy,thirty,2000
Hensaui,giny,33,ten
menuia,rony,fifty,7000
lopex,nick,23,Ninety
"""
df = pd.read_csv(StringIO(data))
print('Original dataframe\n', df)
# Keep only the rows where both Age and Salary consist of digits
df = df[(df.Age.apply(lambda x: x.isnumeric())) &
        (df.Salary.apply(lambda x: x.isnumeric()))]
print('Filtered dataframe\n', df)
gives
Original dataframe
Org_Name Emp_Name Age Salary
0 Axempl Rick 29 1000
1 Lastik John 34 2000
2 Xenon sidd 47 9000
3 Foxtrix Ammy thirty 2000
4 Hensaui giny 33 ten
5 menuia rony fifty 7000
6 lopex nick 23 Ninety
Filtered dataframe
Org_Name Emp_Name Age Salary
0 Axempl Rick 29 1000
1 Lastik John 34 2000
2 Xenon sidd 47 9000
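The same filter can also be written as a loop over the columns, as the question asks, using the vectorized `.str.isnumeric()` accessor instead of a lambda. This is a minimal sketch, assuming both columns are read as strings (`dtype=str` is passed to force that); the helper name `keep_numeric_rows` is made up for illustration.

```python
import pandas as pd
from io import StringIO

data = """Org_Name,Emp_Name,Age,Salary
Axempl,Rick,29,1000
Lastik,John,34,2000
Xenon,sidd,47,9000
Foxtrix,Ammy,thirty,2000
Hensaui,giny,33,ten
menuia,rony,fifty,7000
lopex,nick,23,Ninety
"""

def keep_numeric_rows(frame, columns):
    """Return the rows whose value in every listed column is digits only."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        # .str.isnumeric() is the vectorized counterpart of str.isnumeric()
        mask = mask & frame[col].str.isnumeric()
    return frame[mask]

# dtype=str keeps every column as text so the .str accessor is available
df = pd.read_csv(StringIO(data), dtype=str)
filtered = keep_numeric_rows(df, ['Age', 'Salary'])
print(filtered)
```

Note that `isnumeric()` rejects strings such as "-5" or "12.5", so this check only suits columns of non-negative whole numbers.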
I believe this can be solved using Pandas' "to_numeric" function.
import pandas as pd
df['Column to Check'] = pd.to_numeric(df['Column to Check'], downcast='integer', errors='coerce')
df.dropna(axis=0, inplace=True)
Where 'Column to Check' is the name of the column that you are checking for values that cannot be cast as an integer (or any numeric type); in your question I believe you will want to apply this code to 'Age' and 'Salary'. With errors='coerce', "to_numeric" will convert any values in those columns to NaN if they could not be cast as a numeric type. The "dropna" method will then remove all rows that have a NaN in any of your columns.
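As a small illustration of the coercion described above, the following sketch runs `pd.to_numeric` with `errors='coerce'` on a short Series. The values are taken from the 'Age' column of the question; the variable names are only for this example.

```python
import pandas as pd

ages = pd.Series(['29', '34', 'thirty', '33'])

# Non-numeric strings become NaN instead of raising an error
converted = pd.to_numeric(ages, errors='coerce')
print(converted)

# Dropping the NaN entries leaves only the values that parsed
cleaned = converted.dropna()
print(cleaned.tolist())
```

Because NaN is a floating-point value, the coerced Series is float64 while any NaN is present; calling `astype(int)` after the NaN rows are dropped restores an integer dtype.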
To loop over the columns like you ask, you could do the following:
for col in ['Age', 'Salary']:
    df[col] = pd.to_numeric(df[col], downcast='integer', errors='coerce')
df.dropna(axis=0, inplace=True)
EDIT: In response to harry's comment. If there are pre-existing NaNs in the data, something like the following should keep any valid row that had a pre-existing NaN in one of the other columns.
for col in ['Age', 'Salary']:
    df[col] = pd.to_numeric(df[col], downcast='integer', errors='coerce')
    df = df[df[col].notnull()]
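To see the effect on pre-existing NaNs, the sketch below adds a hypothetical 'Dept' column (not part of the original data) that already contains a missing value. It uses `dropna(subset=...)` as an alternative to the per-column `notnull()` filter; both restrict the check to the listed columns.

```python
import numpy as np
import pandas as pd

# 'Dept' is a made-up column with a pre-existing NaN, added for illustration
df = pd.DataFrame({
    'Emp_Name': ['Rick', 'John', 'Ammy', 'giny'],
    'Dept': ['HR', np.nan, 'IT', 'IT'],
    'Age': ['29', '34', 'thirty', '33'],
    'Salary': ['1000', '2000', '2000', 'ten'],
})

numeric_cols = ['Age', 'Salary']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Only NaNs in Age/Salary drop a row; John's missing Dept is left alone
result = df.dropna(subset=numeric_cols)
print(result)
```

A plain `df.dropna()` on the same frame would also discard John's row, since it checks every column.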
You can use a mask to indicate whether or not there is a string type among the Age and Salary columns:
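# Concatenate the type names of each row, then flag rows that mention "str".
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
mask_str = (df[['Age', 'Salary']]
            .applymap(lambda x: str(type(x)))
            .sum(axis=1)
            .str.contains("str"))
df[~mask_str]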
This is assuming that the dataframe already contains the proper types. If not, you can convert them using the following:
def convert(val):
    try:
        return int(val)
    except ValueError:
        return val
df = (df.assign(Age=lambda f: f.Age.apply(convert),
                Salary=lambda f: f.Salary.apply(convert)))
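Put together, the conversion and the mask can be run end to end on a few rows of the sample data. This is only a sketch: it swaps the type-name matching for an `isinstance` check, which tests the same thing more directly, and the frame is built with `dtype=object` so the columns can hold a mix of int and str.

```python
import pandas as pd

df = pd.DataFrame({
    'Emp_Name': ['Rick', 'John', 'sidd', 'Ammy', 'giny'],
    'Age': ['29', '34', '47', 'thirty', '33'],
    'Salary': ['1000', '2000', '9000', '2000', 'ten'],
}, dtype=object)

def convert(val):
    """Return val as an int when possible, otherwise return it unchanged."""
    try:
        return int(val)
    except ValueError:
        return val

cols = ['Age', 'Salary']
converted = df.copy()
for col in cols:
    converted[col] = converted[col].apply(convert)

# A row is flagged if any of the checked columns still holds a str
mask_str = converted[cols].apply(
    lambda column: column.map(lambda x: isinstance(x, str))
).any(axis=1)

result = converted[~mask_str]
print(result)
```

The surviving 'Age' and 'Salary' columns are still object dtype here; `result.astype({'Age': int, 'Salary': int})` would turn them into integer columns.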