pandas: applying a function over each row of a DataFrame
I have a pandas DataFrame which contains 3 columns:
| val1 | val2   | val3  |
|------|--------|-------|
| Nike | NaN    | NaN   |
| Men  | Adidas | NaN   |
| Puma | Red    | Women |
and 3 lists:
Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color = ['Red', 'Blue', 'Green']
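For anyone who wants to reproduce this, the sample frame and lists can be built roughly as follows. This is only a sketch: the name `df` and the use of `np.nan` for the empty cells are assumptions, since the post shows the data only as a table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: the empty cells are real missing values (np.nan), not the string "NaN".
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

# Reference lists from the question.
Brands = ["Adidas", "Nike", "Puma"]
Gender = ["Men", "Women"]
Color = ["Red", "Blue", "Green"]

print(df)
```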
I'm trying to apply a function to each row that checks each value and puts it in a new column, depending on the boolean value returned by the function.
| val1 | val2   | val3  | brand  | gender | color |
|------|--------|-------|--------|--------|-------|
| Nike | NaN    | NaN   | Nike   | NaN    | NaN   |
| Men  | Adidas | NaN   | Adidas | Men    | NaN   |
| Puma | Red    | Women | Puma   | Women  | Red   |
I'm using lists to illustrate my issue, but in my script I'm using the enchant library to check whether a value exists in my dictionary.
Here's what I already tried:
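# Imports are not shown in the original post; these are the assumed ones.
import enchant
from enchant.checker import SpellChecker

ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brands_checker = SpellChecker(ref_brands)
print(brands_checker.check('Puma'))
> True
print(brands_checker.check('Men'))
> False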
[pyenchant tutorial][1]
def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        try:
            if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
            else:
                return None
        except:
            return None
df_query_split['brand'] = df_query_split.apply(my_cust_check, checker=brands_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check, checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check, checker=color_checker, axis=1)
You can use:
df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print(df)
   val1    val2   val3   brand gender color
0  Nike     NaN    NaN    Nike    NaN   NaN
1   Men  Adidas    NaN  Adidas    Men   NaN
2  Puma     Red  Women    Puma  Women   Red
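The three assignments above can also be folded into a small helper. The following is a minimal sketch, not part of the original answer: `last_match`, `source_cols` and `lookups` are hypothetical names, and the helper is deliberately applied only to the original `val*` columns so that newly added columns are never scanned for matches.

```python
import numpy as np
import pandas as pd

def last_match(frame, allowed):
    """Per row, return the rightmost value of `frame` found in `allowed` (NaN if none)."""
    masked = frame[frame.isin(allowed)]        # keep matches, NaN everywhere else
    return masked.ffill(axis=1).iloc[:, -1]    # forward fill along the row, take last column

# Sample data from the question.
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

source_cols = ["val1", "val2", "val3"]         # hypothetical: columns to search
lookups = {                                    # hypothetical: new column -> allowed values
    "brand": ["Adidas", "Nike", "Puma"],
    "gender": ["Men", "Women"],
    "color": ["Red", "Blue", "Green"],
}

for new_col, allowed in lookups.items():
    df[new_col] = last_match(df[source_cols], allowed)

print(df)
```

Restricting the search to `source_cols` is a design choice: it keeps the result independent of the order in which the new columns are created.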
Detail:
First compare by DataFrame.isin:
print(df.isin(Brands))
    val1   val2   val3
0   True  False  False
1  False   True  False
2   True  False  False
Extract the values where the mask is True:
print(df[df.isin(Brands)])
   val1    val2 val3
0  Nike     NaN  NaN
1   NaN  Adidas  NaN
2  Puma     NaN  NaN
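Indexing a DataFrame with a boolean DataFrame, as in the step above, behaves like `DataFrame.where`: matching cells are kept and everything else becomes NaN. A short sketch to check that on the sample data (the variable names `mask`, `via_indexing` and `via_where` are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
Brands = ["Adidas", "Nike", "Puma"]

mask = df.isin(Brands)          # boolean frame, same shape as df
via_indexing = df[mask]         # form used in the answer
via_where = df.where(mask)      # explicit spelling of the same masking

# equals() treats NaNs in the same positions as equal.
print(via_indexing.equals(via_where))
```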
Replace NaNs by forward filling along each row (ffill):
print(df[df.isin(Brands)].ffill(axis=1))
   val1    val2    val3
0  Nike    Nike    Nike
1   NaN  Adidas  Adidas
2  Puma    Puma    Puma
Select the last column by iloc:
print(df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0      Nike
1    Adidas
2      Puma
Name: val3, dtype: object
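The `isin` approach needs plain lists. If membership can only be tested through an object with a `check(word)` method, as with the enchant checker in the question, a row-wise `apply` is one option. The sketch below rests on assumptions: `ListChecker` is a hypothetical stand-in for the real checker, and `first_accepted` is an illustrative name. Unlike the forward-fill version, it returns the first accepted value in each row rather than the last; on the sample data the two agree because each row has at most one match per category.

```python
import numpy as np
import pandas as pd

class ListChecker:
    """Hypothetical stand-in for an enchant-style checker exposing .check(word)."""
    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words

def first_accepted(row, checker):
    """Return the first string in `row` accepted by `checker`, or NaN if none is."""
    for value in row:
        # Skip missing values (NaN is a float) and keep scanning after a non-match.
        if isinstance(value, str) and checker.check(value.strip()):
            return value.strip()
    return np.nan

# Sample data from the question.
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
source_cols = ["val1", "val2", "val3"]

checkers = {
    "brand": ListChecker(["Adidas", "Nike", "Puma"]),
    "gender": ListChecker(["Men", "Women"]),
    "color": ListChecker(["Red", "Blue", "Green"]),
}
for new_col, checker in checkers.items():
    # Extra keyword arguments to apply() are forwarded to the function.
    df[new_col] = df[source_cols].apply(first_accepted, axis=1, checker=checker)

print(df)
```

The loop only returns once a value is accepted, so a non-matching first cell does not end the search for that row.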