Alternative to apply function for applying a function to each row in Pandas DataFrame
I have a pandas DataFrame with 3 columns:
| val1 | val2   | val3  |
|------|--------|-------|
| Nike | NaN    | NaN   |
| Men  | Adidas | NaN   |
| Puma | Red    | Women |
and 3 lists:
Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color = ['Red', 'Blue', 'Green']
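I am trying to apply a function to each row that checks every value and, depending on the boolean the function returns, puts the value in a new column. The expected result is: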
| val1 | val2   | val3  | brand  | gender | color |
|------|--------|-------|--------|--------|-------|
| Nike | NaN    | NaN   | Nike   | NaN    | NaN   |
| Men  | Adidas | NaN   | Adidas | Men    | NaN   |
| Puma | Red    | Women | Puma   | Women  | Red   |
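I use lists here to illustrate the problem, but in my script I use the pyenchant library to check whether a value exists in a dictionary.
Here is what I have already tried: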
import enchant
from enchant.checker import SpellChecker

ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brand_checker = SpellChecker(ref_brands)
print(brand_checker.check('Puma'))
> True
print(brand_checker.check('Men'))
> False
(See the pyenchant tutorial.)
def my_cust_check(x, checker):
    # Return the first value in the row that the checker accepts, else None.
    for e in x.tolist():
        try:
            # Python 3: pyenchant takes str, so no .encode('utf-8') is needed.
            if checker.check(e.strip()):
                return e.strip()
        except (AttributeError, ValueError):
            # Skip values the checker cannot handle, e.g. NaN (a float with no .strip()).
            continue
    return None

df_query_split['brand'] = df_query_split.apply(my_cust_check, checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check, checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check, checker=color_checker, axis=1)
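To try the row-wise approach without pyenchant installed, here is a minimal sketch. `ListChecker` is a hypothetical stand-in that only mimics the `.check(word)` call used above, and `first_accepted` is an illustrative name rather than anything from the original script. It assumes the sample data shown in the question.

```python
import numpy as np
import pandas as pd


class ListChecker:
    """Hypothetical stand-in for a pyenchant checker: exposes only .check(word)."""

    def __init__(self, words):
        self.words = set(words)

    def check(self, word):
        return word in self.words


def first_accepted(row, checker):
    """Return the first string in the row that the checker accepts, else None."""
    for value in row.tolist():
        # NaN is a float, so only strings are passed to the checker.
        if isinstance(value, str) and checker.check(value.strip()):
            return value.strip()
    return None


df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
values = df[["val1", "val2", "val3"]]  # scan only the original columns

checkers = {
    "brand": ListChecker(["Adidas", "Nike", "Puma"]),
    "gender": ListChecker(["Men", "Women"]),
    "color": ListChecker(["Red", "Blue", "Green"]),
}
for column, checker in checkers.items():
    df[column] = values.apply(first_accepted, checker=checker, axis=1)

print(df)
```

This still calls a Python function once per row, which is why a vectorized alternative is worth considering for large frames.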
You can use:
df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print(df)
   val1    val2   val3   brand gender color
0  Nike     NaN    NaN    Nike    NaN   NaN
1   Men  Adidas    NaN  Adidas    Men   NaN
2  Puma     Red  Women    Puma  Women   Red
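The three assignments above each run `isin` over the whole frame, including the columns added by the previous lines. That is harmless for this data, but a variant that restricts the lookup to the original columns may be easier to reason about. The sketch below assumes the sample data from the question; the `lookups` dictionary and the `source` variable are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

# Illustrative mapping from new column name to the list of accepted values.
lookups = {
    "brand": ["Adidas", "Nike", "Puma"],
    "gender": ["Men", "Women"],
    "color": ["Red", "Blue", "Green"],
}

source = df[["val1", "val2", "val3"]]  # copy of the original columns only
for column, allowed in lookups.items():
    # Keep matching cells, set everything else to NaN.
    matches = source.where(source.isin(allowed))
    # Push each match to the right, then read it from the last column.
    df[column] = matches.ffill(axis=1).iloc[:, -1]

print(df)
```

`source.where(mask)` is equivalent to `source[mask]` for a boolean DataFrame mask; it is used here only to make the masking step explicit.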
Details:
First compare with DataFrame.isin:
print(df.isin(Brands))
    val1   val2   val3
0   True  False  False
1  False   True  False
2   True  False  False
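When membership is decided by a checker object rather than a plain list, as in the question's real script, a boolean mask of the same shape can be built by mapping a predicate over every cell. The sketch below is only an illustration: `ListChecker` is a hypothetical stand-in for an object with a `.check(word)` method, and `accepted` is an illustrative helper name.

```python
import numpy as np
import pandas as pd


class ListChecker:
    """Hypothetical stand-in for a checker object exposing .check(word)."""

    def __init__(self, words):
        self.words = set(words)

    def check(self, word):
        return word in self.words


def accepted(value, checker):
    """True only for strings the checker accepts; NaN and other types give False."""
    return isinstance(value, str) and checker.check(value.strip())


df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
brand_checker = ListChecker(["Adidas", "Nike", "Puma"])

# Apply the predicate cell by cell, one column at a time.
mask = df.apply(lambda col: col.map(lambda v: accepted(v, brand_checker)))
print(mask)
```

The resulting `mask` can be used in place of `df.isin(Brands)` in the following steps. It is still evaluated cell by cell in Python, so it is not as fast as `isin`.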
Extract the values where the mask is True:
print(df[df.isin(Brands)])
   val1    val2  val3
0  Nike     NaN   NaN
1   NaN  Adidas   NaN
2  Puma     NaN   NaN
Replace the NaN values by forward filling (ffill) along each row:
print(df[df.isin(Brands)].ffill(axis=1))
   val1    val2    val3
0  Nike    Nike    Nike
1   NaN  Adidas  Adidas
2  Puma    Puma    Puma
Select the last column with iloc:
print(df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0      Nike
1    Adidas
2      Puma
Name: val3, dtype: object
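One consequence of forward filling and reading the last column is that a row containing several matches yields the rightmost one. If the leftmost match is wanted instead, back filling and reading the first column is a possible variant. The sketch below uses a made-up frame in which the first row holds two brands; the data is hypothetical and only serves to show the difference.

```python
import numpy as np
import pandas as pd

Brands = ["Adidas", "Nike", "Puma"]

# Hypothetical data: the first row contains two brand names.
df = pd.DataFrame({
    "val1": ["Nike", "Men"],
    "val2": ["Adidas", "Adidas"],
    "val3": ["Red", np.nan],
})

matches = df.where(df.isin(Brands))

# Forward fill + last column: the rightmost match in each row.
last_match = matches.ffill(axis=1).iloc[:, -1]

# Back fill + first column: the leftmost match in each row.
first_match = matches.bfill(axis=1).iloc[:, 0]

print(pd.DataFrame({"first": first_match, "last": last_match}))
```

For the sample data in the question each row has at most one match per list, so both variants give the same result there.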