Alternative to apply function for applying a function to each row in Pandas DataFrame
pandas: applying function over each row of Dataframe
I have a pandas DataFrame with 3 columns:
| val1 | val2 | val3 |
|--------------------------|
| Nike | NaN | NaN |
| Men | Adidas | NaN |
| Puma | Red | Women |
and 3 lists:
Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color=['Red', 'Blue', 'Green']
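I am trying to apply a function to each row that checks every value and, based on the boolean the function returns, puts the value in a new column, so that the result looks like this: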
| val1 | val2 | val3 | brand | gender | color
|----------------------------------------------------
| Nike | NaN | NaN | Nike | NaN | NaN
| Men | Adidas | NaN | Adidas | Men | NaN
| Puma | Red | Women | Puma | Women | Red
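I use lists here to illustrate my problem, but in my script I use the enchant library (pyenchant) to check whether a value exists in a dictionary.
Here is what I have already tried: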
import enchant
from enchant.checker import SpellChecker
ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brand_checker = SpellChecker(ref_brands)
print brand_checker.check('Puma')
> True
print brand_checker.check('Men')
> False
[pyenchant tutorial][1]
def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        try:
            if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
            else:
                return None
        except:
            return None
df_query_split['brand'] = df_query_split.apply(my_cust_check,checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check,checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check,checker=color_checker, axis=1)
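One thing worth noting about the attempt above: as the flattened code reads, the `else: return None` branch ends the loop after the first cell, so a row such as `Men | Adidas | NaN` would never reach `Adidas`. The following is a minimal sketch of a row-wise function that only gives up after every cell has been checked. It assumes Python 3, and the names `first_match` and `is_valid` are hypothetical; plain set membership stands in for `brand_checker.check`, which is not reproduced here.

```python
import numpy as np
import pandas as pd


def first_match(row, is_valid):
    """Return the first stripped string in the row accepted by is_valid, else None."""
    for value in row:
        # Skip NaN and other non-string cells instead of relying on a bare except.
        if not isinstance(value, str):
            continue
        candidate = value.strip()
        if is_valid(candidate):
            return candidate
    # Reached only after every cell in the row has been checked.
    return None


df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

# Hypothetical stand-in for brand_checker.check: simple set membership.
brands = {"Adidas", "Nike", "Puma"}
value_cols = ["val1", "val2", "val3"]
df["brand"] = df[value_cols].apply(
    first_match, is_valid=lambda word: word in brands, axis=1
)
```

This still calls a Python function once per row, so it addresses correctness only, not speed.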
You can use:
df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
val1 val2 val3 brand gender color
0 Nike NaN NaN Nike NaN NaN
1 Men Adidas NaN Adidas Men NaN
2 Puma Red Women Puma Women Red
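For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The sample data is copied from the question; the names `lookups`, `value_cols` and `values` are assumptions introduced for illustration. Restricting the search to the three original columns is a small design choice, so that newly added columns are never scanned on later iterations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

# Map each new column name to its list of allowed values (lists from the question).
lookups = {
    "brand": ["Adidas", "Nike", "Puma"],
    "gender": ["Men", "Women"],
    "color": ["Red", "Blue", "Green"],
}

# Work on a fixed copy of the input columns only.
value_cols = ["val1", "val2", "val3"]
values = df[value_cols]

for new_col, allowed in lookups.items():
    # Keep matching cells, forward fill along each row, take the last column.
    df[new_col] = values[values.isin(allowed)].ffill(axis=1).iloc[:, -1]

print(df)
```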
Details:
First compare with DataFrame.isin:
print (df.isin(Brands))
val1 val2 val3
0 True False False
1 False True False
2 True False False
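The question mentions that the real script checks values with an enchant checker rather than plain lists, in which case `isin` cannot be used directly. A boolean mask of the same shape can be built from any predicate instead. The sketch below is one possible way to do that; `mask_from_predicate` and `is_brand` are hypothetical names, and `is_brand` is only a stand-in for something like `brand_checker.check`.

```python
import numpy as np
import pandas as pd


def mask_from_predicate(frame, predicate):
    """Return a boolean DataFrame: True where a string cell satisfies predicate."""
    def check_cell(cell):
        # NaN and other non-string cells never match.
        return isinstance(cell, str) and bool(predicate(cell.strip()))
    # Apply the cell check column by column.
    return frame.apply(lambda column: column.map(check_cell))


df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

brands = {"Adidas", "Nike", "Puma"}


def is_brand(word):
    # Hypothetical stand-in for an enchant-based check.
    return word in brands


mask = mask_from_predicate(df, is_brand)
brand = df[mask].ffill(axis=1).iloc[:, -1]
```

The predicate is still called once per cell, so this is unlikely to be as fast as `isin`, but the remaining steps stay the same.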
Extract the values where the mask is True:
print (df[df.isin(Brands)])
val1 val2 val3
0 Nike NaN NaN
1 NaN Adidas NaN
2 Puma NaN NaN
Replace the NaN values by forward filling (ffill) along each row:
print (df[df.isin(Brands)].ffill(axis=1))
val1 val2 val3
0 Nike Nike Nike
1 NaN Adidas Adidas
2 Puma Puma Puma
Select the last column with iloc:
print (df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0 Nike
1 Adidas
2 Puma
Name: val3, dtype: object
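One consequence of forward filling and taking the last column is that a row containing several matches yields the last one. If the first match is wanted instead, a backward fill combined with the first column should work the same way. The following sketch illustrates the difference on made-up data (the two-brand row is an assumption, not part of the question):

```python
import numpy as np
import pandas as pd

brands = ["Adidas", "Nike", "Puma"]

# Hypothetical data: the first row contains two brands.
df = pd.DataFrame({
    "val1": ["Nike", "Men"],
    "val2": ["Puma", "Adidas"],
    "val3": [np.nan, np.nan],
})

# Keep only the cells that are brands; everything else becomes NaN.
matches = df.where(df.isin(brands))

# Forward fill + last column picks the last match in each row.
last_match = matches.ffill(axis=1).iloc[:, -1]

# Backward fill + first column picks the first match in each row.
first_match = matches.bfill(axis=1).iloc[:, 0]
```

When each row holds at most one value per category, as in the question, both variants give the same result.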