I have a pandas DataFrame which contains 3 columns:
| val1 | val2 | val3 |
|--------------------------|
| Nike | NaN | NaN |
| Men | Adidas | NaN |
| Puma | Red | Women |
and 3 lists:
Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color=['Red', 'Blue', 'Green']
I trying to apply a function to each row to check and put each value in a new column depending on boolean value returned by the function.
| val1 | val2 | val3 | brand | gender | color
|----------------------------------------------------
| Nike | NaN | NaN | Nike | NaN | NaN
| Men | Adidas | NaN | Adidas | Men | NaN
| Puma | Red | Women | Puma | Women | Red
I'm using lists to illustrate my issue but in my script, I'm using enchant library to check the existence of a value in my dictionary.
Here's what I already tried:
ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brands_checker = SpellChecker(ref_brands)
print brands_checker.check('Puma')
> True
print brands_checker.check('Men')
> False
[pyenchant tutorial][1]
def my_cust_check(x, checker):
l = x.tolist()
for e in iter(l):
try:
if checker.check(e.strip().encode('utf-8')) is True:
return e.strip()
else:
return None
except:
return None
df_query_split['brand'] = df_query_split.apply(my_cust_check,checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check,checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check,checker=color_checker, axis=1)
You can use:
df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
val1 val2 val3 brand gender color
0 Nike NaN NaN Nike NaN NaN
1 Men Adidas NaN Adidas Men NaN
2 Puma Red Women Puma Women Red
Detail:
First compare by DataFrame.isin
:
print (df.isin(Brands))
val1 val2 val3
0 True False False
1 False True False
2 True False False
Extract values of True
s:
print (df[df.isin(Brands)])
val1 val2 val3
0 Nike NaN NaN
1 NaN Adidas NaN
2 Puma NaN NaN
Replace NaN
s by fillna
with forward filling ( ffill
):
print (df[df.isin(Brands)].ffill(axis=1))
val1 val2 val3
0 Nike Nike Nike
1 NaN Adidas Adidas
2 Puma Puma Puma
Seelct last column by iloc
:
print (df[df.isin(Brands)].ffill(1).iloc[:, -1])
0 Nike
1 Adidas
2 Puma
Name: val3, dtype: object
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.