pandas: applying a function over each row of a DataFrame

I have a pandas DataFrame which contains 3 columns:

|  val1  |  val2  |  val3  | 
|--------------------------|
|  Nike  |  NaN   |  NaN   |  
|  Men   | Adidas |  NaN   |  
| Puma   |  Red   |  Women | 

and 3 lists:

Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color=['Red', 'Blue', 'Green']

I'm trying to apply a function to each row that checks every value and puts it in a new column, depending on the boolean value returned by the function.

|  val1  |  val2  |  val3  | brand | gender | color
|----------------------------------------------------
|  Nike  |  NaN   |  NaN   |  Nike  |  NaN   | NaN
|  Men   | Adidas |  NaN   | Adidas |  Men   | NaN
|  Puma  |  Red   |  Women | Puma   |  Women | Red   

I'm using lists to illustrate my issue, but in my script I'm using the enchant library to check whether a value exists in my dictionary.

Here's what I already tried:

ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brand_checker = SpellChecker(ref_brands)

print brand_checker.check('Puma')
> True
print brand_checker.check('Men')
> False

Reference: the pyenchant tutorial.

def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        try:
            if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
            else:
                return None
        except:
            return None

df_query_split['brand'] = df_query_split.apply(my_cust_check,checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check,checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check,checker=color_checker, axis=1)

You can use:

df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
   val1    val2   val3   brand gender color
0  Nike     NaN    NaN    Nike    NaN   NaN
1   Men  Adidas    NaN  Adidas    Men   NaN
2  Puma     Red  Women    Puma  Women   Red
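
For a self-contained run, the sample frame from the question can be rebuilt and the three assignments wrapped in a small helper. This is only a sketch: the helper name `last_match` and the `value_cols` list are hypothetical, and restricting the lookup to the original value columns is an assumption that keeps the newly added columns out of the search.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

Brands = ["Adidas", "Nike", "Puma"]
Gender = ["Men", "Women"]
Color = ["Red", "Blue", "Green"]

# Hypothetical name: the columns that hold the raw values
value_cols = ["val1", "val2", "val3"]

def last_match(frame, allowed):
    """Per row, return the last value found in `allowed`, or NaN if none."""
    masked = frame[frame.isin(allowed)]       # keep matches, NaN elsewhere
    return masked.ffill(axis=1).iloc[:, -1]   # push matches right, take last column

df["brand"] = last_match(df[value_cols], Brands)
df["gender"] = last_match(df[value_cols], Gender)
df["color"] = last_match(df[value_cols], Color)
print(df)
```

Because the values are forward filled and the last column is taken, a row containing two entries from the same list would yield the rightmost one.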

Detail:

First, compare with DataFrame.isin:

print (df.isin(Brands))
    val1   val2   val3
0   True  False  False
1  False   True  False
2   True  False  False

Extract the values where the mask is True:

print (df[df.isin(Brands)])
   val1    val2 val3
0  Nike     NaN  NaN
1   NaN  Adidas  NaN
2  Puma     NaN  NaN

Replace the NaNs by forward filling along each row (ffill with axis=1):

print (df[df.isin(Brands)].ffill(axis=1))
   val1    val2    val3
0  Nike    Nike    Nike
1   NaN  Adidas  Adidas
2  Puma    Puma    Puma

Select the last column with iloc:

print (df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0      Nike
1    Adidas
2      Puma
Name: val3, dtype: object
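
The question mentions that the real script decides membership with an enchant checker rather than a plain list, in which case isin does not apply directly. A minimal sketch of a row-wise alternative follows, assuming only that each checker exposes a `check(word)` method returning a boolean, as shown in the question. `ListChecker` is a hypothetical stand-in so the example runs without enchant, and `first_match` is an illustrative helper, not part of pandas or pyenchant.

```python
import numpy as np
import pandas as pd

class ListChecker:
    """Hypothetical stand-in for an enchant-based checker (assumption)."""
    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words

def first_match(row, checker):
    """Return the first string in the row that the checker accepts, else NaN."""
    for value in row:
        # Skip NaN and other non-string cells instead of stopping at them
        if isinstance(value, str) and checker.check(value.strip()):
            return value.strip()
    return np.nan

df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
value_cols = ["val1", "val2", "val3"]

checkers = {
    "brand": ListChecker(["Adidas", "Nike", "Puma"]),
    "gender": ListChecker(["Men", "Women"]),
    "color": ListChecker(["Red", "Blue", "Green"]),
}

for name, checker in checkers.items():
    # Extra keyword arguments to apply are forwarded to the function
    df[name] = df[value_cols].apply(first_match, axis=1, checker=checker)
print(df)
```

The loop returns only once a value passes the check, so a non-matching first cell does not end the search. Note that this picks the first match in a row, whereas the ffill approach picks the last.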

 