
pandas: applying function over each row of Dataframe

I have a pandas DataFrame containing 3 columns:

|  val1  |  val2  |  val3  |
|--------------------------|
|  Nike  |  NaN   |  NaN   |
|  Men   | Adidas |  NaN   |
|  Puma  |  Red   | Women  |

and 3 lists:

Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color = ['Red', 'Blue', 'Green']

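I am trying to apply a function to each row that checks every value and, depending on the boolean the function returns, puts the value in the matching new column: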

|  val1  |  val2  |  val3  | brand  | gender | color |
|----------------------------------------------------|
|  Nike  |  NaN   |  NaN   |  Nike  |  NaN   |  NaN  |
|  Men   | Adidas |  NaN   | Adidas |  Men   |  NaN  |
|  Puma  |  Red   | Women  |  Puma  | Women  |  Red  |

I use lists here to illustrate my problem, but in my script I use the pyenchant library to check whether a value exists in a dictionary.

Here is what I have already tried:

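import enchant
from enchant.checker import SpellChecker

# Personal word list holding the reference brand names, one per line
ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brand_checker = SpellChecker(ref_brands)

print(brand_checker.check('Puma'))
# True
print(brand_checker.check('Men'))
# False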

[pyenchant tutorial][1]

def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        # Note: every branch below returns, so only the first value
        # of the row is ever checked.
        try:
            if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
            else:
                return None
        except:
            return None

df_query_split['brand'] = df_query_split.apply(my_cust_check,checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check,checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check,checker=color_checker, axis=1)

You can use:

df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
   val1    val2   val3   brand gender color
0  Nike     NaN    NaN    Nike    NaN   NaN
1   Men  Adidas    NaN  Adidas    Men   NaN
2  Puma     Red  Women    Puma  Women   Red
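
For a version that can be run as-is, here is a minimal sketch that rebuilds the example frame and wraps the one-liner in a helper. The helper name `last_match` and the `value_cols` list are illustrative and not part of the original answer. Restricting the lookup to `value_cols` is an added assumption, so that the newly created columns are not scanned on later calls.

```python
import numpy as np
import pandas as pd

Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color = ['Red', 'Blue', 'Green']

# Example data from the question
df = pd.DataFrame({
    'val1': ['Nike', 'Men', 'Puma'],
    'val2': [np.nan, 'Adidas', 'Red'],
    'val3': [np.nan, np.nan, 'Women'],
})

value_cols = ['val1', 'val2', 'val3']  # hypothetical name: the columns to scan


def last_match(frame, allowed):
    """Per row, return the last value that appears in `allowed` (NaN if none)."""
    masked = frame[frame.isin(allowed)]       # keep matches, NaN elsewhere
    return masked.ffill(axis=1).iloc[:, -1]   # push matches right, take last column


df['brand'] = last_match(df[value_cols], Brands)
df['gender'] = last_match(df[value_cols], Gender)
df['color'] = last_match(df[value_cols], Color)
print(df)
```

If a row holds several values from the same list, this keeps the right-most one.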

Details:

First, compare with DataFrame.isin:

print (df.isin(Brands))
    val1   val2   val3
0   True  False  False
1  False   True  False
2   True  False  False

Extract the values where the mask is True:

print (df[df.isin(Brands)])
   val1    val2 val3
0  Nike     NaN  NaN
1   NaN  Adidas  NaN
2  Puma     NaN  NaN

Replace the NaNs by forward filling along each row (ffill with axis=1):

print (df[df.isin(Brands)].ffill(axis=1))
   val1    val2    val3
0  Nike    Nike    Nike
1   NaN  Adidas  Adidas
2  Puma    Puma    Puma

Select the last column with iloc:

print (df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0      Nike
1    Adidas
2      Puma
Name: val3, dtype: object
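
When the membership test cannot be expressed with isin (for example with the pyenchant checkers from the question), a row-wise function still works as long as it keeps looping until a value matches. The following is only a sketch under stated assumptions: `ListChecker` is a hypothetical stand-in that mimics a checker object with a `check(word)` method, and `first_match` and `value_cols` are illustrative names. Unlike the ffill approach, it returns the first match in the row rather than the last.

```python
import numpy as np
import pandas as pd


class ListChecker:
    """Hypothetical stand-in for a pyenchant checker: exposes check(word)."""

    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words


def first_match(row, checker):
    """Return the first value in the row accepted by `checker`, else None."""
    for value in row:
        # Skip NaN and other non-string cells instead of returning early
        if isinstance(value, str) and checker.check(value.strip()):
            return value.strip()
    return None


df = pd.DataFrame({
    'val1': ['Nike', 'Men', 'Puma'],
    'val2': [np.nan, 'Adidas', 'Red'],
    'val3': [np.nan, np.nan, 'Women'],
})

value_cols = ['val1', 'val2', 'val3']  # scan only the original columns
checkers = {
    'brand': ListChecker(['Adidas', 'Nike', 'Puma']),
    'gender': ListChecker(['Men', 'Women']),
    'color': ListChecker(['Red', 'Blue', 'Green']),
}

for name, checker in checkers.items():
    # Extra keyword arguments to apply are forwarded to first_match
    df[name] = df[value_cols].apply(first_match, checker=checker, axis=1)

print(df)
```

This calls the checker once per cell in Python, so it will usually be slower than the isin version on large frames.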

