
How to apply a function on the basis of a column condition in a dataframe

I am trying to apply a function to one column of a dataframe, but rows where another column, df['mask'], is False should be skipped. The mask column is of bool type.

This is my function:

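     import random
     import numpy as np
     from commonregex import CommonRegex

     def dates(inp):
        temp = inp  # keep the original values
        parser = CommonRegex()
        inp = inp.apply(parser.dates).str.join(', ')
        # Values that contain a date become one run of 3 to 7 'X' characters, shared by all rows
        return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))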

Here is what I applied:

      df1.assign(**df1['Dates'].apply(dates).where(df1['mask']== TRUE))

It throws an error:

         32     temp = inp
         33     parser = CommonRegex()
    ---> 34     inp = inp.apply(parser.dates).str.join(', ')
         35     return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))
         36 

    AttributeError: 'Timestamp' object has no attribute 'apply'    

Here is what my dataframe looks like:

         Name     |  Dates   |  mask |
         ..............................
         Tom      | 21/02/2018| True
         Nick     | 28/07/2018| False
         Juli     | 11/08/2018| True
         June     | 01/02/2018| True
         XHGM     | 07/08/2018| False   

I want the output to look like this: rows where mask is False should be skipped, and rows where mask is True should go through the dates function, which hides the date values.

         Name     |  Dates   |  mask |
         ..............................
         Tom      | XXXXX     | True
         Nick     |28/07/2018 | False
         Juli     | XXXXX     | True
         June     | XXXXX     | True
         XHGM     | 07/08/2018| False     

Use Series.pipe to pass the column to the function, filter the rows with boolean indexing by mask, and use DataFrame.loc to specify the column name:

df1.loc[df1['mask'], 'Dates'] = df1.loc[df1['mask'], 'Dates'].pipe(dates)
print (df1)
   Name       Dates   mask
0   Tom         XXX   True
1  Nick  28/07/2018  False
2  Juli         XXX   True
3  June         XXX   True
4  XHGM  07/08/2018  False
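
The error in the question comes from Series.apply calling the function once per element, so dates received a single value (a Timestamp in the traceback) rather than a Series. A minimal sketch of the difference follows, assuming a hypothetical stand-in function mask_values in place of dates so that it runs without CommonRegex; the sample data is copied from the question, with the dates stored as strings.

```python
import pandas as pd

# Hypothetical stand-in for the question's `dates` function: it expects a
# whole Series and returns a replacement value for every row it was given.
def mask_values(col: pd.Series) -> pd.Series:
    return pd.Series("XXX", index=col.index)

df1 = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "XHGM"],
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018", "01/02/2018", "07/08/2018"],
    "mask": [True, False, True, True, False],
})

# Series.apply hands the function one element at a time ...
seen_by_apply = []
df1["Dates"].apply(lambda value: seen_by_apply.append(type(value).__name__))

# ... while Series.pipe hands it the Series itself, exactly once.
seen_by_pipe = []
df1["Dates"].pipe(lambda col: seen_by_pipe.append(type(col).__name__))

print(seen_by_apply)  # ['str', 'str', 'str', 'str', 'str']
print(seen_by_pipe)   # ['Series']

# Only the rows where `mask` is True are passed in and overwritten;
# the assignment aligns on the index, so the other rows stay untouched.
rows = df1["mask"]
df1.loc[rows, "Dates"] = df1.loc[rows, "Dates"].pipe(mask_values)
print(df1)
```

Because the function only ever sees the filtered rows, any Series-level calls inside it (such as .apply or .str) keep working.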

A solution with assign is possible too, but the disadvantage is that the function runs over all values and the result is filtered only afterwards, so if a large DataFrame contains only a few True values it should be slower:

df1 = df1.assign(Dates = np.where(df1['mask'], df1['Dates'].pipe(dates), df1['Dates']))
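
To illustrate the trade-off, here is a small sketch that records how many rows the function receives in each variant. mask_values and the calls list are hypothetical helpers standing in for dates; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

calls = []  # number of rows the function was given on each call

# Hypothetical stand-in for the question's `dates` function.
def mask_values(col: pd.Series) -> pd.Series:
    calls.append(len(col))
    return pd.Series("XXX", index=col.index)

df1 = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "XHGM"],
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018", "01/02/2018", "07/08/2018"],
    "mask": [True, False, True, True, False],
})

# loc variant: the function is given only the rows where mask is True.
rows = df1["mask"]
via_loc = df1.copy()
via_loc.loc[rows, "Dates"] = via_loc.loc[rows, "Dates"].pipe(mask_values)

# assign variant: the function processes every row, and np.where then
# keeps the original value wherever mask is False.
via_assign = df1.assign(
    Dates=np.where(df1["mask"], df1["Dates"].pipe(mask_values), df1["Dates"])
)

print(calls)  # [3, 5]
print(via_loc["Dates"].tolist() == via_assign["Dates"].tolist())  # True
```

Both variants produce the same Dates values; the difference is only in how much work the function does, which matters when it is expensive and few rows are True.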
