How to apply a function based on a column condition in a DataFrame

Question

I am trying to apply a function to one column of a DataFrame, skipping any row where another column, df['mask'], contains False. The mask column is of bool type.

This is my function:

    def dates(inp):
        temp = inp
        parser = CommonRegex()
        inp = inp.apply(parser.dates).str.join(', ')
        return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))

Here is how I applied it:

      df1.assign(**df1['Dates'].apply(dates).where(df1['mask']== TRUE))

It throws this error:

         32     temp = inp
         33     parser = CommonRegex()
    ---> 34     inp = inp.apply(parser.dates).str.join(', ')
         35     return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))
         36 

    AttributeError: 'Timestamp' object has no attribute 'apply'

This is what my DataFrame looks like:

         Name     |  Dates   |  mask |
         ..............................
         Tom      | 21/02/2018| True
         Nick     | 28/07/2018| False
         Juli     | 11/08/2018| True
         June     | 01/02/2018| True
         XHGM     | 07/08/2018| False

I want the output to look like this: rows where mask is False are skipped, and rows where mask is True are passed to the dates function so that their date values are hidden.

         Name     |  Dates   |  mask |
         ..............................
         Tom      | XXXXX     | True
         Nick     |28/07/2018 | False
         Juli     | XXXXX     | True
         June     | XXXXX     | True
         XHGM     | 07/08/2018| False

Answer 1

Series.apply calls dates once per element, so inp is a single Timestamp rather than a Series, which is why inp.apply fails. Use Series.pipe to pass the whole column to the function, and select the rows with boolean indexing on the mask, using DataFrame.loc to specify the column name:

    df1.loc[df1['mask'], 'Dates'] = df1.loc[df1['mask'], 'Dates'].pipe(dates)
    print(df1)
       Name       Dates   mask
    0   Tom         XXX   True
    1  Nick  28/07/2018  False
    2  Juli         XXX   True
    3  June         XXX   True
    4  XHGM  07/08/2018  False
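
To see why pipe works where apply fails, here is a minimal self-contained sketch. It assumes a hypothetical stand-in, mask_dates, in place of dates so that it runs without the CommonRegex dependency, and it stores the dates as plain strings rather than Timestamps. It only illustrates the pattern and is not the asker's exact function.

```python
import random

import pandas as pd


def mask_dates(series):
    """Hypothetical stand-in for dates(): hide every value in a Series."""
    # As in the original function, one random length is drawn per call,
    # so every masked row gets the same number of X characters.
    hidden = 'X' * random.randrange(3, 8)
    return pd.Series(hidden, index=series.index)


df1 = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli', 'June', 'XHGM'],
    'Dates': ['21/02/2018', '28/07/2018', '11/08/2018',
              '01/02/2018', '07/08/2018'],
    'mask': [True, False, True, True, False],
})

# Series.apply calls the function once per element: each call sees a scalar.
seen_by_apply = df1['Dates'].apply(lambda value: type(value).__name__)

# Series.pipe calls the function once with the whole Series.
seen_by_pipe = df1['Dates'].pipe(lambda series: type(series).__name__)

# Mask only the rows where the flag is True; other rows are never passed in.
rows = df1['mask']
df1.loc[rows, 'Dates'] = df1.loc[rows, 'Dates'].pipe(mask_dates)
print(df1)
```

Here seen_by_apply holds the scalar type name for each row, while seen_by_pipe is the single string 'Series', which is the difference behind the AttributeError in the question.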

A solution with assign is also possible, but its disadvantage is that the function runs over all values before the filtering happens, so it should be slower when a large DataFrame contains only a few True values:

    df1 = df1.assign(Dates=np.where(df1['mask'], df1['Dates'].pipe(dates), df1['Dates']))
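
The difference in work done by the two approaches can be made visible with a small sketch. It assumes a hypothetical stand-in, mask_dates, that logs how many values it receives and hides them with a fixed 'XXXXX' string. The calls list and the via_assign / via_loc names are illustrative only.

```python
import numpy as np
import pandas as pd

calls = []  # records how many values each call received


def mask_dates(series):
    """Hypothetical stand-in for dates(): hide values and log the input size."""
    calls.append(len(series))
    return np.full(len(series), 'XXXXX', dtype=object)


df1 = pd.DataFrame({
    'Dates': ['21/02/2018', '28/07/2018', '11/08/2018',
              '01/02/2018', '07/08/2018'],
    'mask': [True, False, True, True, False],
})

# assign + np.where: the function sees all 5 values, then np.where
# discards the results for rows where mask is False.
via_assign = df1.assign(
    Dates=np.where(df1['mask'], df1['Dates'].pipe(mask_dates), df1['Dates'])
)

# .loc: the function sees only the 3 values where mask is True.
via_loc = df1.copy()
rows = via_loc['mask']
via_loc.loc[rows, 'Dates'] = via_loc.loc[rows, 'Dates'].pipe(mask_dates)

print(calls)  # [5, 3]
```

Both versions produce the same Dates column. Only the amount of data passed through the function differs, which matters when the function is expensive and few rows are flagged.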

Answer 1
1 accepted 2019-11-29 12:32:35
