How to apply a function on the basis of a column condition in a dataframe
I am trying to apply a function over a column in a dataframe. If another column, df['mask'], contains False for a row, that row should be skipped. The mask column is of bool type.
This is my function:
import random
import numpy as np
from commonregex import CommonRegex

def dates(inp):
    temp = inp
    parser = CommonRegex()
    inp = inp.apply(parser.dates).str.join(', ')
    return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))
Here is what I applied:
df1.assign(**df1['Dates'].apply(dates).where(df1['mask']== TRUE))
It throws an error:
32 temp = inp
33 parser = CommonRegex()
---> 34 inp = inp.apply(parser.dates).str.join(', ')
35 return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))
36
AttributeError: 'Timestamp' object has no attribute 'apply'
Here is what my dataframe looks like:
Name | Dates | mask |
..............................
Tom | 21/02/2018| True
Nick | 28/07/2018| False
Juli | 11/08/2018| True
June | 01/02/2018| True
XHGM | 07/08/2018| False
I am trying to get the output below: rows with a False value should be skipped, and rows with a True value should call the dates function and hide the date values.
Name | Dates | mask |
..............................
Tom | XXXXX | True
Nick | 28/07/2018| False
Juli | XXXXX | True
June | XXXXX | True
XHGM | 07/08/2018| False
Use Series.pipe to pass the column to the function, and filter the rows by boolean indexing with the mask in DataFrame.loc, which also lets you specify the column name:
df1.loc[df1['mask'], 'Dates'] = df1.loc[df1['mask'], 'Dates'].pipe(dates)
print (df1)
Name Dates mask
0 Tom XXX True
1 Nick 28/07/2018 False
2 Juli XXX True
3 June XXX True
4 XHGM 07/08/2018 False
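The approach above can be sketched without the CommonRegex dependency. In this minimal example, hide_dates is a hypothetical stand-in for dates() (it is not the asker's function), the placeholder length is fixed rather than random, and the Dates column is assumed to hold strings:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "XHGM"],
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018", "01/02/2018", "07/08/2018"],
    "mask": [True, False, True, True, False],
})

def hide_dates(col: pd.Series) -> pd.Series:
    """Hypothetical stand-in for dates(): mask dd/mm/yyyy values with a fixed placeholder."""
    # Series.pipe hands over the whole Series, so Series methods such as .str work here.
    return col.str.replace(r"\d{2}/\d{2}/\d{4}", "XXXXX", regex=True)

# Series.apply would instead call the function once per element
# (a str here, a Timestamp in the question), so calling .apply inside it fails.
rows = df1["mask"]
df1.loc[rows, "Dates"] = df1.loc[rows, "Dates"].pipe(hide_dates)
print(df1)
```

Only the rows selected by the mask are passed to the function, and the assignment through DataFrame.loc writes the results back to the same rows by index.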
A solution with assign is possible too, but its disadvantage is that the function runs over all values and only then are the results filtered, so if there are only a few True values in a large DataFrame it should be slower:
df1 = df1.assign(Dates = np.where(df1['mask'], df1['Dates'].pipe(dates), df1['Dates']))
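To illustrate the difference in work done, here is a small sketch that records how many values the function receives in each variant. hide_dates and the seen list are hypothetical names introduced only for this illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018"],
    "mask": [True, False, True],
})

seen = []  # number of values received by each call of the function

def hide_dates(col: pd.Series) -> pd.Series:
    """Hypothetical stand-in for dates(): returns a fixed placeholder for every value."""
    seen.append(len(col))
    return pd.Series("XXXXX", index=col.index)

# assign + np.where: the function processes every row, the mask is applied afterwards
via_assign = df1.assign(
    Dates=np.where(df1["mask"], df1["Dates"].pipe(hide_dates), df1["Dates"])
)

# loc: the function only receives the rows where mask is True
via_loc = df1.copy()
via_loc.loc[via_loc["mask"], "Dates"] = via_loc.loc[via_loc["mask"], "Dates"].pipe(hide_dates)

print(seen)  # [3, 2]
print(via_assign["Dates"].tolist() == via_loc["Dates"].tolist())
```

Both variants produce the same Dates values; the assign version simply passes all three rows to the function, while the loc version passes only the two masked rows.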