Compare the previous N rows to the current row in a pandas column
I am trying to populate a new column in my pandas dataframe by considering the values of the previous n rows. If the current value is not equal to any of the past n values in that column, the new column should be "N"; otherwise it should be "Y".
Please let me know what would be a good way to achieve this.
Here's my input data:
import pandas as pd
testdata = {'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']}
df = pd.DataFrame.from_dict(testdata)
Input DF:
col1
0 car
1 car
2 car
3 bus
4 bus
5 bus
6 car
Output DF (with n=2):
col1 Result
0 car
1 car
2 car Y
3 bus N
4 bus Y
5 bus Y
6 car N
You can do this with a Rolling.apply call.
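import numpy as np

n = 2
# Encode the strings as integer codes, then test whether the last value
# of each (n+1)-row window also appears among the n values before it.
res = (df['col1'].astype('category')
                 .cat.codes
                 .rolling(n + 1)
                 .apply(lambda x: x[-1] in x[:-1], raw=True))
# Incomplete windows yield NaN, which np.where maps to 'N'.
df['Result'] = np.where(res == 1, 'Y', 'N')
df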
col1 Result
0 car N
1 car N
2 car Y
3 bus N
4 bus Y
5 bus Y
6 car N
Rolling only works with numeric data, so the initial step is to factorise the column. This can be done in many ways; I've used astype('category') and then extracted the codes.
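To make the factorisation step concrete, here is a minimal sketch that prints the integer codes the rolling window actually sees. The `pd.factorize` call is an extra route that is not part of the answer above; it is shown only for comparison, and the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']})

# Category codes follow the sorted categories: bus -> 0, car -> 1
cat_codes = df['col1'].astype('category').cat.codes

# pd.factorize numbers values in order of first appearance: car -> 0, bus -> 1
fact_codes, uniques = pd.factorize(df['col1'])

print(cat_codes.tolist())
print(fact_codes.tolist())
```

Either encoding should work here, because the window check only tests whether two codes are equal.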
Another option is using pd.Categorical for the conversion:
res = (df.assign(col1=pd.Categorical(df['col1']).codes)['col1']
         .rolling(n + 1)
         .apply(lambda x: x[-1] in x[:-1], raw=True))
# map leaves the first n rows as NaN because their windows are incomplete.
df['Result'] = res.map({1: 'Y', 0: 'N'})
df
col1 Result
0 car NaN
1 car NaN
2 car Y
3 bus N
4 bus Y
5 bus Y
6 car N
Here is my way:
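n = 2
# The first n rows have no full window of predecessors, so they default to False.
flags = [False] * n + [df.iloc[x, 0] in df.iloc[x - n:x, 0].tolist() for x in np.arange(n, len(df))]
df['New'] = flags
df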
col1 New
0 car False
1 car False
2 car True
3 bus False
4 bus True
5 bus True
6 car False
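For larger frames, the same check can also be written without a per-row Python call. The following is only a sketch of an alternative, not something taken from the answers above: it factorises the column and compares it against each of its n lagged copies with NumPy slicing. The helper name `seen_in_previous` is hypothetical, and it assumes the column has no missing values.

```python
import numpy as np
import pandas as pd

def seen_in_previous(series, n):
    """Return a boolean array: True where a value also occurs in the n rows before it."""
    codes, _ = pd.factorize(series)
    hit = np.zeros(len(codes), dtype=bool)
    for k in range(1, n + 1):
        # Compare each row with the row k positions earlier
        hit[k:] |= codes[k:] == codes[:-k]
    # Rows with fewer than n predecessors are treated as "no match"
    hit[:n] = False
    return hit

df = pd.DataFrame({'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']})
df['Result'] = np.where(seen_in_previous(df['col1'], 2), 'Y', 'N')
print(df)
```

The loop runs n times rather than once per row, so this version tends to suit small n on long columns.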