Filter a pandas DataFrame based on the values of 2 consecutive rows
I have a pandas DataFrame and I want to extract pairs of consecutive rows that meet a given condition.
To give a concrete example, let's say I have:
from datetime import datetime
import pandas as pd
df = pd.DataFrame([
    [datetime(2021, 1, 1), "Pizza", 50, "Some Place"],
    [datetime(2021, 1, 2), "Noddles", 36, "Some Place"],
    [datetime(2021, 1, 3), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 4), "Pizza", 36, "Some Place"],
    [datetime(2021, 1, 5), "Steak", 75, "Steak House"],
    [datetime(2021, 1, 6), "Pizza", 52, "Another Place"],
    [datetime(2021, 1, 6), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 7), "Noddles", 42, "Another Place"],
    [datetime(2021, 1, 8), "Dumplings", 12, "Asian Delice"],
    [datetime(2021, 1, 9), "Noddles", 39, "Some Place"],
    [datetime(2021, 1, 10), "Pizza", 53, "Some Place"],
    [datetime(2021, 1, 13), "Noddles", 0, "Some Place"],
    [datetime(2021, 1, 14), "Pizza", 0, "Another Place"],
], columns=["Date", "Food", "Cost", "Restaurant"])
I want to extract the rows where, on 2 consecutive days, we have Pizza and Noddles in the same restaurant, so the result would be:
         Date     Food  Cost     Restaurant
0  2021-01-01    Pizza    50     Some Place
1  2021-01-02  Noddles    36     Some Place
5  2021-01-06    Pizza    52  Another Place
7  2021-01-07  Noddles    42  Another Place
9  2021-01-09  Noddles    39     Some Place
10 2021-01-10    Pizza    53     Some Place
How could I achieve that with pandas?
Let us do:
# keep only the foods of interest
df = df.loc[df.Food.isin(['Pizza','Noddles'])]
# True where the gap to the previous remaining row is at most 2 days
s = df.Date.diff().dt.days.le(2)
# keep each (Restaurant, s) group that contains both foods
out = df.groupby([df['Restaurant'],s]).filter(lambda x : pd.Series(['Pizza','Noddles']).isin(x['Food']).all())
Out[113]:
         Date     Food  Cost     Restaurant
1  2021-01-02  Noddles    36     Some Place
3  2021-01-04    Pizza    36     Some Place
5  2021-01-06    Pizza    52  Another Place
7  2021-01-07  Noddles    42  Another Place
9  2021-01-09  Noddles    39     Some Place
10 2021-01-10    Pizza    53     Some Place
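To see what the grouping key in the snippet above looks like, it can help to print the intermediate values. The following is only a small inspection sketch; the names `wanted`, `gap_days` and `close` are mine and do not appear in the answer. Note that `.le(2)` means "at most 2 days", and that the first row has no predecessor, so its gap is NaN and the comparison yields False.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    [datetime(2021, 1, 1), "Pizza", 50, "Some Place"],
    [datetime(2021, 1, 2), "Noddles", 36, "Some Place"],
    [datetime(2021, 1, 3), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 4), "Pizza", 36, "Some Place"],
    [datetime(2021, 1, 5), "Steak", 75, "Steak House"],
    [datetime(2021, 1, 6), "Pizza", 52, "Another Place"],
    [datetime(2021, 1, 6), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 7), "Noddles", 42, "Another Place"],
    [datetime(2021, 1, 8), "Dumplings", 12, "Asian Delice"],
    [datetime(2021, 1, 9), "Noddles", 39, "Some Place"],
    [datetime(2021, 1, 10), "Pizza", 53, "Some Place"],
    [datetime(2021, 1, 13), "Noddles", 0, "Some Place"],
    [datetime(2021, 1, 14), "Pizza", 0, "Another Place"],
], columns=["Date", "Food", "Cost", "Restaurant"])

# hypothetical names, used only for this inspection
wanted = df.loc[df.Food.isin(["Pizza", "Noddles"])]

# days between each remaining row and the one before it (NaN for the first row)
gap_days = wanted.Date.diff().dt.days

# boolean flag that the answer uses as the second grouping key
close = gap_days.le(2)

print(pd.DataFrame({"gap_days": gap_days, "within_2_days": close}))
```

Because the flag is only True or False, grouping on it puts every "close" row of a restaurant into the same group, whether or not those rows are next to each other.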
Inspired by @BENY (thanks, Beny), I came up with this solution. It does not seem ideal, but at least it works.
Any suggestions for improvement, or an alternative solution that is more "pandas-ic", are welcome ;)
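# keep only the foods of interest; copy so that a column can be added safely
df = df[df.Food.isin(['Pizza','Noddles'])].copy()
# give each restaurant a numeric ID so that diff() can compare neighbouring rows
restaurants = list(set(df.Restaurant))
df["RestoID"] = df.apply(lambda row:restaurants.index(row.Restaurant), axis=1)
# previous row: at most 1 day earlier and same restaurant
mask = df.Date.diff().dt.days.le(1) & df.RestoID.diff().eq(0)
# next row: at most 1 day later and same restaurant
mask |= df.Date.diff(-1).dt.days.ge(-1) & df.RestoID.diff(-1).eq(0)
df[mask].drop("RestoID", axis=1)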
And the result is:
         Date     Food  Cost     Restaurant
0  2021-01-01    Pizza    50     Some Place
1  2021-01-02  Noddles    36     Some Place
5  2021-01-06    Pizza    52  Another Place
7  2021-01-07  Noddles    42  Another Place
9  2021-01-09  Noddles    39     Some Place
10 2021-01-10    Pizza    53     Some Place
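As one possible alternative, here is a sketch that swaps the row-by-row `diff()` comparison for a self-merge: each row is joined to the row of the same restaurant dated exactly one day later. This is a different technique from the one above and makes slightly different assumptions: the two rows must be exactly one calendar day apart, and they do not need to be adjacent after filtering. The names `wanted`, `nxt`, `pairs` and `keep` are mine.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    [datetime(2021, 1, 1), "Pizza", 50, "Some Place"],
    [datetime(2021, 1, 2), "Noddles", 36, "Some Place"],
    [datetime(2021, 1, 3), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 4), "Pizza", 36, "Some Place"],
    [datetime(2021, 1, 5), "Steak", 75, "Steak House"],
    [datetime(2021, 1, 6), "Pizza", 52, "Another Place"],
    [datetime(2021, 1, 6), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 7), "Noddles", 42, "Another Place"],
    [datetime(2021, 1, 8), "Dumplings", 12, "Asian Delice"],
    [datetime(2021, 1, 9), "Noddles", 39, "Some Place"],
    [datetime(2021, 1, 10), "Pizza", 53, "Some Place"],
    [datetime(2021, 1, 13), "Noddles", 0, "Some Place"],
    [datetime(2021, 1, 14), "Pizza", 0, "Another Place"],
], columns=["Date", "Food", "Cost", "Restaurant"])

# keep only the two foods; reset_index() stores the original labels in an "index" column
wanted = df[df.Food.isin(["Pizza", "Noddles"])].reset_index()

# same rows with the date moved back one day, so a match on Date means "the following day"
nxt = wanted.assign(Date=wanted.Date - pd.Timedelta(days=1))

# pair each row with the next-day row of the same restaurant
pairs = wanted.merge(nxt, on=["Restaurant", "Date"], suffixes=("", "_next"))

# the two days must have different foods (one Pizza, one Noddles)
pairs = pairs[pairs.Food != pairs.Food_next]

# collect the original labels of both rows of every pair
keep = sorted(set(pairs["index"]) | set(pairs["index_next"]))
result = df.loc[keep]
print(result)
```

On the sample data this selects the rows labelled 0, 1, 5, 7, 9 and 10, the same rows as the expected result in the question.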
A better and more elegant solution is to shift the rows and compare each row with its neighbours, something like this:
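df = df.loc[df.Food.isin(['Pizza','Noddles'])]
mask = False
# compare each row with the next one (i = -1) and the previous one (i = 1)
for i in [-1, 1]:
    # diff(-1) yields negative gaps, so take abs() before testing "at most 1 day"
    mask |= df.Date.diff(i).dt.days.abs().le(1) & df.Food.ne(df.Food.shift(i)) & df.Restaurant.eq(df.Restaurant.shift(i))
df[mask]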
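The shift-based idea can also be written as a self-contained, runnable sketch. The helper name `consecutive_pairs` and its parameters `foods` and `max_gap_days` are hypothetical, chosen only for illustration. The sketch uses the absolute day gap so that the comparison with the previous row and the comparison with the next row are treated the same way.

```python
from datetime import datetime
import pandas as pd


def consecutive_pairs(df, foods=("Pizza", "Noddles"), max_gap_days=1):
    """Hypothetical helper: keep rows that have a neighbour (previous or next,
    after filtering) with a different food, the same restaurant, and a date
    at most max_gap_days away."""
    sub = df.loc[df.Food.isin(foods)]
    mask = pd.Series(False, index=sub.index)
    for i in (-1, 1):  # -1: compare with the next row, 1: with the previous row
        close = sub.Date.diff(i).dt.days.abs().le(max_gap_days)
        other_food = sub.Food.ne(sub.Food.shift(i))
        same_place = sub.Restaurant.eq(sub.Restaurant.shift(i))
        mask |= close & other_food & same_place
    return sub[mask]


df = pd.DataFrame([
    [datetime(2021, 1, 1), "Pizza", 50, "Some Place"],
    [datetime(2021, 1, 2), "Noddles", 36, "Some Place"],
    [datetime(2021, 1, 3), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 4), "Pizza", 36, "Some Place"],
    [datetime(2021, 1, 5), "Steak", 75, "Steak House"],
    [datetime(2021, 1, 6), "Pizza", 52, "Another Place"],
    [datetime(2021, 1, 6), "Rice", 10, "Asian Delice"],
    [datetime(2021, 1, 7), "Noddles", 42, "Another Place"],
    [datetime(2021, 1, 8), "Dumplings", 12, "Asian Delice"],
    [datetime(2021, 1, 9), "Noddles", 39, "Some Place"],
    [datetime(2021, 1, 10), "Pizza", 53, "Some Place"],
    [datetime(2021, 1, 13), "Noddles", 0, "Some Place"],
    [datetime(2021, 1, 14), "Pizza", 0, "Another Place"],
], columns=["Date", "Food", "Cost", "Restaurant"])

result = consecutive_pairs(df)
print(result)
```

Starting the mask as an all-False Series on the filtered index, rather than as a plain `False`, keeps the result a Series even if the loop body were never to run.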