Mask a column based on a condition in another column

Question

class  subclass  date      value
1      A         02-10-22  .5
1      A         02-21-22  .6
1      A         02-28-22  .8
1      B         02-09-22  .3
1      B         02-14-22  .4
1      B         02-28-22  .5
2      C         02-15-22  .9
2      C         02-28-22  .8

I have a dataframe like the one above. Several (class, subclass) pairs have values ordered by date. The bottom date for each (class, subclass) pair is guaranteed to be the maximum date, for example 02-28-22.

I would like to transform it into the dataset below. For the date right before the maximum date, if it is not exactly 7 days before the maximum date, we change the corresponding value to NaN. Otherwise we leave it alone, along with the other dates. For example, the row with date 02-21-22 is left alone, while the value in the row with 02-14-22 becomes NaN.

Dates are stored as strings: '02-15-22'.

class  subclass  date      value
1      A         02-10-22  .5
1      A         02-21-22  .6
1      A         02-28-22  .8
1      B         02-09-22  .3
1      B         02-14-22  NaN
1      B         02-28-22  .5
2      C         02-15-22  NaN
2      C         02-28-22  .8
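
For anyone who wants to reproduce the question, the input can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and the `%m-%d-%y` format string is an assumption based on the '02-15-22' example (month-day-year with a two-digit year).

```python
import pandas as pd

# Sample data copied from the question; dates start out as strings.
df = pd.DataFrame({
    "class": [1, 1, 1, 1, 1, 1, 2, 2],
    "subclass": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "date": ["02-10-22", "02-21-22", "02-28-22",
             "02-09-22", "02-14-22", "02-28-22",
             "02-15-22", "02-28-22"],
    "value": [0.5, 0.6, 0.8, 0.3, 0.4, 0.5, 0.9, 0.8],
})

# Assumed format: MM-DD-YY. Passing it explicitly avoids
# any day/month guessing by the parser.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")
```

Once the column is datetime, date differences can be computed directly, which both answers rely on.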

Answer 1

Find the max date and the second max date using groupby. Then use where to mask the relevant values:

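# Dates are stored as strings, so convert them to datetime first
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

maxdate = df.groupby(["class", "subclass"])["date"].transform('max')
nextmaxdate = df.groupby(["class","subclass"])["date"].transform(lambda x: x.nlargest(2).min())

# Keep a value unless its row holds the second max date and the gap to the max date is not 7 days
df["value"] = df["value"].where(df["date"].ne(nextmaxdate) | maxdate.sub(nextmaxdate).dt.days.eq(7))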

>>> df
   class subclass       date  value
0      1        A 2022-02-10    0.5
1      1        A 2022-02-21    0.6
2      1        A 2022-02-28    0.8
3      1        B 2022-02-09    0.3
4      1        B 2022-02-14    NaN
5      1        B 2022-02-28    0.5
6      2        C 2022-02-15    NaN
7      2        C 2022-02-28    0.8
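
As a self-contained sketch, the same groupby/where idea can be wrapped in a small function. The name `mask_second_to_last` is hypothetical, and the `%m-%d-%y` date format is an assumption taken from the question's example; the function leaves the original date strings untouched and only uses parsed dates internally.

```python
import pandas as pd

def mask_second_to_last(df):
    """Hypothetical helper: NaN the value on the second-latest date of each
    (class, subclass) group unless it is exactly 7 days before the latest."""
    out = df.copy()
    # Parse a working copy of the dates; format is assumed to be MM-DD-YY.
    dates = pd.to_datetime(out["date"], format="%m-%d-%y")
    grouped = dates.groupby([out["class"], out["subclass"]])
    maxdate = grouped.transform("max")
    nextmax = grouped.transform(lambda s: s.nlargest(2).min())
    # Keep rows that are not on the second max date, or whose gap is 7 days.
    keep = dates.ne(nextmax) | (maxdate - nextmax).dt.days.eq(7)
    out["value"] = out["value"].where(keep)
    return out

df = pd.DataFrame({
    "class": [1, 1, 1, 1, 1, 1, 2, 2],
    "subclass": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "date": ["02-10-22", "02-21-22", "02-28-22",
             "02-09-22", "02-14-22", "02-28-22",
             "02-15-22", "02-28-22"],
    "value": [0.5, 0.6, 0.8, 0.3, 0.4, 0.5, 0.9, 0.8],
})
result = mask_second_to_last(df)
```

One thing to watch: in a group with a single row, `nlargest(2).min()` returns the maximum date itself, so that row's value gets masked as well. The question's data has no such group, so whether that matters depends on the real dataset.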

Answer 2

Calculate a reverse cumcount to identify the row preceding the last row in each group, then group and shift the date column and subtract an offset of 7 days. Finally, mask the values in the value column on the second-to-last row wherever the required condition is not met:

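# Assumes df['date'] has already been converted with pd.to_datetime
c = df[::-1].groupby(['class', 'subclass']).cumcount()  # 0 = last row of the group, 1 = second-to-last
d = df.groupby(['class', 'subclass'])['date'].shift(-1) - pd.DateOffset(days=7)  # next row's date minus 7 days
df['value'] = df['value'].mask(df['date'].ne(d) & c.eq(1))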

   class subclass       date  value
0      1        A 2022-02-10    0.5
1      1        A 2022-02-21    0.6
2      1        A 2022-02-28    0.8
3      1        B 2022-02-09    0.3
4      1        B 2022-02-14    NaN
5      1        B 2022-02-28    0.5
6      2        C 2022-02-15    NaN
7      2        C 2022-02-28    0.8
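
A runnable variant of this approach, for illustration only. Two substitutions are made here and are not part of the answer as posted: `cumcount(ascending=False)` replaces the reversed-frame cumcount, and `pd.Timedelta(days=7)` replaces `pd.DateOffset(days=7)`. The `%m-%d-%y` format is assumed from the question's example.

```python
import pandas as pd

df = pd.DataFrame({
    "class": [1, 1, 1, 1, 1, 1, 2, 2],
    "subclass": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "date": ["02-10-22", "02-21-22", "02-28-22",
             "02-09-22", "02-14-22", "02-28-22",
             "02-15-22", "02-28-22"],
    "value": [0.5, 0.6, 0.8, 0.3, 0.4, 0.5, 0.9, 0.8],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

keys = ["class", "subclass"]
# Position counted from the end of each group: 0 = last row, 1 = second-to-last.
pos_from_end = df.groupby(keys).cumcount(ascending=False)
# The date each row would need in order to sit exactly 7 days before the next row.
expected = df.groupby(keys)["date"].shift(-1) - pd.Timedelta(days=7)
# Mask only second-to-last rows whose date does not match.
to_mask = pos_from_end.eq(1) & df["date"].ne(expected)
df["value"] = df["value"].mask(to_mask)
```

Because this relies on row position rather than on the dates themselves, it depends on each group already being sorted by date, as the question states. A group with a single row has no position 1, so nothing in it is masked.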

Answer 1
1, accepted, 2022-05-09 17:03:51

Answer 2
0, 2022-05-09 17:11:14
