如何根据特定条件替换列中的某些值？

Question

I have the following df我有以下df

col_1   col_2
1   1
1   2
1   3
1   6
1   8
1   11
1   12
1   19
1   24
1   1
1   1
1   2
1   2
1   3 
1   3
2   1
2   2
2   4
2   6
2   7
2   11
2   13
2   16
2   19
2   1
2   2
2   3

I would like to do kind of groupby on col_1 and replace the values 1, 2, 3 that occur after 19 in the col_2 and change them with 25, 26, 27.我想对col_1进行某种分组，并替换col_2 19之后出现的值 1、2、3，并将它们更改为 25、26、27。

Expected Output:预期输出：

col_1   col_2
1   1
1   2
1   3
1   6
1   8
1   11
1   12
1   19
1   24
1   25
1   25
1   26
1   26
1   27
1   27
2   1
2   2
2   4
2   6
2   7
2   11
2   13
2   16
2   19
2   25
2   26
2   27

I would like to know how can this be done using pandas.我想知道如何使用熊猫来做到这一点。

Thanks谢谢

Edit 1:编辑1：

My real df我真正的df

ContextID   BacksGas_Flow_sccm  StepID
7289973 1.953125    1
7289973 2.05078125  2
7289973 2.05078125  2
7289973 2.05078125  2
7289973 1.953125    2
7289973 1.7578125   2
7289973 1.7578125   2
7289973 1.85546875  2
7289973 1.7578125   2
7289973 9.08203125  5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  5
7289973 46.6796875  7
7289973 46.6796875  7
7289973 46.6796875  7
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  12
7289973 46.6796875  15
7289973 46.6796875  15
7289973 46.6796875  16
7289973 46.6796875  16
7289973 46.6796875  17
7289973 25.09765625 19
7289973 45.99609375 19
7289973 59.08203125 19
7289973 61.81640625 19
7289973 62.59765625 19
7289973 63.671875   19
7289973 65.625  19
7289973 66.69921875 19
7289973 67.3828125  19
7289973 67.3828125  19
7289973 67.67578125 19
7289973 68.26171875 19
7289973 69.04296875 19
7289973 69.82421875 19
7289973 69.82421875 19
7289973 70.8984375  19
7289973 70.8984375  19
7289973 70.8984375  19
7289973 70.8984375  19
7289973 71.6796875  19
7289973 71.6796875  19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 73.33984375 19
7289973 73.33984375 19
7289973 73.33984375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 73.2421875  19
7289973 73.2421875  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.0234375  19
7289973 74.90234375 19
7289973 74.90234375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 75          19
7289973 75          19
7289973 75          19
7289973 74.21875    19
7289973 74.21875    19
7289973 74.21875    19
7289973 75          19
7289973 75          19
7289973 75          19
7289973 75          19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.90234375 19
7289973 6.4453125   24
7289973 3.515625    24
7289973 2.5390625   24
7289973 2.05078125  24
7289973 2.05078125  24
7289973 2.05078125  24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.953125    24
7289973 2.05078125  24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 2.05078125  24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.7578125   24
7289973 1.66015625  24
7289973 1.7578125   24
7289973 1.7578125   24
7289973 1.7578125   24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.953125    24
7289973 1.7578125   24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.7578125   24
7289973 1.85546875  24
7289973 1.85546875  24
7289973 1.7578125   24
7289973 1.7578125   1
7289973 1.85546875  1
7289973 1.85546875  1
7289973 1.85546875  2
7289973 1.7578125   2
7289973 1.953125    2
7289973 1.953125    2
7289973 1.85546875  2
7289973 1.85546875  3
7289973 1.85546875  3
7289973 1.85546875  3
7289973 1.953125    3
7289973 1.85546875  3
7289973 1.953125    3
7289973 1.85546875  3
7289973 1.7578125   3
7289973 1.85546875  3
7289973 1.85546875  3
7289973 1.7578125   3
7289973 1.85546875  3

Answer 1

One way would be to create a dictionary to replace the values in col_2 .一种方法是创建一个字典来replace col_2的值。 In order to replace only those that appear after a 19 , GroupBy , check equality and take the cumsum to perform boolean indexation on the dataframe:为了仅替换出现在19之后的那些GroupBy ，请检查相等性并采用cumsum对数据帧执行布尔索引：

map_ = {1:25, 2:26, 3:27}
cs = df.col_2.eq(19).groupby(df.col_1).cumsum()
update = df.loc[cs].col_2.replace(map_)
df.loc[update.index, 'col_2'] = update

 col_1  col_2
0       1      1
1       1      2
2       1      3
3       1      6
4       1      8
5       1     11
6       1     12
7       1     19
8       1     25
9       1     26
10      1     27
11      2      1
12      2      2
13      2      4
14      2      6
15      2      7
16      2     11
17      2     13
18      2     16
19      2     19
20      2     25
21      2     26
22      2     27

Answer 2

My try:我的尝试：

def fill19(x):
    # x.shift()==19 marks all 1's after 19's
    # rolling(3) marks three numbers after 19's
    filters = (x.shift()==19).rolling(3).sum().fillna(0).astype(bool)
    x[filters] += 24
    return x

df.col2 = df.groupby('col_1').col_2.apply(fill19)

0      1
1      2
2      3
3      6
4      8
5     11
6     12
7     19
8     25
9     26
10    27
11     1
12     2
13     4
14     6
15     7
16    11
17    13
18    16
19    19
20    25
21    26
22    27
Name: col_2, dtype: int64

Answer 3

Try below for loop, it does what is required in your case:试试下面的 for 循环，它可以满足您的需求：

for i in df['col_1'].unique():
     ix = np.argwhere((df['col_1'] == i) & (df['col_2'] == 19 ))
     df.loc[ix[0][0]+1, 'col_2'] = 25
     df.loc[ix[0][0]+2, 'col_2'] = 26
     df.loc[ix[0][0]+3, 'col_2'] = 27

Answer 4

Old school looping老学校循环

map_ = {1: 25, 2: 26, 3: 27}

d = {}  # Tracks if 19 has been seen yet

for i, c1, c2 in df.itertuples():

    if d.setdefault(c1, False):
        df.at[i, 'col_2'] = map_.get(c2, c2)

    d[c1] |= c2 == 19

Using `np.logigcal_or`使用`np.logigcal_or`

m = df.col_2.eq(19)
m = m.groupby(df.col_1).transform(np.logical_or.accumulate) ^ m
df.assign(col_2=df.col_2 + m * 24)

    col_1  col_2
0       1      1
1       1      2
2       1      3
3       1      6
4       1      8
5       1     11
6       1     12
7       1     19
8       1     25
9       1     26
10      1     27
11      2      1
12      2      2
13      2      4
14      2      6
15      2      7
16      2     11
17      2     13
18      2     16
19      2     19
20      2     25
21      2     26
22      2     27

Answer 5

Based on your real DataFrame, you can do the following:根据您的真实 DataFrame，您可以执行以下操作：

df['StepID'] = df.groupby('ContextID')['StepID'].apply(
    lambda x: x + (x < x.shift(1)).cumsum()*24)

print(df.tail(25))

Output:输出：

     ContextID  BacksGas_Flow_sccm  StepID
138    7289973            1.855469      24
139    7289973            1.757812      24
140    7289973            1.855469      24
141    7289973            1.855469      24
142    7289973            1.757812      24
143    7289973            1.757812      25
144    7289973            1.855469      25
145    7289973            1.855469      25
146    7289973            1.855469      26
147    7289973            1.757812      26
148    7289973            1.953125      26
149    7289973            1.953125      26
150    7289973            1.855469      26
151    7289973            1.855469      27
152    7289973            1.855469      27
153    7289973            1.855469      27
154    7289973            1.953125      27
155    7289973            1.855469      27
156    7289973            1.953125      27
157    7289973            1.855469      27
158    7289973            1.757812      27
159    7289973            1.855469      27
160    7289973            1.855469      27
161    7289973            1.757812      27
162    7289973            1.855469      27

PS In your sample data there is only one ContextID , but I'm assuming there might be others as well in the full dataset, so I've added groupby PS 在您的示例数据中只有一个ContextID ，但我假设完整数据集中可能还有其他数据，所以我添加了groupby

Update : the following is if you only need to increment values after 24 for each ContextID once by 24 (I'm saving new values to StepID_new column to show before-and-after the transformation):更新：下面是，如果你只需要增量值后， 24对每个ContextID曾经24（我保存新值StepID_new栏，显示之前和之后的转型）：

x = [1,2,3,19,19,24,1,1,2,2,3,3,2,3,1]
df = pd.DataFrame({'ContextID': np.repeat([1,2], len(x)),
                   'StepID': x * 2})

df['StepID_new'] = df['StepID'] + df.groupby('ContextID')['StepID'].transform(
    lambda x: ((x==24).cumsum() > 0).shift(1, fill_value=0) * 25)

print(df)

Output:输出：

    ContextID  StepID  StepID_new
0           1       1           1
1           1       2           2
2           1       3           3
3           1      19          19
4           1      19          19
5           1      24          24
6           1       1          26
7           1       1          26
8           1       2          27
9           1       2          27
10          1       3          28
11          1       3          28
12          1       2          27
13          1       3          28
14          1       1          26
15          2       1           1
16          2       2           2
17          2       3           3
18          2      19          19
19          2      19          19
20          2      24          24
21          2       1          26
22          2       1          26
23          2       2          27
24          2       2          27
25          2       3          28
26          2       3          28
27          2       2          27
28          2       3          28
29          2       1          26

如何根据特定条件替换列中的某些值？

问题描述

5 个解决方案

解决方案1
2 2019-05-02 13:01:36

解决方案2
0 2019-05-02 13:08:22

解决方案3
0 2019-05-02 13:08:52

解决方案4
0 2019-05-02 13:47:58

Old school looping老学校循环

Using `np.logigcal_or`使用`np.logigcal_or`

解决方案5
0 已采纳 2019-05-02 14:27:51

如何根据特定条件替换列中的某些值？

问题描述

5 个解决方案

解决方案1 2 2019-05-02 13:01:36

解决方案2 0 2019-05-02 13:08:22

解决方案3 0 2019-05-02 13:08:52

解决方案4 0 2019-05-02 13:47:58

Old school looping老学校循环

Using np.logigcal_or使用np.logigcal_or

解决方案5 0 已采纳 2019-05-02 14:27:51

解决方案1
2 2019-05-02 13:01:36

解决方案2
0 2019-05-02 13:08:22

解决方案3
0 2019-05-02 13:08:52

解决方案4
0 2019-05-02 13:47:58

Using `np.logigcal_or`使用`np.logigcal_or`

解决方案5
0 已采纳 2019-05-02 14:27:51