How to loop through pandas dataframe and modify value under condition?

Question

I have this pandas dataframe:

import pandas as pd

df = pd.DataFrame(
    {
    "col1": [1,1,2,3,3,3,4,5,5,5,5]
    }
)
df

I want to add another column that says "last" if the value in col1 doesn't equal the value of col1 in the next row. This is what it should look like:

So far, I can create a column that contains True if the value in col1 doesn't equal the value of col1 in the next row, and False otherwise:

df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

Now something like

df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

would be nice, but this is apparently the wrong syntax. How can I manage to do this?

Ultimately, I also want to add numbers that indicate how many times a value has appeared before the current row, while the last value is always marked with "last". It should look like this:

I'm not sure if this is another step in my development or if this requires a new approach. I read that if I want to loop through an array while modifying values, I should use apply(). However, I don't know how to include conditions in this. Can you help me?

Thanks a lot!

Answer 1

Here's one way. You can obtain a cumulative count within each run of equal values by defining a custom grouper (a new group starts whenever the value in col1 differs from that of the previous row) and taking DataFrameGroupBy.cumcount. Then add 'last' using a similar criterion based on df.shift:

g = df.col1.ne(df.col1.shift(1)).cumsum()
df['update'] = df.groupby(g).cumcount()
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'

    col1 update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last
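
The steps above can also be folded into a small helper. This is only a sketch, not part of the original answer: the function name mark_runs and the column name marker are made up for illustration, and the counts are cast to object dtype first so that the column can hold both integers and the string 'last' without a dtype complaint on newer pandas versions.

```python
import pandas as pd


def mark_runs(df, col="col1", out="marker"):
    """Hypothetical helper: number the rows inside each run of equal
    consecutive values and mark the final row of each run with 'last'."""
    values = df[col]
    # A new run starts wherever the value differs from the previous row.
    run_id = values.ne(values.shift(1)).cumsum()
    # A row closes its run when the next row differs (or does not exist).
    is_last = values.ne(values.shift(-1))
    # Cast to object so integers and the string 'last' can share a column.
    counts = values.groupby(run_id).cumcount().astype(object)
    result = df.copy()
    result[out] = counts.mask(is_last, "last")
    return result


df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})
marked = mark_runs(df)
print(marked)
```

Returning a copy leaves the original frame untouched, which makes the helper easier to reuse in a chain of transformations.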

Answer 2

Assuming the index is increasing, (1) take the cumcount within each group, then (2) take the max index inside each group and set the string there:

group = df.groupby('col1')

df['last'] = group.cumcount()
df.loc[group['last'].idxmax(), 'last'] = 'last'
#or df.loc[group.apply(lambda x: x.index.max()), 'last'] = 'last'


    col1  last
0      1     0
1      1  last
2      2  last
3      3     0
4      3     1
5      3  last
6      4  last
7      5     0
8      5     1
9      5     2
10     5  last
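
One thing to watch with this approach: groupby('col1') groups by value, not by consecutive run, so it matches the expected output only when each value sits in a single contiguous block, as it does in the sample data. The sketch below uses made-up data in which the value 1 reappears later, to show how grouping by value differs from grouping by run; the variable names are illustrative only.

```python
import pandas as pd

# Made-up data: the value 1 appears in two separate runs.
df = pd.DataFrame({"col1": [1, 1, 2, 1]})

# Grouping by value: both runs of 1 land in the same group.
by_value = df.groupby("col1").cumcount()

# Grouping by run: a new group starts whenever the value changes.
run_id = df["col1"].ne(df["col1"].shift()).cumsum()
by_run = df.groupby(run_id).cumcount()

print(by_value.tolist())  # [0, 1, 0, 2]
print(by_run.tolist())    # [0, 1, 0, 0]
```

If values can repeat in non-adjacent rows, the run-based grouper is probably the one you want.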

Answer 3

Use .shift to find where things change. Then you can use .where to mask appropriately, followed by .fillna:

s = df.col1 != df.col1.shift(-1)
df['Update'] = df.groupby(s.cumsum().where(~s)).cumcount().where(~s).fillna('last')

Output:

    col1 Update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last

As an aside, update is a method of DataFrames, so you should avoid naming a column 'update': attribute access (df.update) would then give you the method rather than the column.
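
A quick way to see the naming clash, using a small made-up frame: bracket access still reaches the column, but attribute access resolves to the DataFrame.update method.

```python
import pandas as pd

# Made-up frame with a column that shares its name with a DataFrame method.
df = pd.DataFrame({"col1": [1, 2], "update": ["a", "b"]})

# Bracket access always returns the column.
column_values = df["update"].tolist()
print(column_values)  # ['a', 'b']

# Attribute access finds the bound method first, not the column.
attribute_is_method = callable(df.update)
print(attribute_is_method)  # True
```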

Answer 4

Another possible solution:

import numpy as np

df['update'] = np.where(df['col1'].ne(df['col1'].shift(-1)), 'last', 0)
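
Note that this line only handles the first part of the question (marking the last row of each run); every other row gets 0 rather than a running count, and since np.where builds one array out of 'last' and 0, the zeros typically end up as the string '0'. Below is one possible way to combine np.where with a per-run count. It is a sketch under assumptions: the column name marker and the run_id grouper are illustrative, and the counts are cast to object so integers and strings can coexist.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True on the final row of each run of equal consecutive values.
is_last = df["col1"].ne(df["col1"].shift(-1))

# Run label: increments every time the value differs from the previous row.
run_id = df["col1"].ne(df["col1"].shift()).cumsum()

# Position within each run, cast to object so np.where keeps the integers.
counts = df.groupby(run_id).cumcount().astype(object)

# Pick 'last' where the run ends, otherwise the running count.
df["marker"] = np.where(is_last, "last", counts)
print(df)
```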

4 answers

solution1
3 2019-04-26 15:49:27

solution2
2 ACCEPTED 2019-04-26 15:57:23

solution3
2 2019-04-26 16:06:28

solution4
1 2019-04-26 15:57:57
