How can I set last rows of a dataframe based on condition in Python?

Question

I have a dataframe, df1, with two columns. The first column, 'col1', is a datetime column, and the second, 'col2', is an int column with only two possible values (0 or 1). Here is an example of the dataframe:


+----------------------+----------+
|          col1        |  col2    |
+----------------------+----------+
|  2020-01-01 10:00:00 |   0      |
+----------------------+----------+
|  2020-01-01 11:00:00 |   1      |
+----------------------+----------+
|  2020-01-01 12:00:00 |   1      |
+----------------------+----------+
|  2020-01-02 11:00:00 |   0      |
+----------------------+----------+
|  2020-01-02 12:00:00 |   1      |
+----------------------+----------+
|        ...           |   ...    |
+----------------------+----------+

As you can see, the datetimes are sorted in ascending order. For each distinct date (this example has two, 2020-01-01 and 2020-01-02, each with several times), I would like to keep the first 1 value and set every other value on that date, before or after it, to 0. So the resulting dataframe would be:


+----------------------+----------+
|          col1        |  col2    |
+----------------------+----------+
|  2020-01-01 10:00:00 |   0      |
+----------------------+----------+
|  2020-01-01 11:00:00 |   1      |
+----------------------+----------+
|  2020-01-01 12:00:00 |   0      |
+----------------------+----------+
|  2020-01-02 11:00:00 |   0      |
+----------------------+----------+
|  2020-01-02 12:00:00 |   1      |
+----------------------+----------+
|        ...           |   ...    |
+----------------------+----------+

How can I do it in Python?

Answer 1

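Use the following, where df is your dataframe (df1 in the question):

import pandas as pd
df['col1'] = pd.to_datetime(df['col1'])
# Running count of 1s per date; it first reaches 1 at the first 1 of each date
mask = df.groupby(df['col1'].dt.date)['col2'].cumsum().eq(1)
# Keep col2 where mask holds, else 0; assign back rather than using inplace on df.col2
df['col2'] = df['col2'].where(mask, 0)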

Output:

>>> df
                  col1  col2
0  2020-01-01 10:00:00     0
1  2020-01-01 11:00:00     1
2  2020-01-01 12:00:00     0
3  2020-01-02 11:00:00     0
4  2020-01-02 12:00:00     1
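
If you want to try this end to end, here is a minimal, self-contained sketch that rebuilds the five example rows from the question and applies the same cumsum idea. The helper name keep_first_one is hypothetical (it is not part of pandas), and the sample frame is typed in by hand from the question.

```python
import pandas as pd


def keep_first_one(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the first 1 in col2 for each calendar date."""
    out = df.copy()
    out["col1"] = pd.to_datetime(out["col1"])
    # Running count of 1s within each date (rows are assumed sorted by col1).
    running = out.groupby(out["col1"].dt.date)["col2"].cumsum()
    # The count equals 1 on the first 1 and on any 0s that follow it before
    # the next 1; where() keeps the original value there (so those 0s stay 0)
    # and writes 0 everywhere else.
    out["col2"] = out["col2"].where(running.eq(1), 0)
    return out


# Sample data copied from the question.
df1 = pd.DataFrame({
    "col1": [
        "2020-01-01 10:00:00",
        "2020-01-01 11:00:00",
        "2020-01-01 12:00:00",
        "2020-01-02 11:00:00",
        "2020-01-02 12:00:00",
    ],
    "col2": [0, 1, 1, 0, 1],
})

result = keep_first_one(df1)
print(result)
```

The helper works on a copy, so df1 itself is left unchanged; assign the result back if you want to overwrite it.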

solution1
1 ACCEPTED 2021-04-19 11:47:09
