Dataframe: group by day and set a value in a specific column after the first occurrence

Question

I have this dataframe:

                        column1     column2 column3     Filter
2000/01/02 13:35        2.55651     0.1275  0.198       1.0
2000/01/03 14:35        3.30585     0.9425  0.009       1.0
2000/01/04 16:30        3.40865     1.7897  0.515       1.0
2000/01/05 14:15        2.96273     0.6266  0.506       1.0
2000/01/07 14:40        2.75470     0.1724  0.405       1.0
2000/01/07 15:40        2.50288     0.4133  0.075       **1.0**
2000/01/09 14:35        2.20984     0.7232  0.818       1.0
2000/01/09 16:00        2.21001     0.2815  0.160       **1.0**

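For each day, I want to set the "Filter" column to zero on every row after the first one of that day. In this example, I marked the values I want set to zero with **. Everything else is fine as it is. I would like to use the loc function. Thanks in advance.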
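
To make the question reproducible without a CSV file, the sample data can be built in memory. This is a minimal sketch that assumes the unnamed first column holds the timestamps and uses them as a DatetimeIndex. With that index, a single .loc assignment does the whole update: DatetimeIndex.normalize() drops the time of day, and Index.duplicated(keep="first") flags every repeat of a date after its first occurrence in row order.

```python
import pandas as pd

# Sample data from the question; the timestamps are assumed to be the index.
idx = pd.to_datetime(
    [
        "2000/01/02 13:35", "2000/01/03 14:35", "2000/01/04 16:30",
        "2000/01/05 14:15", "2000/01/07 14:40", "2000/01/07 15:40",
        "2000/01/09 14:35", "2000/01/09 16:00",
    ],
    format="%Y/%m/%d %H:%M",
)
df = pd.DataFrame(
    {
        "column1": [2.55651, 3.30585, 3.40865, 2.96273, 2.75470, 2.50288, 2.20984, 2.21001],
        "column2": [0.1275, 0.9425, 1.7897, 0.6266, 0.1724, 0.4133, 0.7232, 0.2815],
        "column3": [0.198, 0.009, 0.515, 0.506, 0.405, 0.075, 0.818, 0.160],
        "Filter": [1.0] * 8,
    },
    index=idx,
)

# True for every row whose calendar day already appeared in an earlier row.
repeat_of_day = df.index.normalize().duplicated(keep="first")

# Set Filter to 0 on those rows only.
df.loc[repeat_of_day, "Filter"] = 0
print(df["Filter"].tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```

Because "first" means first in row order, the frame should be sorted by timestamp beforehand if it is not already.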

Answer 1

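You can do this by creating a temporary column that holds only the date (not the full datetime), then using df.duplicated() to flag the later occurrences of each date in a second temporary column, and finally using that flag to drive the change to the Filter column: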
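
The key step is how duplicated() marks rows. The sketch below uses a small hypothetical frame with a ready-made date column to show that keep="first" leaves the first row of each date as False and marks every later repeat as True, and that the resulting boolean Series can be passed to .loc directly as the row selector.

```python
import pandas as pd

# Hypothetical minimal frame: only the date matters for the flag.
df = pd.DataFrame(
    {
        "date": ["2000-01-05", "2000-01-07", "2000-01-07", "2000-01-09", "2000-01-09"],
        "Filter": [1.0, 1.0, 1.0, 1.0, 1.0],
    }
)

# keep="first": first row of each date is False, every later repeat is True.
flag = df.duplicated(subset="date", keep="first")
print(flag.tolist())  # [False, False, True, False, True]

# The boolean Series selects the rows to change; no "== True" comparison needed.
df.loc[flag, "Filter"] = 0
print(df["Filter"].tolist())  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

For comparison, keep="last" would instead mark every occurrence except the last one, and keep=False marks all rows whose date appears more than once.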

import pandas as pd

df = pd.read_csv('testdf.csv')  # put your real file path here
df.rename(columns={'Unnamed: 0': 'datetime'}, inplace=True)  # I renamed that first column, just for fun
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y/%m/%d %H:%M')  # converts the strings to datetimes
df['date'] = df['datetime'].dt.date  # new temporary column holding just the date
df['Duplicate?'] = df.duplicated(subset='date', keep='first')  # first row of each date is False, every later one is True
df.loc[df['Duplicate?'], 'Filter'] = 0  # sets Filter to 0 where 'Duplicate?' is True
df = df.drop(columns=['date', 'Duplicate?'])  # drops the temporary 'date' and 'Duplicate?' columns
print(df)
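
A variant that avoids the temporary columns numbers the rows within each day with groupby(...).cumcount(). This is a different technique from the duplicated() approach in the answer, shown here as a minimal sketch on a few hypothetical rows taken from the question: cumcount() gives 0 to the first row of each day, 1 to the second, and so on, so any value above 0 means the row comes after that day's first occurrence.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2000/01/07 14:40", "2000/01/07 15:40", "2000/01/09 14:35", "2000/01/09 16:00"],
            format="%Y/%m/%d %H:%M",
        ),
        "Filter": [1.0, 1.0, 1.0, 1.0],
    }
)

# Position of each row within its calendar day, counted in row order from 0.
position_in_day = df.groupby(df["datetime"].dt.date).cumcount()

# Everything after the first row of a day gets Filter = 0.
df.loc[position_in_day > 0, "Filter"] = 0
print(df["Filter"].tolist())  # [1.0, 0.0, 1.0, 0.0]
```

One reason to prefer this form is that the threshold generalizes: position_in_day > 1 would keep the first two rows of each day instead of only the first.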

Answer 1 (accepted), 2021-05-22 13:35:27