Python: take unique dates in dataframe
I have a data frame that looks like this:
                     price
Date
2022-01-01 19:20:00    100
2022-01-01 19:27:00    100
2022-01-02 19:31:00    102
I want the dataframe to have only one row per unique date:
                     price
Date
2022-01-01 19:20:00    100
2022-01-02 19:31:00    102
How can I achieve that?
If Date is a regular datetime column, you can sort the dataframe with:
df = df.sort_values('Date')
And then keep only the rows whose date differs from the previous row's date with:
df = df[df['Date'].dt.date != df['Date'].shift().dt.date]
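The two steps above can be tried end to end with a small sketch. The sample frame is an assumption that mirrors the question's data, deliberately shuffled and with Date as a regular column rather than the index; the variable name first_of_day is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question (assumption: Date is a column, rows unsorted).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-01-01 19:27:00", "2022-01-02 19:31:00", "2022-01-01 19:20:00"]
        ),
        "price": [100, 102, 100],
    }
)

# Sort chronologically so the earliest row of each day comes first.
df = df.sort_values("Date")

# Compare each row's calendar date with the previous row's calendar date.
# The first row is compared against a missing value, so it is always kept.
is_new_day = df["Date"].dt.date != df["Date"].shift().dt.date
first_of_day = df[is_new_day]

print(first_of_day)
```

Because the comparison only looks at the previous row, this relies on the sort: without it, a date that reappears later in the frame would be kept twice.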
You can extract the date from the datetime column using df.Date.dt.date, put that into a new column using assign, and after that use drop_duplicates based on only that column. Last, you might want to drop the newly created column that holds only the date information. In code, that reads:
df = (
    df.assign(new_date=lambda df: df.Date.dt.date)
    .drop_duplicates(subset=["new_date"])
    .drop(columns=["new_date"])
)
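For reuse, the same chain could be wrapped in a small function. This is only a sketch: first_row_per_day and the temporary column name _day are hypothetical names chosen here, not part of pandas, and the sample frame again assumes Date is a regular datetime column.

```python
import pandas as pd


def first_row_per_day(frame: pd.DataFrame, column: str = "Date") -> pd.DataFrame:
    """Keep the first row seen for each calendar date in `column`.

    Hypothetical helper for illustration; not part of pandas.
    """
    return (
        frame.assign(_day=lambda d: d[column].dt.date)  # temporary date-only column
        .drop_duplicates(subset=["_day"])  # keep="first" is the default
        .drop(columns=["_day"])  # remove the temporary column again
    )


# Sample data mirroring the question (assumption: Date is a column).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-01-01 19:20:00", "2022-01-01 19:27:00", "2022-01-02 19:31:00"]
        ),
        "price": [100, 100, 102],
    }
)

result = first_row_per_day(df)
print(result)
```

Unlike the shift-based comparison, drop_duplicates does not need the frame to be sorted, but "first" then means first in the current row order, not earliest in time.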
You can simply use duplicated:
# prerequisite: Date must have a datetime dtype
df['Date'] = pd.to_datetime(df['Date'])
df[~df['Date'].dt.date.duplicated()]
Or, if the dates are in the index:
df[~df.index.to_series().dt.date.duplicated().values]
Output:
Date price
0 2022-01-01 19:20:00 100
2 2022-01-02 19:31:00 102
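Since the frame in the question keeps Date as the index, a runnable sketch of the index variant may help. The sample frame is reconstructed from the question; the normalize() alternative is an addition for comparison, not something the answer above proposes.

```python
import pandas as pd

# Frame shaped like the one in the question: Date is the index, not a column.
df = pd.DataFrame(
    {"price": [100, 100, 102]},
    index=pd.DatetimeIndex(
        ["2022-01-01 19:20:00", "2022-01-01 19:27:00", "2022-01-02 19:31:00"],
        name="Date",
    ),
)

# Mask from the answer above: True where the calendar date was already seen.
seen_before = df.index.to_series().dt.date.duplicated().values
unique_days = df[~seen_before]

# Alternative (an addition, not from the answer): normalize() sets every
# timestamp to midnight, so duplicated() effectively compares calendar days.
unique_days_alt = df[~df.index.normalize().duplicated()]

print(unique_days)
```

Both masks keep the first row encountered for each day, so the original timestamps in the index are preserved.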