How to identify data rows for the last 10 days in a CSV file with pandas?
I'm new to Python and currently seeking help with the following:
How can I identify the data rows for the last 10 days in a CSV file with pandas? The first column (report_date) in my CSV file holds date values (yyyy-mm-dd). I have hundreds of records for each day, but I need only the last 10 days from this file, based on the date in the report_date column, and ideally I want to save the output to a new CSV file.
My code so far:
import pandas as pd
data = pd.read_csv("path/to/my/file/myfile.csv")
df = pd.DataFrame(report_date)
days=10
cutoff_date = df["report_date"].dt.date.iloc[-1] - pd.Timedelta(days=days)
Would someone be able to help? Thanks in advance!
First create a DatetimeIndex with the index_col and parse_dates parameters of read_csv:
df = pd.read_csv("path/to/my/file/myfile.csv",
index_col=['report_date'],
parse_dates=['report_date'])
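Then it is possible to use DataFrame.last, which selects the final period of data relative to the last index value (this assumes the index is sorted by date; note that DataFrame.last is deprecated since pandas 2.1):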
df1 = df.last('10d')
Finally, save to a file with DataFrame.to_csv:
df1.to_csv('new.csv')
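The steps above can also be sketched without DataFrame.last, using a plain boolean mask on the DatetimeIndex. This is only a minimal sketch: the sample data, the `value` column and the in-memory buffers are assumptions standing in for the real file, and the cutoff is taken from the latest date in the index so the rows do not have to be sorted.

```python
import io
import pandas as pd

# Hypothetical sample standing in for myfile.csv: 3 rows per day for 15 days.
dates = pd.date_range("2020-01-01", periods=15).repeat(3)
csv_text = pd.DataFrame(
    {"report_date": dates, "value": range(len(dates))}
).to_csv(index=False)

# Read the CSV with report_date parsed as a DatetimeIndex.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="report_date",
    parse_dates=["report_date"],
)

# Keep rows strictly newer than (latest date - 10 days).
cutoff = df.index.max() - pd.Timedelta(days=10)
df1 = df[df.index > cutoff]

# Write to an in-memory buffer here; pass a file path such as 'new.csv' instead.
out = io.StringIO()
df1.to_csv(out)
```

With this sample the mask keeps the rows dated 2020-01-06 through 2020-01-15, which is ten calendar days of three rows each.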
Your own solution should be changed to convert the column to datetimes in read_csv:
df = pd.read_csv("path/to/my/file/myfile.csv", parse_dates=['report_date'])
days=10
cutoff_date = df["report_date"].dt.date.iloc[-1] - pd.Timedelta(days=days)
Then compare the dates, extracted with Series.dt.date, in boolean indexing:
df1 = df[df["report_date"].dt.date > cutoff_date]
Finally, save to a file without the default index with DataFrame.to_csv:
df1.to_csv('new.csv', index=False)
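Note that iloc[-1] takes the date from the last row, so the approach above relies on the file being sorted by report_date. A hedged variant, assuming the rows may come in any order, takes the cutoff from the maximum date and compares the datetime column directly. The sample data and the `value` column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical, deliberately unsorted sample standing in for myfile.csv.
csv_text = """report_date,value
2020-01-20,1
2020-01-05,2
2020-01-11,3
2020-01-10,4
2020-01-15,5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["report_date"])

days = 10
# Use the latest date in the column rather than the last row,
# so the result does not depend on row order.
cutoff_date = df["report_date"].max() - pd.Timedelta(days=days)

# Boolean indexing on the datetime64 column itself.
df1 = df[df["report_date"] > cutoff_date]

# With no path, to_csv returns the CSV text; pass 'new.csv' to write a file.
result_csv = df1.to_csv(index=False)
```

Here the cutoff is 2020-01-10, so the rows dated 2020-01-20, 2020-01-11 and 2020-01-15 are kept and the row dated exactly 2020-01-10 is dropped, matching the strict `>` comparison used above.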
EDIT: I believe you need:
df = pd.DataFrame({'data': range(30)}, index= pd.date_range('2020-01-25', periods=30))
print (df)
data
2020-01-25 0
2020-01-26 1
2020-01-27 2
2020-01-28 3
2020-01-29 4
2020-01-30 5
2020-01-31 6
2020-02-01 7
2020-02-02 8
2020-02-03 9
2020-02-04 10
2020-02-05 11
2020-02-06 12
2020-02-07 13
2020-02-08 14
2020-02-09 15
2020-02-10 16
2020-02-11 17
2020-02-12 18
2020-02-13 19
2020-02-14 20
2020-02-15 21
2020-02-16 22
2020-02-17 23
2020-02-18 24
2020-02-19 25
2020-02-20 26
2020-02-21 27
2020-02-22 28
2020-02-23 29
today = pd.Timestamp('today').floor('d')
df1 = df[df.index > today].first('10d')
print (df1)
data
2020-02-11 17
2020-02-12 18
2020-02-13 19
2020-02-14 20
2020-02-15 21
2020-02-16 22
2020-02-17 23
2020-02-18 24
2020-02-19 25
2020-02-20 26
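The output above depends on the day the code is run. A reproducible sketch of the same selection, assuming a fixed reference date of 2020-02-10 (inferred from the printed result, not stated in the answer) and an explicit date window in place of DataFrame.first, could look like this:

```python
import pandas as pd

df = pd.DataFrame(
    {"data": range(30)},
    index=pd.date_range("2020-01-25", periods=30),
)

# Assumed fixed "today" so that the result is the same on every run.
today = pd.Timestamp("2020-02-10")

# Rows from the 10 days following today: today < date <= today + 10 days.
end = today + pd.Timedelta(days=10)
df1 = df[(df.index > today) & (df.index <= end)]
```

Unlike first('10d'), which counts from the first row found after today, this window is anchored on today itself, so the two can differ when the data has gaps right after the reference date.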