
How to filter out data based on date from a CSV file in Python

I have a dataset as shown below, and I want to keep only the rows dated from 2021-07-30 to 2021-08-03 (inclusive). Here is the dataset:

input.csv

created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
2021-08-05,YYY Wins the worldcup,Sport
2021-08-06,GGG Wins the worldcup,Sport
2021-08-07,FFF Wins the worldcup,Sport
2021-08-08,SSS Wins the worldcup,Sport
2021-08-09,XYZ Wins the worldcup,Sport
2021-08-10,PQR Wins the worldcup,Sport

Expected output.csv

created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport

What I have so far:

import pandas as pd

def save():
    tweets = pd.read_csv('input.csv')
    df = pd.DataFrame(tweets, columns=['created_at', 'text', 'label'])

if __name__ == '__main__':
    save()
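
df = pd.read_csv('input.csv')
# Zero-padded 'YYYY-MM-DD' strings sort in date order, so no date parsing is needed
df[(df.created_at >= '2021-07-30') & (df.created_at <= '2021-08-03')]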

Output:

    created_at                   text  label
6   2021-07-30  MMM Wins the worldcup  Sport
7   2021-07-31  RRR Wins the worldcup  Sport
8   2021-08-01  OOO Wins the worldcup  Sport
9   2021-08-02  JJJ Wins the worldcup  Sport
10  2021-08-03  YYY Wins the worldcup  Sport
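
The string comparison above works because `created_at` is read as plain text, and zero-padded ISO `YYYY-MM-DD` strings sort in the same order as the dates they represent. A minimal sketch that checks this against a few of the sample rows (embedded with `io.StringIO` so it runs without `input.csv`; the names `by_string` and `by_datetime` are illustrative only):

```python
import io

import pandas as pd

# A few rows from input.csv, embedded so the sketch runs standalone
CSV = """created_at,text,label
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
"""

df = pd.read_csv(io.StringIO(CSV))  # created_at is left as text, not parsed

# Lexicographic comparison on the raw 'YYYY-MM-DD' strings
by_string = df[(df.created_at >= '2021-07-30') & (df.created_at <= '2021-08-03')]

# The same filter after converting the column to real datetimes
dates = pd.to_datetime(df.created_at)
by_datetime = df[(dates >= '2021-07-30') & (dates <= '2021-08-03')]

print(by_string['text'].tolist())
print(by_string.equals(by_datetime))
```

This only holds while every value uses the same zero-padded format. A value such as `2021-8-3` compares greater than `2021-08-03` as a string, so it would be dropped from the range; converting with `parse_dates` or `pd.to_datetime` avoids that.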

Try:

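import pandas as pd
# parse_dates converts created_at to datetime64, so the comparison is chronological
df = pd.read_csv('input.csv', parse_dates=['created_at'])
out = df[df['created_at'].between('2021-07-30', '2021-08-03')]
out.to_csv('output.csv', index=False)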

Contents of output.csv:

created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
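
`Series.between` includes both endpoints by default. In pandas 1.3 and later its `inclusive` argument accepts `'both'`, `'neither'`, `'left'` or `'right'` (older versions took a boolean), so the same call can also express a half-open range. A sketch assuming pandas 1.3 or newer, with a handful of sample rows standing in for `input.csv` and an in-memory buffer standing in for `output.csv`:

```python
import io

import pandas as pd

CSV = """created_at,text,label
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
"""

# parse_dates turns created_at into datetime64 values
df = pd.read_csv(io.StringIO(CSV), parse_dates=['created_at'])

# Default inclusive='both': keeps 2021-07-30 and 2021-08-03
both = df[df['created_at'].between('2021-07-30', '2021-08-03')]

# inclusive='left': keeps 2021-07-30 but drops 2021-08-03
half_open = df[df['created_at'].between('2021-07-30', '2021-08-03', inclusive='left')]

# Write the inclusive result to a buffer instead of output.csv
buf = io.StringIO()
both.to_csv(buf, index=False)
print(buf.getvalue())
```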

Another way to achieve this, possibly faster, is `DataFrame.query`:

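import pandas as pd

df = pd.DataFrame({'data': ['SPY', 'SPY', 'SPY', 'SPY', 'SPY', 'SPY', 'SPY'],
                   'created_at': ['2021-07-30', '2021-07-31', '2021-08-01', '2021-08-02',
                                  '2010-05-06', '2021-08-03', '2021-08-04']})

# Convert the strings to datetime64 so comparisons are chronological
df = df.assign(created_at=pd.to_datetime(df['created_at']))

# >= and <= keep both endpoints, matching the range requested in the question
df.query("created_at >= '2021-07-30' & created_at <= '2021-08-03'")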
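
A further option, not taken from the answers above, is to move `created_at` into a sorted `DatetimeIndex` and slice it with `.loc`; label-based slicing on a datetime index includes both endpoints. A sketch using the same toy frame (the variable name `out` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'data': ['SPY'] * 7,
                   'created_at': ['2021-07-30', '2021-07-31', '2021-08-01', '2021-08-02',
                                  '2010-05-06', '2021-08-03', '2021-08-04']})

# Parse the dates, make them the index, and sort so range slicing selects by value
df = (df.assign(created_at=pd.to_datetime(df['created_at']))
        .set_index('created_at')
        .sort_index())

# Label slicing on a sorted DatetimeIndex is inclusive on both ends
out = df.loc['2021-07-30':'2021-08-03'].reset_index()
print(out)
```

The `sort_index()` call matters here: the input is out of order (note the 2010 row), and range slicing on a non-monotonic index does not select rows by date value.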
