[英]How to filter out data based on date in python of a csv file
我有一個數據集如下&我想過濾從 2021-07-30 到 2021-08-03 的數據下面是數據集
輸入.csv
created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
2021-08-05,YYY Wins the worldcup,Sport
2021-08-06,GGG Wins the worldcup,Sport
2021-08-07,FFF Wins the worldcup,Sport
2021-08-08,SSS Wins the worldcup,Sport
2021-08-09,XYZ Wins the worldcup,Sport
2021-08-10,PQR Wins the worldcup,Sport
output.csv
created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
import pandas as pd
def save():
tweets = pd.read_csv(r'input.csv.csv')
df = pd.DataFrame(tweets, columns=['created_at', 'text','label'])
if __name__ == '__main__':
save()
df[(df.created_at >= '2021-07-30') & (df.created_at <= '2021-08-03')]
Output:
created_at text label
6 2021-07-30 MMM Wins the worldcup Sport
7 2021-07-31 RRR Wins the worldcup Sport
8 2021-08-01 OOO Wins the worldcup Sport
9 2021-08-02 JJJ Wins the worldcup Sport
10 2021-08-03 YYY Wins the worldcup Sport
嘗試:
df = pd.read_csv('input.csv', parse_dates=['created_at'])
out = df[df['created_at'].between('2021-07-30', '2021-08-03')]
out.to_csv('output.csv', index=False)
output.csv
的內容:
created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
實現這一目標的另一種方法,也許更快:
df = pd.DataFrame({'data':['SPY', 'SPY','SPY', 'SPY', 'SPY', 'SPY','SPY'],
'created_at': ['2021-07-30', '2021-07-31', '2021-08-01', '2021-08-02', '2010-05-06', '2021-08-03', '2021-08-04']
})
df = df.assign(created_at=pd.to_datetime(df['created_at']))
df.query("created_at > '2021-07-30' & created_at < '2021-08-03'")
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.