
How to filter rows of a CSV file by date in Python


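I have a dataset like the one below, and I want to keep only the rows dated from 2021-07-30 to 2021-08-03 (both dates included). Here is the dataset: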

input.csv:

created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
2021-08-05,YYY Wins the worldcup,Sport
2021-08-06,GGG Wins the worldcup,Sport
2021-08-07,FFF Wins the worldcup,Sport
2021-08-08,SSS Wins the worldcup,Sport
2021-08-09,XYZ Wins the worldcup,Sport
2021-08-10,PQR Wins the worldcup,Sport

Expected output.csv:

created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
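
import pandas as pd

def save():
    # read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrap is needed
    df = pd.read_csv('input.csv')
    # ISO-8601 date strings (YYYY-MM-DD) sort lexicographically in date order,
    # so plain string comparison selects the inclusive range
    out = df[(df.created_at >= '2021-07-30') & (df.created_at <= '2021-08-03')]
    print(out)
    out.to_csv('output.csv', index=False)

if __name__ == '__main__':
    save()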

Output:

    created_at                   text  label
6   2021-07-30  MMM Wins the worldcup  Sport
7   2021-07-31  RRR Wins the worldcup  Sport
8   2021-08-01  OOO Wins the worldcup  Sport
9   2021-08-02  JJJ Wins the worldcup  Sport
10  2021-08-03  YYY Wins the worldcup  Sport
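If pandas is not available, the same inclusive filter can be written with the standard-library csv module. This is a minimal sketch, not taken from the answers above: the function name filter_csv_by_date and its parameters are hypothetical, and it parses each value with date.fromisoformat (Python 3.7+) instead of relying on string order, assuming the column holds ISO-8601 dates.

```python
import csv
import io
from datetime import date


def filter_csv_by_date(src, dst, start, end, column="created_at"):
    """Hypothetical helper: copy rows whose `column` date is in [start, end].

    src and dst are open text file objects; start and end are datetime.date.
    Assumes the column holds ISO-8601 dates such as 2021-07-30.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Compare real dates, so the result does not depend on string ordering
        if start <= date.fromisoformat(row[column]) <= end:
            writer.writerow(row)


# A few rows of the sample data, kept in memory for the demonstration
sample = """created_at,text,label
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
"""

out = io.StringIO()
filter_csv_by_date(io.StringIO(sample), out, date(2021, 7, 30), date(2021, 8, 3))
print(out.getvalue())
```

For the real files, open them with open('input.csv', newline='') and open('output.csv', 'w', newline='') and pass those file objects instead of the StringIO buffers.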

Try:

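import pandas as pd
# Parse created_at as datetime64 so the comparison is a real date comparison
df = pd.read_csv('input.csv', parse_dates=['created_at'])
# between() includes both endpoints by default
out = df[df['created_at'].between('2021-07-30', '2021-08-03')]
out.to_csv('output.csv', index=False)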

Contents of output.csv:

created_at,text,label
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport

Another way to do this, using DataFrame.query (possibly faster on large frames):

import pandas as pd

df = pd.DataFrame({'data': ['SPY', 'SPY', 'SPY', 'SPY', 'SPY', 'SPY', 'SPY'],
                   'created_at': ['2021-07-30', '2021-07-31', '2021-08-01', '2021-08-02',
                                  '2010-05-06', '2021-08-03', '2021-08-04']})

# Convert the strings to datetime64 before comparing
df = df.assign(created_at=pd.to_datetime(df['created_at']))

# Use >= and <= so that both 2021-07-30 and 2021-08-03 are kept
df.query("created_at >= '2021-07-30' & created_at <= '2021-08-03'")
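One more option, not from the answers above: make created_at a sorted DatetimeIndex and slice it with .loc, where label-based slicing keeps both endpoints. A minimal sketch that reads a few rows of the sample data from an in-memory string via io.StringIO (replace it with 'input.csv' for the real file):

```python
import io

import pandas as pd

# A few rows of the sample data, kept in memory for the demonstration
sample = """created_at,text,label
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
"""

# Parse the dates, use them as the index, and sort so slicing is well defined
df = (pd.read_csv(io.StringIO(sample), parse_dates=["created_at"])
        .set_index("created_at")
        .sort_index())

# Label slicing on a DatetimeIndex includes both the start and end labels
out = df.loc["2021-07-30":"2021-08-03"]
print(out)

# Turn the index back into a column; pass a path such as 'output.csv' to write a file
csv_text = out.reset_index().to_csv(index=False)
```

Sorting matters here: slicing a DatetimeIndex by label is only reliable on a monotonic index, which is why sort_index() runs before .loc.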


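Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.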

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM