简体   繁体   English

如何从 Python 中的 csv 文件中按升序提取日期

[英]How to extract dates in ascending order from a csv file in Python

I created a script to extract dates and prices from a CSV file and print it in another CSV file, but the issue is the dates in the raw CSV file are arranged randomly.我创建了一个脚本来从 CSV 文件中提取日期和价格并将其打印到另一个 CSV 文件中,但问题是原始 CSV 文件中的日期是随机排列的。 How do I extract the date to return in ascending order in my csv file instead of randomly?如何在 csv 文件中提取日期以升序返回,而不是随机提取? I'd like to do this in Python, you download the entire CSV file here: http://www.eia.gov/dnav/ng/hist/rngwhhdm.htm .我想在 Python 中执行此操作,您可以在此处下载整个 CSV 文件: http://www.eia.gov/dnav/nghtmhist/rhdm/

Sample CSV data below.示例 CSV 数据如下。 The dates continue till 1997 and they are sorted in reverse already.日期一直持续到 1997 年,并且它们已经反向排序。

Henry Hub Natural Gas Spot Price
https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm
14:35:57 GMT-0800 (Pacific Standard Time)
Data Source: Thomson Reuters
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
Jul 2020,1.77
Jun 2020,1.63
May 2020,1.75
Apr 2020,1.74
Mar 2020,1.79
Feb 2020,1.91
Jan 2020,2.02
Dec 2019,2.22
Nov 2019,2.65
Oct 2019,2.33
Sep 2019,2.56
Aug 2019,2.22
Jul 2019,2.37
Jun 2019,2.4
May 2019,2.64
Apr 2019,2.65
Mar 2019,2.95
Feb 2019,2.69
Jan 2019,3.11
Dec 2018,4.04
Nov 2018,4.09
Oct 2018,3.28
Sep 2018,3

The code already written is shown below:已经编写好的代码如下图所示:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.to_csv('gas-details_year.csv', index=True)

print("Extracted Successfully")

You can indicate to read_csv to parse the dates and then sort by index:您可以指示read_csv解析日期,然后按索引排序:

import pandas as pd 
df = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv',
                 index_col='Month',parse_dates=True,infer_datetime_format=True,skiprows=4)
df.sort_index(inplace=True)
print(df)
            Henry Hub Natural Gas Spot Price Dollars per Million Btu
Month                                                               
1997-01-01                                               3.45       
1997-02-01                                               2.15       
1997-03-01                                               1.89       
1997-04-01                                               2.03       
1997-05-01                                               2.25       
...                                                       ...       
2020-09-01                                               1.92       
2020-10-01                                               2.39       
2020-11-01                                               2.61       
2020-12-01                                               2.59       
2021-01-01                                               2.71       

[289 rows x 1 columns]

You can use the dataframe.sort_index() method as shown below:您可以使用 dataframe.sort_index() 方法,如下所示:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.sort_index()
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.sort_index()
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.sort_index()
df.to_csv('gas-details_year.csv', index=True)

print("Exctracted Succesfully")

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM