
Pandas read_csv and remove daylight saving

I have a 312.5 MB CSV file containing EURUSD 1-minute OHLC data from 27/7/2003 to date, but the timestamps are all adjusted for daylight saving time, meaning I get duplicates and gaps.

Because the file is so big, the default date parser was way too slow, so I did this:

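from datetime import datetime
import dateutil.tz
from pandas import read_csv

tizo = dateutil.tz.tzfile('/usr/share/zoneinfo/GB')

def date_parse_1min(s):
    # Slice the fixed-width date string by position to avoid the slow default parser.
    return datetime(int(s[6:10]),   # year
                    int(s[3:5]),    # month
                    int(s[0:2]),    # day
                    int(s[11:13]),  # hour
                    int(s[14:16]),  # minute
                    tzinfo=tizo)

df = read_csv("EURUSD_1m_clean_w_header.csv", index_col=0, parse_dates=True, date_parser=date_parse_1min)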
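
As an aside on the parsing speed: a per-row Python function is usually the slow part, and parsing the whole column in one call with an explicit format may be quicker. The sketch below is only an illustration. The sample rows, the column names and the "dd/mm/yyyy HH:MM" layout are assumptions inferred from the slice positions above, not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the real file. The column names, the
# prices and the "dd/mm/yyyy HH:MM" layout are assumptions for illustration.
sample = io.StringIO(
    "Date,Open,High,Low,Close\n"
    "27/07/2003 00:00,1.1500,1.1502,1.1499,1.1501\n"
    "27/07/2003 00:01,1.1501,1.1503,1.1500,1.1502\n"
)

# Read the index as plain strings first...
df = pd.read_csv(sample, index_col=0)

# ...then convert the whole column in one vectorised call with an explicit
# format, instead of calling a Python function once per row.
df.index = pd.to_datetime(df.index, format="%d/%m/%Y %H:%M")
```

The resulting index is timezone-naive, so any timezone handling would have to happen as a separate step afterwards.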

#verify that it's got the tz right:
df.index
Exception AttributeError: "'NoneType' object has no attribute 'toordinal'" in 'pandas.tslib._localize_tso' ignored
Exception AttributeError: "'NoneType' object has no attribute 'toordinal'" in 'pandas.tslib._localize_tso' ignored
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-07-26 23:00:00, ..., 2012-12-15 23:59:00]
Length: 4938660, Freq: None, Timezone: tzfile('/usr/share/zoneinfo/GB')

No idea why there are attribute errors there.

df.index.get_duplicates()
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-10-26 01:00:00, ..., 2012-10-28 01:59:00]
Length: 600, Freq: None, Timezone: None
df1 = df.tz_convert('GMT')
df1.index.get_duplicates()
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-10-26 01:00:00, ..., 2012-10-28 01:59:00]
Length: 600, Freq: None, Timezone: None
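
For anyone reproducing this on a recent pandas: Index.get_duplicates() was deprecated and later removed, so the same check can be written with Index.duplicated(). A minimal sketch on a made-up four-entry index (the timestamps are hypothetical):

```python
import pandas as pd

# Hypothetical index in which 01:00 appears twice.
idx = pd.DatetimeIndex([
    "2012-10-28 01:00",
    "2012-10-28 01:01",
    "2012-10-28 01:00",
    "2012-10-28 02:00",
])

# duplicated() flags every repeat after the first occurrence;
# unique() collapses them to one entry per duplicated timestamp.
dupes = idx[idx.duplicated()].unique()
```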

How can I get pandas to remove the daylight saving offset? Obviously I could work out the right integer indexes that need changing and do it like that, but there must be a better way.

The easiest way to correct this is probably to take the first and last duplicate value of each year and shift the data in between by an hour. You'll have to take into account that the first data points already fall within daylight saving time.
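
A different route from the manual shift described above, offered only as a sketch: if the timestamps are read as naive UK wall-clock times, pandas can try to work out which of the repeated autumn hours is which with tz_localize(..., ambiguous="infer") and then convert to UTC, which has no daylight saving offset. The helper name to_utc, the sample data and the "Europe/London" zone name (the tz database entry that "GB" is an alias of) are assumptions for illustration. Inference relies on the rows being in chronological order and on each repeated hour actually appearing twice, so it may raise an error on data with holes around the changeover.

```python
import pandas as pd

# Hypothetical sample: naive UK wall-clock minutes around the autumn 2012
# changeover. 01:00-01:59 occurs twice, first in BST and then in GMT.
wall = pd.DatetimeIndex(
    list(pd.date_range("2012-10-28 00:58", "2012-10-28 01:59", freq="min"))
    + list(pd.date_range("2012-10-28 01:00", "2012-10-28 02:01", freq="min"))
)
df = pd.DataFrame({"close": range(len(wall))}, index=wall)


def to_utc(frame, zone="Europe/London"):
    """Hypothetical helper: attach the zone to a naive index, then convert to UTC."""
    # ambiguous="infer" uses the row order to decide which occurrence of a
    # repeated wall-clock time is summer time and which is winter time.
    localized = frame.tz_localize(zone, ambiguous="infer")
    return localized.tz_convert("UTC")


utc_df = to_utc(df)
```

In this sample the wall-clock index contains 60 duplicated timestamps, while the UTC index is unique and advances one minute at a time.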
