简体   繁体   中英

What is the problem with the pandas to csv in my code?

I am running this code for a project I am doing for fun to find patterns in Disneyland wait times:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_pirates_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
    parse_dates=['date', 'datetime'], 
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))

df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_pirates = df_pirates.resample('15Min').ffill()


df_star_tours_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/star_tours_dlr.csv", usecols=['date','datetime','SPOSTMIN'],
    parse_dates=['date', 'datetime']
)
df_star_tours_all['ride'] = 'star_tours'
df_star_tours_all['open'] = ~((df_star_tours_all['SPOSTMIN'] == -999))

df_star_tours = df_star_tours_all.set_index('datetime').sort_index()
df_star_tours = df_star_tours.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_star_tours = df_star_tours.resample('15Min').ffill()

df_space_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/space_mountain_dlr.csv", usecols=['date','datetime','SPOSTMIN'], 
    parse_dates=['date', 'datetime']
)
df_space_all['ride'] = 'space'
df_space_all['open'] = ~((df_space_all['SPOSTMIN'] == -999))

df_space = df_space_all.set_index('datetime').sort_index()
df_space = df_space.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_space = df_space.resample('15Min').ffill()


all_data = pd.concat([df_pirates, df_star_tours, df_space]).reset_index()
all_data = (
    all_data
        # Drop any "NaN" values in the column 'ride'
        .dropna(subset=['ride', ])
        # Make datetime and ride a "Multi-Index"
        .set_index(['datetime', 'ride'])
        # Choose the column 'SPOSTMIN'
        ['SPOSTMIN']
        # Take the last index ('ride') and rotate to become column names
        .unstack()
)
# print (all_data)

for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer, sheet_name='day')

However I am running into this error

FileCreateError: [Errno 22] Invalid argument: '2017-01-31 00:00:00.xlsx'

and it is connected to the dfsub.to_excel line.

It mostly got fixed by the comments, however, only one sheet is appearing and it only has the last day of data (1-31-17) instead of individual sheets for 1-1-17,1-2-17,etc.

For the first error based on the code you don't care about the specific date + time so do this:

with pd.ExcelWriter(f'{month.date()}.xlsx'):

This will convert the datetime object to a date object

Your second error is saying you are attempting to make a column that isn't all unique an index which pandas won't allow.

Maybe there are field you can combine or use another one?

What got it fixed was changing the code from

for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer, sheet_name='day')

to

for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month.strftime("%B %Y")}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer,sheet_name=str(day.date()))

with the suggestions that were made.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM