
read_pickle failing stochastically

I have a DataFrame that I saved to a pickle file. When I load it with read_pickle, it fails with the following error on roughly one in ten runs:

ValueError: Level values must be unique: [Timestamp('2020-06-03 15:59:59.999999+0000', tz='UTC'), datetime.date(2020, 6, 3), datetime.date(2020, 6, 4), datetime.date(2020, 6, 5)] on level 0

What is causing this stochastic behaviour?

The issue can be reproduced with the following:

from datetime import timedelta, date
import pandas as pd
import pytz
from pandas import Timestamp

utc = pytz.UTC

data = {
    "date": [
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).replace(minute=59, second=59, microsecond=999999),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
    ],
    "status": ["in_progress", "in_progress", "done", "in_progress", "done", "in_progress", "done"],
    "issue_count": [20, 18, 2, 14, 6, 10, 10],
    "points": [100, 90, 10, 70, 30, 50, 50],
    "stories": [0, 0, 0, 0, 0, 0, 0],
    "tasks": [100, 100, 100, 100, 100, 100, 100],
    "bugs": [0, 0, 0, 0, 0, 0, 0],
    "subtasks": [0, 0, 0, 0, 0, 0, 0],
    "assignee": ["Name", "Name", "Name", "Name", "Name", "Name", "Name"],
}
df = pd.DataFrame(data).groupby(["date", "status"]).sum()

df.to_pickle("~/failing_df.pkl")
pd.read_pickle("~/failing_df.pkl")

Try to_csv() or to_dict() instead:

# write it to csv
df.to_csv('temp.csv')
# read it from csv
df2 = pd.read_csv('temp.csv')
df2.set_index(['date', 'status'], inplace=True)
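One caveat with the CSV round trip: read_csv without parse_dates returns the date column as plain strings, so the original Timestamp and datetime.date objects are not restored. The sketch below illustrates this on a small flat frame that stands in for the question's data (the sample values and variable names are assumptions, not taken from the original post), and then parses each string back with pd.Timestamp.

```python
import io
from datetime import date

import pandas as pd

# Hypothetical flat stand-in for the question's "date" column:
# one tz-aware Timestamp and one datetime.date.
original = pd.DataFrame({
    "date": [pd.Timestamp("2020-06-03 15:59:59.999999", tz="UTC"), date(2020, 6, 3)],
    "points": [100, 90],
})

# Round-trip through CSV using an in-memory buffer instead of a file.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Without parse_dates, every value in the column comes back as a string.
value_types = {type(v).__name__ for v in loaded["date"]}
print(value_types)

# Parse each string individually; date-only strings become
# timezone-naive midnight Timestamps.
parsed = [pd.Timestamp(s) for s in loaded["date"]]
print(parsed)
```

Whether the parsed values are acceptable depends on how the index is used afterwards, since the date-only entries no longer compare as datetime.date objects.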

Or, optionally:

df_dict = df.to_dict()

# pickle the dict rather than the DataFrame
pd.to_pickle(df_dict, 'temp.pkl')

# unpickle it (this returns the dict, not a DataFrame)
df_dict2 = pd.read_pickle('temp.pkl')
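Another option is to make the date level a single type before grouping. The error message lists a tz-aware Timestamp next to datetime.date objects in level 0. The sketch below assumes that this mix is related to the failure, which the question does not confirm. The helper to_utc_timestamp and the trimmed sample data are hypothetical. Note that each datetime.date becomes a midnight UTC Timestamp, which changes the index values.

```python
import os
import tempfile
from datetime import date

import pandas as pd


def to_utc_timestamp(value):
    """Hypothetical helper: coerce a Timestamp or datetime.date to a UTC Timestamp."""
    ts = pd.Timestamp(value)
    # Naive values are assumed to be UTC; aware values are converted to UTC.
    return ts.tz_localize("UTC") if ts.tzinfo is None else ts.tz_convert("UTC")


# Trimmed-down version of the question's data (assumed sample values).
raw_dates = [
    pd.Timestamp("2020-06-03 15:59:59.999999", tz="UTC"),
    date(2020, 6, 3),
    date(2020, 6, 3),
    date(2020, 6, 4),
]
frame = pd.DataFrame({
    "date": [to_utc_timestamp(d) for d in raw_dates],
    "status": ["in_progress", "in_progress", "done", "in_progress"],
    "points": [100, 90, 10, 70],
})
grouped = frame.groupby(["date", "status"]).sum()

# Round-trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grouped.pkl")
    grouped.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored)
```

With a uniform datetime level, the index no longer holds mixed Python objects, so the round trip does not depend on how Timestamp and datetime.date compare or hash against each other.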
