pandas pivot table error: value should be a 'Timedelta', 'NaT', or array of those. Got 'int' instead.

Question

While trying to pivot a table, I get an error that I don't understand how to fix.

My code is:

import numpy as np
import pandas as pd

df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
df_com = df1.groupby(['CommentOwner','DiscussionId'])
y=df_com.nunique()
y=y.reset_index()
p=y.pivot(index="CommentOwner", columns="DiscussionId", values=['CommentOwner','DiscussionId','CommentCreation_min','CommentCreation_max','CommentCreation_count','AnswerId']).fillna(0)

I used reset_index() so that I can use the columns 'CommentOwner' and 'DiscussionId' again after the groupby moved them into the index.

When I run this code, I get this error:

TypeError: value should be a 'Timedelta', 'NaT', or array of those. Got 'int' instead.

When I try this code instead, it works:

import numpy as np
import pandas as pd

df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
df_com = df1.groupby(['CommentOwner','DiscussionId'])
y=df_com.nunique()
y.to_csv(r'C:\Users\Documents\Python\y.csv')
y_x=pd.read_csv(r'C:\Users\Documents\Python\y.csv')
p=y_x.pivot(index="CommentOwner", columns="DiscussionId", values=['CommentOwner','DiscussionId','CommentCreation_min','CommentCreation_max','CommentCreation_count','AnswerId']).fillna(0)

Here I didn't use reset_index(); instead I saved the data frame as a CSV file and read it back in.

I hope my question is clear. Any idea why this happens?

There must be a nicer way to do this without saving the output and reloading it.

Thanks!

Answer 1

The problem is that fillna(0) fills null values with 0 in every column, including the datetime64 columns.

You should do something like:

p = (y.pivot(...)
      .fillna({my_datetime_col1: pd.NaT, my_datetime_col2: pd.NaT})
      .fillna(0))

The first fillna replaces the null values in the datetime64 columns with NaT; the second one then replaces the remaining missing values with 0.
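
A minimal sketch of the same idea, using made-up data rather than the asker's file. The column names first_gap and n_comments are hypothetical. Instead of filling the datetime-like column with NaT, this variation builds the fillna dict from the numeric columns only, so the timedelta column is never offered an int. Whether a bare fillna(0) raises on a timedelta column or silently casts it to object appears to depend on the pandas version, so the sketch avoids that call altogether.

```python
import pandas as pd

# Hypothetical frame: one timedelta column and one numeric column, each with a gap.
df = pd.DataFrame({
    "first_gap": pd.to_timedelta(["1 days", None, "3 days"]),
    "n_comments": [1.0, None, 3.0],
})

# Collect only the numeric columns; is_numeric_dtype is False for timedelta64.
numeric_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]

# Passing a dict to fillna fills just the listed columns and leaves the rest alone.
filled = df.fillna({c: 0 for c in numeric_cols})

print(filled)
```

The missing first_gap entry stays NaT, while the missing n_comments entry becomes 0.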

Answer 2

IIUC, use:

df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
p = df1.groupby(['CommentOwner','DiscussionId']).nunique().unstack(fill_value=0)
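
To see what this produces, here is a small sketch on invented data (the rows below are an assumption, not the asker's CSV). It shows that unstack(fill_value=0) moves DiscussionId into the columns and puts 0 where an owner has no rows for a discussion.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv.
df1 = pd.DataFrame({
    "CommentOwner": ["ann", "ann", "ann", "bob"],
    "DiscussionId": [1, 1, 2, 1],
    "AnswerId": [10, 11, 12, 10],
})

# Count distinct values per (owner, discussion), then spread DiscussionId
# across the columns; combinations that do not occur are filled with 0.
p = df1.groupby(["CommentOwner", "DiscussionId"]).nunique().unstack(fill_value=0)

print(p)
```

The result has MultiIndex columns such as ("AnswerId", 1) and ("AnswerId", 2), with one row per CommentOwner.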

Btw, your solution should work if you remove 'CommentOwner' and 'DiscussionId' from the values parameter, like this:

p=y.pivot(index="CommentOwner", 
          columns="DiscussionId", 
          values=['CommentCreation_min','CommentCreation_max',
                  'CommentCreation_count','AnswerId']).fillna(0)
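
A small sketch of this pivot variant on invented data (the rows and the single AnswerId value column are assumptions for illustration). The columns used as index and columns are left out of values, and only the remaining column is pivoted.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv.
df1 = pd.DataFrame({
    "CommentOwner": ["ann", "ann", "ann", "bob"],
    "DiscussionId": [1, 1, 2, 1],
    "AnswerId": [10, 11, 12, 10],
})

# Distinct counts per (owner, discussion), with the group keys back as columns.
y = df1.groupby(["CommentOwner", "DiscussionId"]).nunique().reset_index()

# Pivot only the value column; the key columns are not repeated in values.
p = y.pivot(index="CommentOwner",
            columns="DiscussionId",
            values=["AnswerId"]).fillna(0)

print(p)
```

Because pivot introduces NaN for missing combinations before fillna runs, the counts come back as floats here.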

Answer 1: 2 (accepted), 2023-01-17 07:21:14

Answer 2: 0, 2023-01-17 07:34:05
