Pandas: groupby max raises AssertionError

Question

I have a problem with pandas and would like your opinion.

I have this DataFrame and need to get the maximum values from it. The code is below.

import pandas as pd

df_stack=pd.DataFrame([[1.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.6122, -122.33799, 1927.0, 57.85220900338872,
        59.91269863912585],
       [1.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61317, -122.33393, 1996.0, 55.82342114189166,
        56.86951201265458],
       [3.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61393, -122.3381, 1969.0, 76.68191235628086,
        77.37931271575705],
       [5.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61412, -122.33664, 1926.0, 68.53505428597694,
        71.00764283155655],
       [8.0, 2016.0, 'NonResidential', 'Hotel', 98121.0, 'DOWNTOWN',
        47.61375, -122.34047, 1980.0, 67.01346098859122,
        68.34485815906346]], columns=['OSEBuildingID', 'DataYear', 'BuildingType', 'PrimaryPropertyType', 
 'ZipCode', 'Neighborhood', 'Latitude', 'Longitude', 'YearBuilt', 
 'SourceEUI(KWm2)', 'SourceEUIWN(KWm2)' ])

When I run the code below:

df_stack[['OSEBuildingID', 
          'DataYear', 
          'BuildingType', 
          'PrimaryPropertyType', 
          'ZipCode', 'Neighborhood', 'Latitude', 'Longitude', 
          'YearBuilt', 'SourceEUI(KWm2)', 'SourceEUIWN(KWm2)']].groupby('OSEBuildingID').max()

I get the error "AssertionError:". If you try this, you will probably get the same error. However, when I comment out these two columns and run the code again:

df_stack[['OSEBuildingID', 
          'DataYear', 
          #'BuildingType', 
          #'PrimaryPropertyType', 
          'ZipCode', 'Neighborhood', 'Latitude', 'Longitude', 
          'YearBuilt', 'SourceEUI(KWm2)', 'SourceEUIWN(KWm2)']].groupby('OSEBuildingID').max()

I get the result:

     DataYear  ZipCode Neighborhood  Latitude  Longitude  YearBuilt  SourceEUI(KWm2)  SourceEUIWN(KWm2)
OSEBuildingID                                                                                                    
1.0              2016.0  98101.0     DOWNTOWN  47.61317 -122.33393     1996.0        57.852209          59.912699
3.0              2016.0  98101.0     DOWNTOWN  47.61393 -122.33810     1969.0        76.681912          77.379313
5.0              2016.0  98101.0     DOWNTOWN  47.61412 -122.33664     1926.0        68.535054          71.007643
8.0              2016.0  98121.0     DOWNTOWN  47.61375 -122.34047     1980.0        67.013461          68.344858

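If I replace max() with mean(), I can uncomment those two lines and run the code without any problem. The error only occurs with max() and min(). I have only tested max, mean and min, but the maximum is what I need.

Any help is appreciated, thanks.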

Answer 1

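This is a regression in 1.0.0 that was fixed in 1.0.1, so I suggest you upgrade your version. From the release notes:

Fixed regression in .groupby().agg() raising an AssertionError for some reductions like min on object-dtype columns (GH31522)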
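To check whether an installed pandas is the release named above, a small version test can help. This is only a sketch: the helper names `parse_version` and `has_groupby_regression` are made up for illustration, and it assumes that 1.0.0 is the only affected release, as this answer states.

```python
import pandas as pd


def parse_version(version: str) -> tuple:
    """Return the leading numeric (major, minor, patch) parts of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as "rc0" or "+local"
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "1.3"
    return tuple(parts)


def has_groupby_regression(version: str) -> bool:
    # Assumption: only 1.0.0 is affected (GH31522), per the answer above.
    return parse_version(version) == (1, 0, 0)


print(pd.__version__, has_groupby_regression(pd.__version__))
```

If this prints True, upgrading pandas (for example with `pip install --upgrade pandas`) is the fix suggested here.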

Answer 2

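Carlos Carvalho, I don't get any error when I run this code. If you copy and paste it into a terminal, can you confirm that you still get the error? As suggested in the comments above, it may be related to your version. Also, BuildingType and PrimaryPropertyType are objects rather than floats, but it still works: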

import pandas as pd

df_stack=pd.DataFrame([[1.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.6122, -122.33799, 1927.0, 57.85220900338872,
        59.91269863912585],
       [1.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61317, -122.33393, 1996.0, 55.82342114189166,
        56.86951201265458],
       [3.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61393, -122.3381, 1969.0, 76.68191235628086,
        77.37931271575705],
       [5.0, 2016.0, 'NonResidential', 'Hotel', 98101.0, 'DOWNTOWN',
        47.61412, -122.33664, 1926.0, 68.53505428597694,
        71.00764283155655],
       [8.0, 2016.0, 'NonResidential', 'Hotel', 98121.0, 'DOWNTOWN',
        47.61375, -122.34047, 1980.0, 67.01346098859122,
        68.34485815906346]], columns=['OSEBuildingID', 'DataYear', 'BuildingType', 
                                      'PrimaryPropertyType', 
 'ZipCode', 'Neighborhood', 'Latitude', 'Longitude', 'YearBuilt', 
 'SourceEUI(KWm2)', 'SourceEUIWN(KWm2)' ])
df_stack[['OSEBuildingID', 'DataYear', 'BuildingType', 'PrimaryPropertyType', 
          'ZipCode', 'Neighborhood', 'Latitude', 'Longitude', 'YearBuilt', 
          'SourceEUI(KWm2)', 'SourceEUIWN(KWm2)']].groupby('OSEBuildingID').max()
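The dtype observation can be checked on a cut-down frame. The sketch below uses a reduced, illustrative subset of the question's columns and assumes a pandas version where the regression is fixed. It lists the non-numeric columns and then takes the group-wise maximum across all of them.

```python
import pandas as pd

# Reduced version of the question's data: two text columns, one numeric column.
df = pd.DataFrame({
    "OSEBuildingID": [1.0, 1.0, 3.0],
    "BuildingType": ["NonResidential", "NonResidential", "NonResidential"],
    "PrimaryPropertyType": ["Hotel", "Hotel", "Hotel"],
    "YearBuilt": [1927.0, 1996.0, 1969.0],
})

# Columns that pandas does not treat as numeric (the two text columns here).
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(non_numeric)

# On a fixed pandas version, max() handles text and numeric columns together.
result = df.groupby("OSEBuildingID").max()
print(result)
```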

Answer 3

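I recently ran into this error with pandas 1.3.2 and found that the cause was two columns with the same name. For a DataFrame with columns col1, val1, val1, calling df.groupby('col1').agg({'val1': np.min}) raises this error because there are two columns named val1.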
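One way to spot and work around duplicated column names is sketched below. The data is invented, and keeping only the first occurrence of each name is an assumption that may not suit every dataset (renaming the duplicates is the alternative).

```python
import pandas as pd

# Hypothetical frame with the column layout described above: col1, val1, val1.
df = pd.DataFrame(
    [[1, 10, 100], [1, 20, 200], [2, 30, 300]],
    columns=["col1", "val1", "val1"],
)

# duplicated() marks the second and later occurrences of each column name.
dup_mask = df.columns.duplicated()
duplicate_names = df.columns[dup_mask].tolist()
print(duplicate_names)

# Keep only the first occurrence of each name, then aggregate as usual.
deduped = df.loc[:, ~dup_mask]
result = deduped.groupby("col1").agg({"val1": "min"})
print(result)
```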

Answer 4

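I had this problem too, but in my case it was caused by pandas NaT values in a datetime column. When this happens, be sure to use fillna on the datetime column.

My pandas version is 1.3.2.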
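A minimal sketch of that workaround follows. The column names and the sentinel date are made up for illustration, and choosing a placeholder that cannot be mistaken for real data is left to the reader.

```python
import pandas as pd

# Hypothetical data: one datetime column containing NaT values.
df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "when": pd.to_datetime(["2021-01-05", None, None]),
})

# Assumption: 1900-01-01 is an acceptable placeholder for missing timestamps.
sentinel = pd.Timestamp("1900-01-01")
filled = df.assign(when=df["when"].fillna(sentinel))

result = filled.groupby("group")["when"].max()
print(result)
```

Groups that contained only NaT come back with the sentinel, so it may need to be mapped back to NaT after aggregating.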

Answer 5

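This problem also occurs in pandas 1.1.2. As Raphael Pavan mentioned, the cause seems to be calling max() or min() on columns that contain np.nan or None values.

Use .fillna() to replace the None and NaN values with something relevant (even an empty string), then apply the agg function.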
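As a rough illustration of that suggestion, the sketch below fills missing text values with an empty string before aggregating. The frame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a text column holding both None and NaN.
df = pd.DataFrame({
    "group": [1, 1, 2],
    "kind": ["Hotel", None, np.nan],
})

# Replace missing values with an empty string, as suggested above.
filled = df.assign(kind=df["kind"].fillna(""))

result = filled.groupby("group").agg({"kind": "max"})
print(result)
```

An empty string sorts before any non-empty string, so it only shows up in the result for groups where every value was missing.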

5 answers

Answer 1
2, accepted, 2020-03-09 21:57:25

Answer 2
0, 2020-03-09 21:49:20

Answer 3
0, 2021-09-28 18:25:55

Answer 4
0, 2022-05-10 18:08:15

Answer 5
0, 2022-07-11 22:00:42
