Columns extracted from a DataFrame have a different index

Question

I'm encountering the following situation:

some_df.index                 #=> Int64Index([0, 1], dtype='int64')
some_df['some_column'].index  #=> Float64Index([7.0, 5.0], dtype='object')

Why is this happening? Does this mean there was something wrong in the way some_df was constructed? Finally, what's the best way to ensure that columns I extract from some_df all use the same index as some_df itself?

EDIT: I dove deeper into the code and apparently there's a line that simply reassigns the index: some_df['some_column'].index = some_df['another_column']. How broken is this?

Answer 1

It's unclear whether this is a bug, though perhaps assigning to the index of a Series pulled out of a DataFrame should raise (that behaviour may be quite tricky to implement)... Either way, you should definitely not be doing this!

To confirm that the assignment really does change the column's index without touching the frame:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4

In [13]: df['A'].index
Out[13]: Int64Index([0, 1], dtype='int64')

In [14]: df['A'].index = [7., 8.]

In [15]: df['A'].index
Out[15]: Float64Index([7.0, 8.0], dtype='float64')

In [16]: df
Out[16]:
   A  B
0  1  2
1  3  4

So whilst this is apparently valid, you're going to get some surprising (potentially undefined) behaviour...

For example:

In [21]: df.groupby("A").sum()
Out[21]:
Empty DataFrame
Columns: [B]
Index: []
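As for keeping extracted columns on the same index as the frame, here is a minimal sketch. It assumes the intent of the offending line was to label rows by another column's values; that is a guess about the original code, not something the question confirms. DataFrame.set_index and Series.copy are standard pandas calls, while the names indexed and relabelled are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Re-index the whole frame by column B rather than mutating
# the index of a single extracted column.
indexed = df.set_index("B")

# Any column pulled from the re-indexed frame shares the frame's index.
assert indexed["A"].index.equals(indexed.index)

# If only one column needs different labels, relabel an independent copy
# so the Series obtained from the frame is never modified.
relabelled = df["A"].copy()
relabelled.index = [7.0, 8.0]

# The original frame and its columns keep their original index.
assert df["A"].index.equals(df.index)
```

A check like df[col].index.equals(df.index) can also be dropped into existing code as a quick sanity test wherever a column's index is suspected of having been reassigned.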

Solution 1
0 2015-05-06 22:10:16
