Column extracted from DataFrame has a different index
I'm encountering the following situation:
some_df.index #=> Int64Index([0, 1], dtype='int64')
some_df['some_column'].index #=> Float64Index([7.0, 5.0], dtype='object')
Why is this happening? Does this mean there was something wrong in the way some_df was constructed? Finally, what's the best way to ensure that columns I extract from some_df all use the same index as some_df itself?
EDIT: I dove deeper into the code and apparently there's a line that simply reassigns the index:
some_df['some_column'].index = some_df['another_column']
How broken is this?
It's unclear whether this is a bug, though perhaps assigning to the index of a Series extracted from a DataFrame should raise (that behaviour may be quite tricky to implement)... Either way, you should definitely not be doing this!
To confirm that this is indeed the case:
In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4
In [13]: df['A'].index
Out[13]: Int64Index([0, 1], dtype='int64')
In [14]: df['A'].index = [7., 8.]
In [15]: df['A'].index
Out[15]: Float64Index([7.0, 8.0], dtype='float64')
In [16]: df
Out[16]:
   A  B
0  1  2
1  3  4
So whilst this is apparently valid, you're going to get some surprising (potentially undefined) behaviour...
For example:
In [21]: df.groupby("A").sum()
Out[21]:
Empty DataFrame
Columns: [B]
Index: []
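As for keeping extracted columns on the same index as the frame: a freshly extracted column already shares the frame's index, so the main thing is to avoid assigning to the index of the extracted Series. If the goal of that line was a column keyed by another column's values, here is a minimal sketch of two ways to get that without touching the frame. The data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
some_df = pd.DataFrame({"some_column": [1, 3], "another_column": [7.0, 5.0]})

# A column extracted from an unmodified frame shares the frame's index.
assert some_df["some_column"].index.equals(some_df.index)

# Option 1: re-index a new frame, then extract the column from it.
by_other = some_df.set_index("another_column")["some_column"]

# Option 2: build a standalone Series from the raw values.
standalone = pd.Series(
    some_df["some_column"].to_numpy(),
    index=some_df["another_column"].to_numpy(),
    name="some_column",
)

# The original frame and its columns are left untouched.
assert some_df["some_column"].index.equals(some_df.index)
```

Both options return new objects, so some_df and the columns extracted from it stay consistent with each other.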