
Column extracted from DataFrame has a different index

I'm encountering the following situation:

some_df.index                 #=> Int64Index([0, 1], dtype='int64')
some_df['some_column'].index  #=> Float64Index([7.0, 5.0], dtype='object')

Why is this happening? Does this mean there was something wrong in the way some_df was constructed? Finally, what's the best way to ensure that the columns I extract from some_df all use the same index as some_df itself?

EDIT: I dove deeper into the code and apparently there's a line that simply reassigns the index: some_df['some_column'].index = some_df['another_column']. How broken is this?

It's unclear whether this is a bug, though perhaps assigning to the index of a Series pulled out of a DataFrame should raise an error (it may be quite tricky to implement that behaviour)... You should definitely not be doing this!

To confirm that this is indeed the case:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4

In [13]: df['A'].index
Out[13]: Int64Index([0, 1], dtype='int64')

In [14]: df['A'].index = [7., 8.]

In [15]: df['A'].index
Out[15]: Float64Index([7.0, 8.0], dtype='float64')

In [16]: df
Out[16]:
   A  B
0  1  2
1  3  4

So whilst this is apparently valid, the column now reports a different index from the DataFrame it came from, and you're going to get some surprising (potentially undefined) behaviour...

For example:

In [21]: df.groupby("A").sum()
Out[21]:
Empty DataFrame
Columns: [B]
Index: []
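
As for the last part of the question, here is a minimal sketch of how the same goal could be reached without assigning to the index of an extracted column. It is my own illustration rather than anything from the pandas documentation, it reuses the toy frame from above, and the variable names (same_index, reindexed, standalone) are made up for the example. The idea is to change the index on the DataFrame itself, or on an explicit copy of the column, so that the frame and its columns never disagree:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Sanity check: an extracted column should share the frame's index.
same_index = df["A"].index.equals(df.index)

# If the whole frame should be indexed by another column, do it on the
# DataFrame, so every column extracted afterwards carries the new index.
reindexed = df.set_index("B")

# If only one column needs a different index, take an explicit copy first
# and re-index the copy; the original frame is left alone.
standalone = df["A"].copy()
standalone.index = df["B"].to_numpy()

print(same_index)
print(reindexed["A"].index.equals(reindexed.index))
print(df["A"].index.equals(df.index))
```

With this approach the check df[col].index.equals(df.index) can also be used as an assertion in the existing code to catch places where the two have drifted apart.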


