Checking if pandas dataframe is indexed?

Is it possible to check if a pandas dataframe is indexed? That is, can I check whether DataFrame.set_index(...) was ever called on the dataframe? I could check whether df.index is a numeric list, but that's not a perfect test for this.

One way would be to compare it to the plain index:

pd.Index(np.arange(0, len(df))).equals(df.index)

For example:

In [11]: df = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  a  b
1  c  d

In [13]: pd.Index(np.arange(0, len(df))).equals(df.index)
Out[13]: True

and if it's not the plain index, it will return False:

In [14]: df = df.set_index('A')

In [15]: pd.Index(np.arange(0, len(df))).equals(df.index)
Out[15]: False
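
As a rough sketch, the comparison above can be wrapped in a small helper. The function name has_plain_index and the `numbered` dataframe are made up for illustration and are not part of pandas. The last case shows why the question calls this kind of check imperfect: Index.equals compares values only, so a column that happens to hold 0..n-1 looks the same as the default index.

```python
import numpy as np
import pandas as pd


def has_plain_index(df: pd.DataFrame) -> bool:
    """Return True if df.index holds exactly 0, 1, ..., len(df) - 1 in order.

    Hypothetical helper name; it only wraps the comparison shown above.
    """
    # Index.equals compares the values and their order, not the index name.
    return bool(pd.Index(np.arange(len(df))).equals(df.index))


df = pd.DataFrame([["a", "b"], ["c", "d"]], columns=["A", "B"])
print(has_plain_index(df))                 # default index
print(has_plain_index(df.set_index("A")))  # index built from column A

# Limitation: an index built from a column that happens to contain 0..n-1
# cannot be told apart from the default index by this test.
numbered = pd.DataFrame({"k": [0, 1], "v": ["x", "y"]}).set_index("k")
print(has_plain_index(numbered))
```

Under these assumptions the three calls should print True, False and True, so the helper answers "does the index look like the default one?" rather than "was set_index ever called?".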

I just ran into this myself. The problem is that a dataframe is already indexed before calling .set_index(), so the question is really whether or not the index is named. In that case, df.index.name appears to be less reliable than df.index.names: for a MultiIndex, df.index.name is None even though its levels are named.

>>> import pandas as pd
>>> df = pd.DataFrame({"id1": [1, 2, 3], "id2": [4,5,6], "word": ["cat", "mouse", "game"]})
>>> df
   id1  id2   word
0    1    4    cat
1    2    5  mouse
2    3    6   game
>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.index.name, df.index.names[0]
(None, None)
>>> "indexed" if df.index.names[0] else "no index"
'no index'
>>> df1 = df.set_index("id1")
>>> df1
     id2   word
id1            
1      4    cat
2      5  mouse
3      6   game
>>> df1.index
Int64Index([1, 2, 3], dtype='int64', name='id1')
>>> df1.index.name, df1.index.names[0]
('id1', 'id1')
>>> "indexed" if df1.index.names[0] else "no index"
'indexed'
>>> df12 = df.set_index(["id1", "id2"])
>>> df12
          word
id1 id2       
1   4      cat
2   5    mouse
3   6     game
>>> df12.index
MultiIndex([(1, 4),
            (2, 5),
            (3, 6)],
           names=['id1', 'id2'])
>>> df12.index.name, df12.index.names[0]
(None, 'id1')
>>> "indexed" if df12.index.names[0] else "no index"
'indexed'
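
The check used in the session can be folded into one function. This is only a sketch: has_named_index is a hypothetical name, and it deviates slightly from the session by testing every level with `is not None` instead of the truthiness of names[0], so that a name such as 0 or an empty string still counts as a name.

```python
import pandas as pd


def has_named_index(df: pd.DataFrame) -> bool:
    """Return True if any level of df.index carries a name.

    Hypothetical helper, not a pandas API.
    """
    # index.names covers both single-level indexes and MultiIndex,
    # whereas index.name is None for a MultiIndex.
    return any(name is not None for name in df.index.names)


df = pd.DataFrame(
    {"id1": [1, 2, 3], "id2": [4, 5, 6], "word": ["cat", "mouse", "game"]}
)
for frame in (df, df.set_index("id1"), df.set_index(["id1", "id2"])):
    print("indexed" if has_named_index(frame) else "no index")
```

Note that this still only detects a named index. An index assigned without a name, or one whose name was later cleared, would be reported as "no index".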

The following worked for me: I call set_index([label], append=False) if the dataframe has the default RangeIndex, and set_index([label], append=True) otherwise.

append = not isinstance(df.index, pd.RangeIndex)
df.set_index([label], drop=True, append=append, inplace=True)

My assumption is that when the index is the default RangeIndex and I set another column as the index, I can drop the RangeIndex.
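
For illustration, here is the same idea as a function that returns a new dataframe instead of modifying it in place. The name add_index_level and the sample data are assumptions made for this sketch. Also note that the isinstance check treats any RangeIndex as discardable, including one that does not start at 0.

```python
import pandas as pd


def add_index_level(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Set `label` as the index, keeping any index that is not a RangeIndex.

    Hypothetical helper; same logic as the two lines above, without inplace.
    """
    # A RangeIndex is assumed to carry no information, so it is replaced.
    # Any other index is kept and the new column is appended as a level.
    append = not isinstance(df.index, pd.RangeIndex)
    return df.set_index([label], drop=True, append=append)


df = pd.DataFrame(
    {"id1": [1, 2, 3], "id2": [4, 5, 6], "word": ["cat", "mouse", "game"]}
)
step1 = add_index_level(df, "id1")     # RangeIndex is dropped
step2 = add_index_level(step1, "id2")  # existing index is kept
print(list(step1.index.names))
print(list(step2.index.names))
```

Calling it twice, as above, first replaces the RangeIndex with id1 and then appends id2 as a second level.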

