
Pandas dataframe slice by index

I am trying to slice a dataframe by index, but it raises the error 'TypeError: 'Int64Index([1], dtype='int64')' is an invalid key'.

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
index = df.index[df['Name'] == 'Bob']  # an Index object, not a plain integer
print(index)
df = df.loc[index:]  # this line raises the TypeError

Error:

df = df.loc[index:]
File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1500, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1867, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1533, in _get_slice_axis
slice_obj.step, kind=self.name)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 4672, in slice_indexer
kind=kind)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 4871, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 4801, in get_slice_bound
slc = self._get_loc_only_exact_matches(label)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 4771, in _get_loc_only_exact_matches
return self.get_loc(key)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2656, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 110, in pandas._libs.index.IndexEngine.get_loc
TypeError: 'Int64Index([1], dtype='int64')' is an invalid key

Printing the index gives 'Int64Index([1], dtype='int64')'. How can I convert it to an int value?

Not much documentation is available at https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Int64Index.html

To do this, you need to make sure that your index variable contains just an integer, rather than some other object which may contain multiple values (if 'Bob' appears more than once). In this case it would only contain one value, since 'Bob' only appears once in your table, but what you get is an Int64Index object which is capable of holding several integers. What you want is just a plain old integer.

The following should work for your table, and for a table where Bob does indeed appear multiple times (it will select the index for the first row in which 'Bob' appears):

index = (df['Name'] == 'Bob').idxmax()

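The idxmax function returns the index of the highest-valued item in a series (and True is higher than False, so it returns the index where the name is 'Bob'). When two or more items share the highest value, the first index is returned. Note that if 'Bob' does not appear at all, every value is False and idxmax still returns the first index of the table, so check that a match exists if that can happen.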
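As a minimal sketch of the idxmax approach, the snippet below uses a table in which 'Bob' appears twice (the extra row and the names mask, first_bob and tail are made up for illustration) and guards against the case where nothing matches:

```python
import pandas as pd

# Illustrative data: the original table plus a second, hypothetical 'Bob' row.
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Bob', 15]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Name'] == 'Bob'  # boolean Series, True where the name matches

# idxmax returns the first index label even when every value is False,
# so confirm that at least one row matched before using the result.
if mask.any():
    first_bob = mask.idxmax()   # label of the first True value
    tail = df.loc[first_bob:]   # rows from the first 'Bob' to the end
else:
    first_bob = None
    tail = df.iloc[0:0]         # empty frame with the same columns

print(first_bob)
print(tail)
```

Whether an empty frame is the right fallback depends on what the caller expects; raising an error instead would be just as reasonable.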

Try this if you want to get the whole dataframe starting from this index:

df = df.loc[index[0]:]
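One thing to keep in mind is that taking element [0] fails with an IndexError when no row matches. A small sketch of the same idea with that case handled (the names matches, start and result are only for illustration) could look like this:

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Index object holding the label of every row whose Name is 'Bob'.
matches = df.index[df['Name'] == 'Bob']

if len(matches) > 0:
    start = matches[0]        # first matching label as a scalar
    result = df.loc[start:]   # label-based slice from that row to the end
else:
    start = None
    result = df.iloc[0:0]     # nothing matched: empty frame, same columns

print(result)
```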

If you are trying to get only the rows with that name, try:

df = df[df['Name'] == 'Bob']
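For comparison, here is a short sketch of boolean-mask filtering on the same table. Unlike a slice, it keeps only the matching rows; the names mask, only_bob and bob_ages are illustrative:

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Name'] == 'Bob'

only_bob = df[mask]                       # just the rows where the mask is True
same_rows = df.loc[mask]                  # equivalent selection written with .loc
bob_ages = df.loc[mask, 'Age'].tolist()   # one column of the matching rows

print(only_bob)
print(bob_ages)
```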

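A slight modification to your code:

index = list(df.index[df['Name'] == 'Bob'])

should give you the position. The result is a list ([1] for your table), so use index[0] when you need the integer itself, for example in df.loc[index[0]:]. Let me know if it works.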
