
Pandas indexing and KeyError

Consider the following:

import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}

e = pd.Series(d, index=['a', 'b', 'c'])

df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

When I try to access the first row of column B in the following way:

>>> df.B[0]
0.0

I get the correct result.

However, after reading "KeyError: 0 when accessing value in pandas series", I assumed that, since I specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I get a KeyError?

The problem in the question you referenced is that the index of that DataFrame is integer-valued but does not start at 0.

What pandas does for df.B[0] is ambiguous: it depends on the data type of the index and on the data type of the value passed inside the square brackets. It can behave like df.B.loc[0] (label based) or df.B.iloc[0] (position based), or possibly something else I'm not aware of. For predictable behaviour I recommend using loc and iloc explicitly.

To illustrate this with your example:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = ['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # 0.0 - fall back to position based
df.B['0'] # KeyError - no label '0' in index
df.B['a'] # 0.0 - found label 'a' in index
df.B.loc[0] # error - string index queried by integer label (TypeError in older pandas; newer versions appear to raise KeyError)
df.B.loc['0'] # KeyError - no label '0' in index
df.B.loc['a'] # 0.0 - found label 'a' in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
df.B.iloc['a'] # TypeError - string can't be used for position

With the example from the referenced question:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
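
The two listings above can be checked with a small script. This is only a sketch: probe, by_label and by_number are illustrative names, not part of pandas, and the exception types are captured at runtime rather than assumed. The positional fallback df.B[0] on a string index is deliberately left out, because as far as I know recent pandas releases deprecate it, so its outcome depends on the installed version.

```python
import pandas as pd


def probe(lookup):
    """Return the looked-up value, or the name of the exception it raised."""
    try:
        return lookup()
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


# Same values, two different kinds of index.
by_label = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])
by_number = pd.Series([0.0, 1.0, 2.0], index=[4, 5, 6])

results = {
    # iloc is always positional, whatever the index contains.
    "by_label.iloc[0]": probe(lambda: by_label.iloc[0]),
    "by_number.iloc[0]": probe(lambda: by_number.iloc[0]),
    # loc is always label based.
    "by_label.loc['a']": probe(lambda: by_label.loc["a"]),
    "by_number.loc[4]": probe(lambda: by_number.loc[4]),
    "by_number.loc[0]": probe(lambda: by_number.loc[0]),
    # Plain [] on an integer index is a label lookup, so 0 is not found.
    "by_number[0]": probe(lambda: by_number[0]),
}

for expression, outcome in results.items():
    print(f"{expression:>18} -> {outcome}")
```

With an integer index that does not contain 0, both by_number[0] and by_number.loc[0] fail, while iloc[0] returns the first element in either case.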

df.B returns a pandas Series, which is why positional indexing works. If you select column B as a DataFrame instead, the same lookup raises an error, because [] on a DataFrame looks up column labels:

df[['B']][0]
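
A short sketch of that difference, assuming a cut-down version of the DataFrame from the question (column C is dropped for brevity, and the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": 1.0,
    "B": pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"]),
})

sub = df[["B"]]  # double brackets keep a one-column DataFrame

# [] on a DataFrame looks for a column labelled 0, which does not exist.
try:
    sub[0]
    outcome = "no error"
except KeyError:
    outcome = "KeyError"

first_row = sub.iloc[0]       # Series holding the row labelled 'a'
first_value = sub.iloc[0, 0]  # scalar: first row, first column

print(outcome)
print(first_row.name, first_value)
```

If the first row or its value is what you need from a DataFrame, iloc states that intent explicitly.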

df.B is actually a pandas.Series object (a shortcut for df['B'] ), which can be iterated. df.B[0] is no longer a "row" but just the first element of df.B , since by writing df.B you basically create a 1-D object.

More information is in the data structure documentation:

You can treat a DataFrame semantically like a dict of like-indexed Series objects.
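
To make the dict analogy concrete, here is a minimal sketch using a reduced version of the question's DataFrame (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": 1.0,
    "B": pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"]),
})

# Like a dict, the DataFrame maps column labels to values ...
column_names = list(df.keys())

# ... and each value is a Series.
column_types = {name: type(column).__name__ for name, column in df.items()}

# Attribute access is a shortcut for the dict-style lookup.
col = df["B"]
same_data = df.B.equals(col)

# "Like-indexed": every column shares the DataFrame's row index.
shares_index = col.index.equals(df.index)

print(column_names, column_types, same_data, shares_index)
```

Once a column has been pulled out this way, any further [] applies to that Series, not to the DataFrame's rows.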
