Consider the following:
import pandas as pd
d = {'a': 0.0, 'b': 1.0, 'c': 2.0}
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
When I try to access the first row of column B like this:
>>> df.B[0]
0.0
I get the correct result.
However, after reading "KeyError: 0 when accessing value in pandas series", I was under the assumption that, since I have specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I get a KeyError?
The problem in your referenced question is that the index of the given DataFrame is integer-valued but does not start from 0.
Pandas' behaviour when asked for df.B[0] is ambiguous: it depends on the data type of the index and on the type of the value passed to Python's [] indexing syntax. It can behave like df.B.loc[0] (label based) or like df.B.iloc[0] (position based), or possibly something else I'm not aware of. For predictable behaviour I recommend using loc and iloc.
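To make that recommendation concrete, here is a minimal sketch (the three Series are illustrative, not taken from the question) showing that .iloc[0] always returns the first element, while .loc only ever matches labels, whatever the index type:

```python
import pandas as pd

# The same values stored under three different kinds of index.
values = [0.0, 1.0, 2.0]
by_string = pd.Series(values, index=['a', 'b', 'c'])
by_offset_int = pd.Series(values, index=[4, 5, 6])
by_default = pd.Series(values)  # default RangeIndex: 0, 1, 2

for s in (by_string, by_offset_int, by_default):
    # .iloc is purely positional: position 0 is always the first element.
    assert s.iloc[0] == 0.0
    # .loc is purely label based: it needs whatever label sits first.
    first_label = s.index[0]
    assert s.loc[first_label] == 0.0

# The first *label* differs per index type even though the first value does not.
print(by_string.index[0], by_offset_int.index[0], by_default.index[0])
```

Only for the default RangeIndex do label 0 and position 0 coincide, which is why plain [0] often "just works" there.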
To illustrate this with your example:
import pandas as pd
d = [0.0, 1.0, 2.0]
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
df.B[0] # 0.0 - no label 0 in a string index, so it falls back to position (deprecated since pandas 2.1, emits a FutureWarning)
df.B['0'] # KeyError - no label '0' in index
df.B['a'] # 0.0 - found label 'a' in index
df.B.loc[0] # KeyError in recent pandas (older versions raised TypeError) - string index queried by integer label
df.B.loc['0'] # KeyError - no label '0' in index
df.B.loc['a'] # 0.0 - found label 'a' in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
df.B.iloc['a'] # TypeError - string can't be used for position
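Since several of these lines raise, a small harness makes it easy to run them all at once and see what your installed pandas version actually does. This is a sketch; the helper name outcome is hypothetical. df.B[0] is deliberately left out because, as far as I know, its behaviour varies by version (pandas 2.1 deprecated the positional fallback), so it is best checked separately.

```python
import pandas as pd

def outcome(fn):
    """Hypothetical helper: return the looked-up value, or the exception's class name."""
    try:
        return fn()
    except Exception as exc:  # report any lookup error by its class name
        return type(exc).__name__

e = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

# Each lookup is wrapped in a lambda so it only runs inside outcome().
checks = {
    "df.B['0']": lambda: df.B['0'],
    "df.B['a']": lambda: df.B['a'],
    "df.B.loc[0]": lambda: df.B.loc[0],
    "df.B.loc['a']": lambda: df.B.loc['a'],
    "df.B.iloc[0]": lambda: df.B.iloc[0],
    "df.B.iloc['a']": lambda: df.B.iloc['a'],
}
for expr, fn in checks.items():
    print(f"{expr:16} -> {outcome(fn)}")
```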
With the example from the referenced question:
d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
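If all you want in that case is "the first row", a sketch of the two usual options follows: use .iloc, or relabel the Series with a fresh 0-based index via reset_index(drop=True). Note that drop=True discards the original labels 4, 5, 6, which may not be what you want.

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=[4, 5, 6])

# With an integer index, [] and .loc look up labels, so 0 is not found.
try:
    e[0]
except KeyError:
    print("label 0 not in", list(e.index))

# .iloc[0] asks for the first position, independent of the labels.
print(e.iloc[0])

# Relabelling with 0..n-1 makes labels and positions coincide.
relabelled = e.reset_index(drop=True)
print(relabelled[0], relabelled.loc[0], relabelled.iloc[0])
```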
df.B returns a pandas Series, which is why positional indexing works. If you instead select column B as a DataFrame, [0] is interpreted as a column label, so this raises a KeyError:
df[['B']][0]
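A short sketch of the difference: selecting with a list of column names gives a one-column DataFrame, where [] selects columns, so row access has to go through .iloc (or .loc) instead.

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

sub = df[['B']]  # a list of column names -> a one-column DataFrame
print(type(sub).__name__)

# On a DataFrame, [] selects columns, so [0] looks for a column labelled 0.
try:
    sub[0]
except KeyError:
    print("no column labelled 0 in", list(sub.columns))

# Positional row access on a DataFrame goes through .iloc.
first_row = sub.iloc[0]  # a Series indexed by the column names
print(first_row['B'], sub.iloc[0, 0])
```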
df.B is actually a pandas.Series object (a shortcut for df['B']), which can be iterated. df.B[0] is no longer a "row" but just the first element of df.B, since writing df.B essentially gives you a 1-D object.
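The following sketch checks those claims directly. One caveat I'm confident of: attribute access such as df.B only works when the column name is a valid Python identifier that doesn't clash with an existing DataFrame attribute, so df['B'] is the more general form.

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

col = df.B                            # attribute access, same column as df['B']
print(type(col).__name__, col.ndim)   # a 1-D Series
print(col.equals(df['B']))            # same values and same index
print(list(col))                      # iterating a Series yields its values
print(df.ndim)                        # the DataFrame itself is 2-D
```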
More information is in the data structures documentation:
"You can treat a DataFrame semantically like a dict of like-indexed Series objects."
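To illustrate the "dict of like-indexed Series" view, here is a sketch with two made-up Series whose labels only partly overlap: building a DataFrame from them aligns both on the union of their labels, and column lookup then behaves much like a dict lookup.

```python
import pandas as pd

# Two illustrative Series with partially overlapping labels.
x = pd.Series([1.0, 2.0], index=['a', 'b'])
y = pd.Series([3.0], index=['b'])

df = pd.DataFrame({'x': x, 'y': y})  # rows aligned on the union of labels
print(df)
print(list(df.keys()))               # column names act like dict keys
print(df['y'].isna().tolist())       # y had no value for 'a', so it is NaN
```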