How to get the last N rows of a pandas DataFrame?
I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame; df2 is indexed by 'STK_ID' and 'RPT_Date'):
>>> df1
STK_ID RPT_Date TClose sales discount
0 000568 20060331 3.69 5.975 NaN
1 000568 20060630 9.14 10.143 NaN
2 000568 20060930 9.49 13.854 NaN
3 000568 20061231 15.84 19.262 NaN
4 000568 20070331 17.00 6.803 NaN
5 000568 20070630 26.31 12.940 NaN
6 000568 20070930 39.12 19.977 NaN
7 000568 20071231 45.94 29.269 NaN
8 000568 20080331 38.75 12.668 NaN
9 000568 20080630 30.09 21.102 NaN
10 000568 20080930 26.00 30.769 NaN
>>> df2
TClose sales discount net_sales cogs
STK_ID RPT_Date
000568 20060331 3.69 5.975 NaN 5.975 2.591
20060630 9.14 10.143 NaN 10.143 4.363
20060930 9.49 13.854 NaN 13.854 5.901
20061231 15.84 19.262 NaN 19.262 8.407
20070331 17.00 6.803 NaN 6.803 2.815
20070630 26.31 12.940 NaN 12.940 5.418
20070930 39.12 19.977 NaN 19.977 8.452
20071231 45.94 29.269 NaN 29.269 12.606
20080331 38.75 12.668 NaN 12.668 3.958
20080630 30.09 21.102 NaN 21.102 7.431
I can get the last 3 rows of df2 with:
>>> df2.ix[-3:]
TClose sales discount net_sales cogs
STK_ID RPT_Date
000568 20071231 45.94 29.269 NaN 29.269 12.606
20080331 38.75 12.668 NaN 12.668 3.958
20080630 30.09 21.102 NaN 21.102 7.431
while df1.ix[-3:] gives all the rows:
>>> df1.ix[-3:]
STK_ID RPT_Date TClose sales discount
0 000568 20060331 3.69 5.975 NaN
1 000568 20060630 9.14 10.143 NaN
2 000568 20060930 9.49 13.854 NaN
3 000568 20061231 15.84 19.262 NaN
4 000568 20070331 17.00 6.803 NaN
5 000568 20070630 26.31 12.940 NaN
6 000568 20070930 39.12 19.977 NaN
7 000568 20071231 45.94 29.269 NaN
8 000568 20080331 38.75 12.668 NaN
9 000568 20080630 30.09 21.102 NaN
10 000568 20080930 26.00 30.769 NaN
Why? How do I get the last 3 rows of df1 (a DataFrame without an explicit index)? I am using pandas 0.10.1.
Don't forget DataFrame.tail! For example, df1.tail(10).
This happens because df1 has an integer index: ix treats -3 as a label rather than a position, so the slice selects every row whose label is -3 or greater, which is all of them. This is by design; see "integer indexing" in the pandas "gotchas"*.
*In newer versions of pandas, prefer loc or iloc, which remove the ambiguity of ix between position and label (ix was deprecated in pandas 0.20 and removed in 1.0):
df.iloc[-3:]
As Wes points out, in this specific case you should just use tail!
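To make the label-versus-position distinction concrete, here is a minimal sketch on a small made-up frame (the column name and values are illustrative, not the asker's full data). With the default integer index, iloc and tail select by position, while loc treats the integers as labels and includes both endpoints of a slice.

```python
import pandas as pd

# Illustrative data only; the default RangeIndex 0..4 doubles as integer labels.
df = pd.DataFrame({"TClose": [3.69, 9.14, 9.49, 15.84, 17.00]})

last_by_position = df.iloc[-3:]  # positional slice: rows at positions 2, 3, 4
last_by_tail = df.tail(3)        # same rows, via the dedicated helper
from_label_2 = df.loc[2:]        # label-based slice: label 2 through the end

print(last_by_position.index.tolist())
```

Here the three selections happen to agree only because the labels coincide with the positions; on a frame whose index is not 0..n-1, loc and iloc would generally differ.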
How to get the last N rows of a pandas DataFrame?
If you are slicing by position, __getitem__ (i.e., slicing with []) works well, and is the most succinct solution I've found for this problem.
import numpy as np
import pandas as pd

pd.__version__
# '0.24.2'
df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df
A B
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
6 b 7
7 c 8
df[-3:]
A B
5 b 6
6 b 7
7 c 8
This is the same as calling df.iloc[-3:], for instance (iloc internally delegates to __getitem__).
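As a quick check of the equivalence described here, the sketch below compares the three spellings on the same frame. The variable names (n, by_getitem, and so on) are mine, chosen for illustration. It also shows one edge case worth knowing: because -0 equals 0, a negative slice with n = 0 returns every row, whereas tail(0) returns an empty frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("aaabbbbc"), "B": np.arange(1, 9)})

n = 3
by_getitem = df[-n:]     # plain [] slicing, positional for integer slices
by_iloc = df.iloc[-n:]   # explicit positional indexing
by_tail = df.tail(n)     # dedicated helper

# Edge case: -0 == 0, so df[-0:] is df[0:] and keeps all rows.
zero_slice = df[-0:]
zero_tail = df.tail(0)   # empty frame

print(by_tail)
print(len(zero_slice), len(zero_tail))
```

If n can be zero at runtime, tail(n) is therefore the safer choice.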
As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:
df.groupby('A').tail(2)
A B
1 a 2
2 a 3
5 b 6
6 b 7
7 c 8
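The same idea should carry over to a frame indexed like df2 in the question, by grouping on an index level instead of a column. This is only a sketch: the second stock ID and all values for it are made up, and only the index layout mirrors the question.

```python
import pandas as pd

# Made-up rows; "000001" is a hypothetical second stock added for illustration.
df2 = pd.DataFrame(
    {
        "STK_ID": ["000568"] * 3 + ["000001"] * 2,
        "RPT_Date": ["20071231", "20080331", "20080630", "20080331", "20080630"],
        "TClose": [45.94, 38.75, 30.09, 1.0, 2.0],
    }
).set_index(["STK_ID", "RPT_Date"])

# Last 2 rows within each STK_ID, kept in their original row order.
last_two = df2.groupby(level="STK_ID").tail(2)

print(last_two)
```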