
How to get the last N rows of a pandas DataFrame?

I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame; df2 is indexed by 'STK_ID' and 'RPT_Date'):

>>> df1
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

>>> df2
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20060331    3.69   5.975       NaN      5.975   2.591
       20060630    9.14  10.143       NaN     10.143   4.363
       20060930    9.49  13.854       NaN     13.854   5.901
       20061231   15.84  19.262       NaN     19.262   8.407
       20070331   17.00   6.803       NaN      6.803   2.815
       20070630   26.31  12.940       NaN     12.940   5.418
       20070930   39.12  19.977       NaN     19.977   8.452
       20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

I can get the last 3 rows of df2 with:

>>> df2.ix[-3:]
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

whereas df1.ix[-3:] gives all the rows:

>>> df1.ix[-3:]
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

Why? How can I get the last 3 rows of df1 (a DataFrame without an explicit index)? I am using pandas 0.10.1.

Don't forget DataFrame.tail! For example: df1.tail(10)
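As a minimal sketch of that suggestion, assuming a cut-down stand-in for df1 (the values are copied from the last five rows in the question; the name last3 is only for illustration):

```python
import pandas as pd

# A small stand-in for df1 with the default integer index 0..4.
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 5,
    "RPT_Date": ["20070930", "20071231", "20080331", "20080630", "20080930"],
    "TClose": [39.12, 45.94, 38.75, 30.09, 26.00],
})

# tail(n) counts rows from the end, regardless of what the index labels are.
last3 = df1.tail(3)
print(last3)
```

Because tail never looks at the index labels, it should behave the same way on df1 and on the MultiIndexed df2.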

This happens because df1 has an integer index: ix treats -3 as a label rather than a position, so the slice selects every row whose label is at or above -3, which is all of them. This is by design; see the integer indexing section of the pandas "gotchas" page.*

*In newer versions of pandas, prefer loc or iloc, which remove ix's ambiguity between position and label (ix was deprecated in 0.20 and removed in 1.0):

df.iloc[-3:]

See the docs.
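To make the label-versus-position distinction concrete, here is a small sketch, assuming a single-column frame with the default integer index (the sales figures are the first five from the question; by_label and by_position are illustrative names):

```python
import pandas as pd

# Default index, so the row labels are the integers 0..4.
df = pd.DataFrame({"sales": [5.975, 10.143, 13.854, 19.262, 6.803]})

# Label-based slice: rows whose label is at or after -3 in the sorted index,
# which is every row here.
by_label = df.loc[-3:]

# Position-based slice: the last three rows.
by_position = df.iloc[-3:]

print(len(by_label), len(by_position))
```

The loc result mirrors what ix did for df1 in the question, while iloc gives the rows that were actually wanted.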

As Wes points out, in this specific case you should just use tail!

How to get the last N rows of a pandas DataFrame?

If you are slicing by position, __getitem__ (i.e., slicing with []) works well, and is the most succinct solution I've found for this problem.

import numpy as np
import pandas as pd

pd.__version__
# '0.24.2'

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df

   A  B
0  a  1
1  a  2
2  a  3
3  b  4
4  b  5
5  b  6
6  b  7
7  c  8

df[-3:]

   A  B
5  b  6
6  b  7
7  c  8

For this DataFrame, the result is the same as calling df.iloc[-3:]: an integer slice inside [] is applied to row positions here, just as it is with iloc.
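A quick way to check that equivalence is sketched below, assuming the same example frame; the via_* names are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("aaabbbbc"), "B": np.arange(1, 9)})

via_getitem = df[-3:]     # plain [] slicing
via_iloc = df.iloc[-3:]   # explicit positional indexer
via_tail = df.tail(3)     # convenience method

# All three should hold the same rows, labels and values.
assert via_getitem.equals(via_iloc)
assert via_iloc.equals(via_tail)
```

Of the three, iloc and tail state the positional intent most explicitly, which may matter when the index is not the default one.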


As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:

df.groupby('A').tail(2)

   A  B
1  a  2
2  a  3
5  b  6
6  b  7
7  c  8
