
Selecting specific rows from a pandas dataframe

I just want to know whether pandas has a function that selects specific rows of a dataframe by index, without having to write my own function.

For example: selecting the rows with index [15:50] from a large dataframe.

I have written this function, but I would like to know if there is a shortcut.

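import pandas as pd

def split_concat(data, first, last):
    # Collect the rows labelled first..last (inclusive) as one-row DataFrames;
    # data.loc[[i]] keeps each row as a DataFrame, whereas data.loc[i] returns
    # a Series, which pd.concat(axis=0) would not add as a row
    pieces = [data.loc[[i]] for i in range(first, last + 1)]
    # Concatenate once instead of growing data_out inside the loop
    data_out = pd.concat(pieces, axis=0)

    return data_out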

You could use either pandas.DataFrame.loc or pandas.DataFrame.iloc. See examples below.

import pandas as pd

d = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
     {'a': 100, 'b': 200, 'c': 300, 'd': 400},
     {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000},
     {'a': 1500, 'b': 2500, 'c': 3500, 'd': 4500}]

df = pd.DataFrame(d)

print(df)               # Print original dataframe
print(df.loc[1:2])      # Print rows with index labels 1 and 2 (method 1)
print(df.iloc[1:3])     # Print rows at positions 1 and 2 (method 2)

Original dataframe: print(df) will print:

      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
3  1500  2500  3500  4500

And print(df.loc[1:2]) for index selection by label:

      a     b     c     d
1   100   200   300   400
2  1000  2000  3000  4000

And print(df.iloc[1:3]) for row selection by integer position. As mentioned by ALollz, rows are then numbered by position from 0 to len(df) - 1:

      a     b     c     d
1   100   200   300   400
2  1000  2000  3000  4000

A rule of thumb could be:

  • Use .loc when you want to refer to the actual value of the index, whether it is a string or an integer.

  • Use .iloc when you want to refer to the underlying row position, which always ranges from 0 to len(df) - 1.
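The two rules only give different results once the index labels stop matching the row positions. A minimal sketch, assuming a made-up integer index 10, 20, 30, 40 on columns a and b of the example data:

```python
import pandas as pd

# Columns a and b from the example above, but with hypothetical integer
# labels 10, 20, 30, 40 that no longer match the row positions 0..3
df = pd.DataFrame(
    {"a": [1, 100, 1000, 1500], "b": [2, 200, 2000, 2500]},
    index=[10, 20, 30, 40],
)

# .loc refers to index values: labels 20 and 30 (end label included)
by_label = df.loc[20:30]

# .iloc refers to positions: 1 and 2 (end position excluded)
by_position = df.iloc[1:3]

print(by_label.index.tolist())     # [20, 30]
print(by_position.index.tolist())  # [20, 30]

# .loc[1] looks for a row *labelled* 1, which does not exist here
try:
    df.loc[1]
    label_missing = False
except KeyError:
    label_missing = True
print(label_missing)  # True
```

Both calls return the same two rows, but each one has to be written in terms of what it indexes: labels for .loc, positions for .iloc.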

Note that the end value of a .loc slice is included. This is not the case for .iloc, or for Python slices in general.
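Applied to the [15:50] range from the question, this off-by-one difference looks as follows. A minimal sketch, assuming a made-up 100-row dataframe with the default integer index and a hypothetical column `x`:

```python
import pandas as pd

# Hypothetical 100-row frame with the default RangeIndex 0..99
df = pd.DataFrame({"x": range(100)})

# Label slice: labels 15..50, end label included (36 rows)
loc_rows = df.loc[15:50]

# Position slice: positions 15..49, end position excluded (35 rows)
iloc_rows = df.iloc[15:50]

# Collecting labels 15..50 one row at a time gives the same rows as .loc
loop_rows = pd.concat([df.loc[[i]] for i in range(15, 51)])

print(len(loc_rows), len(iloc_rows))    # 36 35
print(loc_rows.equals(loop_rows))       # True
print(df.iloc[15:51].equals(loc_rows))  # True
```

So with a default integer index, df.loc[15:50] replaces the hand-written loop, and df.iloc[15:51] is its positional equivalent.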

Pandas in general

Pandas has 'easy' ways of doing all sorts of things like this. If you run into a problem that seems common when manipulating tabular data, search for the pandas way of doing it before writing your own. Pandas almost always offers a more concise, and usually faster, way of doing things than what we would write ourselves.

Use this:

# 'index' stands for the label of the row you want
rowData = your_df.loc['index', :]
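A single label returns that one row as a Series, while a list of labels returns a DataFrame and can also pick rows that are not next to each other. A minimal sketch, assuming a made-up frame with hypothetical string labels r0..r3:

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame(
    {"a": [1, 100, 1000, 1500], "b": [2, 200, 2000, 2500]},
    index=["r0", "r1", "r2", "r3"],
)

# A single label returns the row as a Series (column names become its index)
row_data = df.loc["r1", :]
print(type(row_data).__name__)  # Series
print(row_data["a"])            # 100

# A list of labels returns a DataFrame and can select non-adjacent rows
rows = df.loc[["r0", "r3"], :]
print(type(rows).__name__)      # DataFrame
print(rows.index.tolist())      # ['r0', 'r3']
```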

