Iterating through selected columns and rows in Pandas

I have just begun using Pandas for my studies and I am facing an issue with the following step.

Suppose I have a DataFrame with n columns and m rows.

I want to iterate over the column at index 2, starting from row 5 and omitting the preceding rows. How do I go about it?

I can select either the required rows or the required columns separately, but I am unable to do both in one step. Can someone help me with an idea here?

The loc and iloc indexers are great for that.

If you only want the single value at row index 5 and column index 2 (positions are zero-based, so this is the 6th row and 3rd column):

df.iloc[5,2]

If you want all rows from index 5 onwards and all columns from index 2 onwards:

df.iloc[5:,2:]

Don't forget to assign the result back to df if you want to keep the change:

df = df.iloc[5:,2:]
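As a minimal sketch of the three expressions above, the following builds a small hypothetical frame (the column names `c0` to `c3` and the values are assumptions made up for illustration) so that each selected cell is easy to identify.

```python
import pandas as pd

# Hypothetical 8x4 frame: the cell at row i, column j holds 10*i + j.
df = pd.DataFrame({f"c{j}": [10 * i + j for i in range(8)] for j in range(4)})

# Scalar at row position 5, column position 2.
single = df.iloc[5, 2]

# All rows from position 5 onwards, all columns from position 2 onwards.
block = df.iloc[5:, 2:]

# Slicing returns a new object; df keeps its shape until you reassign it.
original_shape = df.shape
df_trimmed = df.iloc[5:, 2:]
```

Here `single` is the value 52, and `block` has 3 rows and the 2 columns `c2` and `c3`. The sketch assigns to a new name (`df_trimmed`) only so the original frame stays available for comparison.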

Using pd.DataFrame.iloc, you can use integer indexers to isolate part of your dataframe. Given a dataframe df:

res = df.iloc[5:, 2]

Note that indexing in Python begins with 0, so this is the 6th row onwards (or index 5 onwards). Similarly, 2 represents the 3rd column (the column with index 2). The indexing syntax is similar to Python list or NumPy array indexing.

Since we specify only one column index, the output will be a pd.Series object, which can be seen as a column. If you specify multiple column indices, the output will be another dataframe.
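The difference in return type can be checked directly. This is a small sketch on a made-up frame (the column names `a`, `b`, `c` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with three integer columns.
df = pd.DataFrame({"a": range(8), "b": range(8, 16), "c": range(16, 24)})

# A single integer column position gives a Series.
as_series = df.iloc[5:, 2]

# A list of positions (even a one-element list) gives a DataFrame.
as_frame = df.iloc[5:, [2]]

# A slice of column positions also gives a DataFrame.
two_columns = df.iloc[5:, 1:3]

print(type(as_series).__name__, type(as_frame).__name__)
```

Passing `[2]` instead of `2` is a handy way to keep the two-dimensional shape when later code expects a dataframe.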

In general, iteration isn't the best option with Pandas. You should aim to use vectorised operations. There are plenty of examples in the Pandas docs demonstrating vectorised calculations.
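To make the comparison concrete, here is a sketch that doubles the selected values both ways. The frame and the doubling operation are assumptions chosen for illustration; the original question does not say what is done with each value.

```python
import pandas as pd

# Hypothetical frame; column position 2 holds the values of interest.
df = pd.DataFrame({
    "a": range(8),
    "b": range(8, 16),
    "c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

col = df.iloc[5:, 2]

# Explicit iteration: works, but runs a Python-level loop over every element.
looped = []
for index_label, value in col.items():
    looped.append(value * 2)

# Vectorised equivalents: one expression applied to the whole Series.
doubled = col * 2
total = col.sum()
```

Both approaches give the same doubled values here. If you do need a loop, `Series.items()` yields `(index label, value)` pairs, so the original row labels 5, 6 and 7 remain available.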


If your column index consists of strings, you can use get_loc to extract an integer location given a name:

res = df.iloc[5:, df.columns.get_loc('some_name')]
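A short sketch of that lookup on a made-up frame follows; the `id` and `other` columns and all values are assumptions, and `some_name` is the placeholder name from the answer. It assumes the column label is unique, in which case `get_loc` returns a plain integer (with duplicate labels it can return a slice or a boolean mask instead).

```python
import pandas as pd

# Hypothetical frame with string column labels.
df = pd.DataFrame({
    "id": range(8),
    "some_name": list("abcdefgh"),
    "other": range(8),
})

# Translate the label into an integer position, then index by position.
pos = df.columns.get_loc("some_name")
res = df.iloc[5:, pos]

# An alternative that avoids get_loc: select by label first, then by position.
alt = df["some_name"].iloc[5:]
```

Both `res` and `alt` contain the values at row positions 5 to 7 of the `some_name` column.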
