
Iterating over rows in a dataframe in Pandas: is there a difference between using df.index and df.iterrows() as iterators?

When iterating through rows in a dataframe in Pandas, is there a difference in performance between using:

for index in df.index:
    ...  # loop body

And:

for index, row in df.iterrows():
    ...  # loop body

Which one should be preferred?

When you loop over df.index, getting the data requires an additional .loc lookup for every row:

for index in df.index:
    # index is a label, so pass the variable (not the string 'index') to .loc
    value = df.loc[index, 'col']
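A self-contained sketch of the df.index approach, assuming a hypothetical frame with a column named 'col' and string index labels. Each iteration performs its own label lookup with .loc, which is the extra per-row work mentioned above.

```python
import pandas as pd

# Hypothetical example frame; the column name 'col' is only for illustration
df = pd.DataFrame({"col": [10, 20, 30]}, index=["a", "b", "c"])

values = []
for index in df.index:
    # One .loc label lookup per row: this is the extra work per iteration
    values.append(df.loc[index, "col"])

print(values)
```

For single-value access like this, df.at[index, "col"] is the dedicated scalar accessor and is generally somewhat cheaper than .loc, but it is still one lookup per row.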

With df.iterrows(), each row is already handed to you as a Series, so no extra lookup is needed:

for index, row in df.iterrows():
    value = row['col']
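One caveat worth knowing: because iterrows() yields each row as a pandas Series, values from columns of different dtypes are combined into a single dtype. The sketch below uses a hypothetical frame with one integer and one float column; the integer value comes back as a float.

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column
df = pd.DataFrame([[1, 1.5]], columns=["int", "float"])

for index, row in df.iterrows():
    # row is a Series holding both columns, so they share one (upcast) dtype
    print(index, row["int"], row.dtype)
```

If you need the original per-column dtypes inside the loop, itertuples() does not go through a Series per row and avoids this upcasting.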

Since you are already using pandas, neither approach is recommended unless you need a function that cannot be vectorized.
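To illustrate what "vectorized" means here, a minimal sketch with hypothetical columns 'a' and 'b': the loop does one Python-level step per row, while the column expression computes the same result in a single operation on whole columns.

```python
import pandas as pd

# Hypothetical frame; columns 'a' and 'b' are illustrative
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Row-by-row loop: one Python-level iteration per row
looped = [row["a"] + row["b"] for _, row in df.iterrows()]

# Vectorized: pandas adds the two columns element-wise in one operation
df["total"] = df["a"] + df["b"]

print(df)
```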

That said, IMO, I prefer df.index.

Pandas is significantly faster for column-wise operations, so consider transposing your dataset and carrying out whatever operation you want. If you absolutely need to iterate through rows and want to keep it simple, you can use:

for row in df.itertuples():
    print(row.column_1)
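A runnable version of the itertuples() loop, assuming a hypothetical frame that reuses the column name column_1 and adds a column called "my col". Each row is a namedtuple: the index is available as row.Index, valid column names become attributes, and every field can also be read by position. Column names that are not valid Python identifiers (like "my col") are renamed to positional names such as _2.

```python
import pandas as pd

# Hypothetical frame; 'column_1' matches the name above, 'my col' is illustrative
df = pd.DataFrame({"column_1": [1, 2], "my col": [3, 4]}, index=["x", "y"])

for row in df.itertuples():
    # Position 0 is the index; 'my col' is not an identifier, so it is read by position
    print(row.Index, row.column_1, row[2])
```

Passing index=False to itertuples() drops the Index field if you do not need it.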

df.itertuples() is significantly faster than both df.iterrows() and iterating over the index. However, there are faster ways to perform row-wise operations; check out this answer for an overview.
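If you want to check the relative speeds on your own data, a rough timing harness like the one below can help. It is a sketch under assumptions: the frame size and the columns 'a' and 'b' are hypothetical, and absolute timings (and even the gaps between approaches) will vary with your machine, pandas version, and data, so no numbers are claimed here. All four functions compute the same total, so the comparison is like-for-like.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame; size and column names are arbitrary
N = 5_000
df = pd.DataFrame({"a": np.arange(N), "b": np.arange(N)})

def with_index():
    # Two .loc label lookups per row
    return sum(df.loc[i, "a"] + df.loc[i, "b"] for i in df.index)

def with_iterrows():
    # Builds a Series for every row
    return sum(row["a"] + row["b"] for _, row in df.iterrows())

def with_itertuples():
    # Yields lightweight namedtuples
    return sum(row.a + row.b for row in df.itertuples())

def vectorized():
    # Whole-column arithmetic, no Python-level loop
    return (df["a"] + df["b"]).sum()

funcs = [with_index, with_iterrows, with_itertuples, vectorized]
results = {f.__name__: int(f()) for f in funcs}

for f in funcs:
    # Average over a few runs; absolute numbers vary by machine and pandas version
    seconds = timeit.timeit(f, number=3) / 3
    print(f"{f.__name__:>16}: {seconds:.4f} s per call")
```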
