A more elegant way to loop through a dataframe in Python

Question

For a single iterable, we can loop through it using:

for item in items:

But what if I have two iterables side by side? Think of a pandas dataframe with 2 columns, for example. I can use the above approach to loop through one column, but is there a more elegant way to loop through both columns at the same time?

import pandas as pd
df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})
i = 0
for j in df['col 1']:
    print(j)
    print(df['col 2'][i])
    i += 1

Thanks!

Answer 1

The built-in zip function returns an iterator that pairs up the elements of whatever iterables you pass to it, so this is one alternative:

import pandas as pd
df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})
for i,j in zip(df['col 1'], df['col 2']):
    print(i)
    print(j)

Output:

1
6
2
7
3
8
4
9
5
10
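As a small sketch of how this generalizes: zip accepts any number of iterables, so more columns can join the same loop. The extra column 'col 3' and the list name rows are made up here for illustration and are not part of the original question.

```python
import pandas as pd

# 'col 3' is a hypothetical extra column, added only for illustration
df = pd.DataFrame({
    'col 1': [1, 2, 3],
    'col 2': [6, 7, 8],
    'col 3': [11, 12, 13],
})

rows = []
# zip yields one tuple per row position, with one element per column passed in
for a, b, c in zip(df['col 1'], df['col 2'], df['col 3']):
    rows.append((a, b, c))
```

Since all columns of a dataframe have the same length, zip's habit of stopping at the shortest input does not drop any rows here.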

Answer 2

You can iterate through entire rows, which is more elegant:

for index, row in df.iterrows():
    print(row['col 1'], row['col 2'])
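One thing worth knowing about this approach: iterrows() hands back each row as a Series, and a Series has a single dtype, so values can be converted when the columns have different types. A minimal sketch, assuming a hypothetical frame with one integer column and one float column (the column names are invented for this example):

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column
df = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]})

# iterrows() yields (index, Series) pairs
index, row = next(df.iterrows())

first_count = row['count']  # returned as a float, although the column holds ints
row_dtype = row.dtype       # the single dtype shared by the whole row
```

With the two integer columns from the question this makes no difference, but it can matter for mixed-type data.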

Answer 3

Use the DataFrame.itertuples() method to loop through both columns at the same time:

for i, j in df[['col 1', 'col 2']].itertuples(index=False):
    print(i)
    print(j)
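itertuples() yields named tuples, so fields can also be read by attribute instead of being unpacked. The sketch below assumes the columns are renamed to valid Python identifiers (col_1 and col_2, which are not the names used in the question), because names containing spaces cannot be used as attributes.

```python
import pandas as pd

# Column names without spaces, chosen here so attribute access works
df = pd.DataFrame({'col_1': [1, 2, 3], 'col_2': [6, 7, 8]})

pairs = []
for row in df.itertuples(index=False):
    # Each row is a named tuple with one field per column
    pairs.append((row.col_1, row.col_2))
```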

Answer 4

You've already gotten some great answers to your question. However, I would also like to offer a different approach altogether, which could be even more elegant (depending on what your end goal is).

As a general rule of thumb, you want to avoid looping through the rows of a dataframe. That tends to be slow, and there's usually a better way. Try to shift your thinking toward applying a function to an entire "vector" (a fancy word for a dataframe column).

Check this out:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})

def sum_2_cols(col1,col2):
    return col1 + col2

df['new_col'] = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])

np.vectorize is flexible: it lets you apply your own function element by element across dataframe columns. Keep in mind that the NumPy documentation describes it as a convenience that is essentially a for loop, so for simple arithmetic like this, operating on the columns directly (df['col 1'] + df['col 2']) is usually faster. Try it out, you might get inspired to go about solving your problem in a different way.
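For a sum like the one in this answer, plain column arithmetic produces the same values without a helper function. A minimal sketch comparing the two; the variable names via_vectorize and via_columns are illustrative only:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5], 'col 2': [6, 7, 8, 9, 10]})

def sum_2_cols(col1, col2):
    return col1 + col2

# np.vectorize applies sum_2_cols element by element and returns a NumPy array
via_vectorize = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])

# Column arithmetic adds the two columns as a whole and returns a Series
via_columns = df['col 1'] + df['col 2']
```

np.vectorize remains handy when the function contains logic that does not map onto whole-column operations, such as branching on individual values.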

4 answers

Answer 1
1 2019-02-14 19:10:02

Answer 2
1 2019-02-14 19:13:20

Answer 3
0 2019-02-14 19:03:30

Answer 4
0 2019-02-14 19:27:58
