How do I iterate over a DataFrame's index and a column in a list comprehension?

Question

If I need to iterate over a df in a list comprehension, I would do this:

df['new_col'] = [x if y == 1 and z == 2 for x,y in df[['col_1', 'col_2']].values]

What if, instead of iterating over col_1 and col_2, I need to iterate over df.index and the col_2 values?

What is the syntax inside the list comprehension for this conditional example?

Answer 1

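pandas DataFrames and Series have iteration methods. To iterate over the index and a given column, you can use Series.items (the older name iteritems was deprecated in pandas 1.5 and removed in 2.0):

df['new_col'] = [x if y == 1 else None for x, y in df['col_2'].items()]

Here x is the index label and y is the value in column col_2. A conditional expression inside a comprehension needs an else branch; None is used as a placeholder here. The z == 2 test from the question is left out because z is not defined by the loop.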
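A minimal, self-contained sketch of that comprehension follows. The frame, its labels, and the None filler are made up for illustration. It uses Series.items(), which yields the same (index, value) pairs that iteritems did in older pandas, and shows that zip(df.index, df['col_2']) produces the same pairing:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 1]},
                  index=['a', 'b', 'c'])

# x is the index label, y is the value in col_2.
via_items = [x if y == 1 else None for x, y in df['col_2'].items()]

# The same pairs, built explicitly with zip.
via_zip = [x if y == 1 else None for x, y in zip(df.index, df['col_2'])]

df['new_col'] = via_items
print(df)
```

Either form works; zip extends naturally if more columns need to be paired with the index.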

More generally, iterrows gives you access to the index and all columns in one iteration:

for idx, row in df.iterrows():
    print("Index", idx)
    print("col1", row.col1)
    print("col2", row.col2)
    ...
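The same loop can be written as a comprehension when a condition spans several columns. The sketch below assumes a small made-up frame with columns col_1 and col_2 and an arbitrary condition. It also shows itertuples, a different method swapped in for comparison, which yields namedtuples with the index in the Index field:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 1]},
                  index=['a', 'b', 'c'])

# iterrows() yields (index, row) pairs; row is a Series keyed by column name.
labels = [idx if row['col_1'] > 10 and row['col_2'] == 1 else None
          for idx, row in df.iterrows()]

# itertuples() yields one namedtuple per row; t.Index is the index label.
labels_tuples = [t.Index if t.col_1 > 10 and t.col_2 == 1 else None
                 for t in df.itertuples()]

print(labels)
print(labels_tuples)
```

itertuples is usually the faster of the two because it does not build a Series for every row.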

Answer 2

Let's take an example DataFrame, then look at various things you can do with df.loc and df.index.

I will use a simple example with seven rows of kids aged 1 through 4 and the points they have earned so far. Normally I would prefer a unique index, but for this example we want to search by index, so I am making it non-unique.

import pandas as pd
df = pd.DataFrame({'Points': [2, 4, 8, 3, 2, 5, 6],
                   'Age': [2, 3, 4, 1, 2, 4, 3]},
                  index=['Bob', 'Mike', 'Steve', 'Kate', 'Jane', 'Jill', 'Jane'])
                  
print (df)

The DataFrame will look like this:

       Points  Age
Bob         2    2
Mike        4    3
Steve       8    4
Kate        3    1
Jane        2    2
Jill        5    4
Jane        6    3

If you want to find the points of all the kids with Age = 2 and name = Jane, you can use:

x = df.loc[(df.Age == 2) & (df.index == 'Jane'), 'Points'].tolist() #or .values

print (x)

The output will be:

[2]

If you want to find all the kids with Age = 2 who have scored 2 points, you can use:

y = df.index[(df.Age == 2) & (df.Points == 2)].tolist() #or .values

print (y)

The output will be:

['Bob', 'Jane']

If you want to find all the kids with Age = 3, you can use:

z = df.index[(df.Age == 3)].tolist() #or .values

print (z)

The output will be:

['Mike', 'Jane']

Let's say you want to narrow this down and mark Jane, aged 2, with Found in New_col. Then you can use:

df.loc[(df.Age == 2) & (df.index == 'Jane'), 'New_col'] = 'Found'

print (df)

The output will be:

       Points  Age New_col
Bob         2    2     NaN
Mike        4    3     NaN
Steve       8    4     NaN
Kate        3    1     NaN
Jane        2    2   Found
Jill        5    4     NaN
Jane        6    3     NaN

None of these required us to iterate through the DataFrame using df.iterrows(). A lot of this kind of data manipulation can be done without it. If you have a specific use case, let's review it and come up with a solution that does not involve iteration.
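For the conditional column asked about in the question, a vectorized sketch might look like the following. It reuses the example frame; the new_col name, the condition, and None as the filler are assumptions, and numpy.where is one option among several:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Points': [2, 4, 8, 3, 2, 5, 6],
                   'Age': [2, 3, 4, 1, 2, 4, 3]},
                  index=['Bob', 'Mike', 'Steve', 'Kate', 'Jane', 'Jill', 'Jane'])

# Boolean mask built from two columns, as a plain NumPy array.
mask = ((df['Age'] == 2) & (df['Points'] == 2)).to_numpy()

# Keep the index label where the mask is True, a missing value elsewhere.
df['new_col'] = np.where(mask, df.index.to_numpy(), None)

print(df)
```

Working on plain arrays sidesteps index alignment, which matters here because the index is non-unique.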

Answer 1: score 2, accepted, 2021-03-22 15:28:08

Answer 2: score 0, 2021-03-22 22:40:52
