
How to iterate a DataFrame's index and columns in a list comprehension?

If I need to iterate over a df in a list comprehension, I would do this:

df['new_col'] = [x if y == 1 else None for x, y in df[['col_1', 'col_2']].values]

What if, instead of iterating over col_1 and col_2, I need to iterate over df.index and the values of df['col_2']?

What is the syntax inside the list comprehension for this conditional example?

pandas DataFrame and Series objects have iteration methods. To iterate over the index and a given column you can use items (iteritems was its older name and was removed in pandas 2.0). Note that a conditional expression inside a comprehension needs an else branch; None is used here as a placeholder:

df['new_col'] = [x if y == 1 else None for x, y in df['col_2'].items()]

In this case x is the index label and y is the value of column col_2.

More generally, iterrows gives you access to the index and all columns in one iteration:

for idx, row in df.iterrows():
    print("Index", idx)
    print("col_1", row.col_1)
    print("col_2", row.col_2)
    ...
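If the goal is still a list comprehension over the index and several columns, itertuples is one option. This is only a sketch with made-up data; the column names follow the question, and the expression being computed is an arbitrary illustration.

```python
import pandas as pd

# Hypothetical data with the question's column names.
df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [1, 2, 1]},
                  index=[10, 20, 30])

# itertuples yields one namedtuple per row; row.Index is the index label
# and each column is available as an attribute.
result = [row.Index + row.col_1 if row.col_2 == 1 else -1
          for row in df.itertuples()]

print(result)
```

Unlike iterrows, itertuples does not build a Series for every row, so it tends to be faster and keeps each column's dtype.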

Let's take an example DataFrame, then look at various things you can do with df.loc and df.index.

I will take a simple example of kids aged 1 through 4 and the points they have earned so far. Normally I would prefer a unique index, but for this example we want to search by index, so I am making it non-unique (Jane appears twice).

import pandas as pd
df = pd.DataFrame({'Points': [2, 4, 8, 3, 2, 5, 6],
                   'Age': [2, 3, 4, 1, 2, 4, 3]},
                  index=['Bob', 'Mike', 'Steve', 'Kate', 'Jane', 'Jill', 'Jane'])
                  
print(df)

The DataFrame will look like this:

       Points  Age
Bob         2    2
Mike        4    3
Steve       8    4
Kate        3    1
Jane        2    2
Jill        5    4
Jane        6    3

If we want the points of all the kids with Age = 2 and name = Jane, we can use:

x = df.loc[(df.Age == 2) & (df.index == 'Jane'), 'Points'].tolist()  # or .values

print(x)

The output will be:

[2]

If you want to find all the kids with Age = 2 who have scored 2 points, you can use:

y = df.index[(df.Age == 2) & (df.Points == 2)].tolist()  # or .values

print(y)

The output will be:

['Bob', 'Jane']

If you want to find all the kids with Age = 3, you can use:

z = df.index[(df.Age == 3)].tolist()  # or .values

print(z)

The output will be:

['Mike', 'Jane']

Let's say you want to narrow this down and mark the Jane aged 2 with 'Found' in a new column New_col. Then you can use:

df.loc[(df.Age == 2) & (df.index == 'Jane'), 'New_col'] = 'Found'

print(df)

The output will be:

       Points  Age New_col
Bob         2    2     NaN
Mike        4    3     NaN
Steve       8    4     NaN
Kate        3    1     NaN
Jane        2    2   Found
Jill        5    4     NaN
Jane        6    3     NaN

None of these required us to iterate through the DataFrame with df.iterrows(). A lot of this kind of data manipulation can be done without it. If you have a specific use case, let's review it and come up with a solution that does not involve iteration.
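For a conditional column like the one in the question, one non-iterating option is numpy.where on a boolean mask built from the index and a column. The sketch below reuses the example data; the fallback label 'Not found' is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Points': [2, 4, 8, 3, 2, 5, 6],
                   'Age': [2, 3, 4, 1, 2, 4, 3]},
                  index=['Bob', 'Mike', 'Steve', 'Kate', 'Jane', 'Jill', 'Jane'])

# Boolean mask combining a column condition and an index condition.
mask = (df.Age == 2) & (df.index == 'Jane')

# 'Found' where the mask is True, 'Not found' (illustrative label) elsewhere.
df['New_col'] = np.where(mask, 'Found', 'Not found')

print(df)
```

Compared with the df.loc assignment, this fills every row in one step instead of leaving NaN where the condition is false.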

