How to iterate over a DataFrame's index and columns in a list comprehension?
If I need to iterate over a df in a list comprehension, I would do this:
df['new_col'] = [x if y == 1 else None for x, y in df[['col_1', 'col_2']].values]
What if, instead of iterating over col_1 and col_2, I need to iterate over df.index and the values of col_2?
What is the syntax inside the list comprehension for this conditional example?
pandas DataFrame and Series objects have iteration methods. To iterate over the index and a given column, you can use Series.items() (called iteritems() in older pandas versions; iteritems was deprecated in 1.5 and removed in 2.0):
df['new_col'] = [x if y == 1 else None for x, y in df['col_2'].items()]
In this case, x is the index label and y is the value of column col_2.
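A minimal runnable sketch of the comprehension above, assuming a small hypothetical DataFrame with columns col_1 and col_2 and the condition col_2 == 1. It uses Series.items() and, as an equivalent alternative, zips df.index with the column values. Note that a conditional expression inside a comprehension needs an else branch.

```python
import pandas as pd

# Hypothetical sample data; col_1/col_2 mirror the names in the question.
df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 1]},
                  index=['a', 'b', 'c'])

# Series.items() yields (index label, value) pairs.
# Keep the index label where col_2 == 1, otherwise store None.
df['new_col'] = [idx if val == 1 else None for idx, val in df['col_2'].items()]

# Equivalent: zip the index with the column values.
df['new_col_zip'] = [idx if val == 1 else None
                     for idx, val in zip(df.index, df['col_2'])]

print(df)
```

Both forms walk the same (label, value) pairs. zip is handy when you need the index together with values from a column (or several columns) that are not a single Series.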
More generally, iterrows gives you access to the index and all columns in one iteration:
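for idx, row in df.iterrows():
    # idx is the index label, row is a Series holding that row's values
    print("Index", idx)
    print("col_1", row['col_1'])
    print("col_2", row['col_2'])
    # ... and so on for any other columns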
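As a sketch of iterrows used for the same kind of conditional selection, assuming the same hypothetical sample data and condition (col_2 == 1), this collects (index label, col_1) pairs for the matching rows:

```python
import pandas as pd

# Hypothetical sample data with the column names used in the question.
df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 1]},
                  index=['a', 'b', 'c'])

labels = []
for idx, row in df.iterrows():
    # idx is the index label; row is a Series with this row's column values
    if row['col_2'] == 1:
        labels.append((idx, row['col_1']))

print(labels)
```

Because iterrows returns each row as a Series, dtypes are not preserved across columns (an int column can come back as float if the frame also has a float column). If you only need the values, itertuples() is generally faster.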
Let's take an example DataFrame, then look at various things you can do with df.loc and df.index.
I will take a simple example of seven kids aged 1 through 4 and the points they have earned so far. Normally I would prefer a unique index, but since this example searches by index (the name), I am making it non-unique: two of the kids are named Jane.
import pandas as pd
df = pd.DataFrame({'Points': [2, 4, 8, 3, 2, 5, 6],
                   'Age': [2, 3, 4, 1, 2, 4, 3]},
                  index=['Bob', 'Mike', 'Steve', 'Kate', 'Jane', 'Jill', 'Jane'])
print (df)
The DataFrame will look like this:
Points Age
Bob 2 2
Mike 4 3
Steve 8 4
Kate 3 1
Jane 2 2
Jill 5 4
Jane 6 3
To get the points of all the kids with Age == 2 and the name Jane, you can use:
x = df.loc[(df.Age == 2) & (df.index == 'Jane'), 'Points'].tolist() #or .values
print (x)
Output:
[2]
To find all the kids aged 2 who have scored 2 points, you can use:
y = df.index[(df.Age == 2) & (df.Points == 2)].tolist() #or .values
print (y)
Output:
['Bob', 'Jane']
To find all the kids aged 3, you can use:
z = df.index[(df.Age == 3)].tolist() #or .values
print (z)
Output:
['Mike', 'Jane']
Suppose you want to narrow this down and set New_col to 'Found' for the Jane aged 2. You can use:
df.loc[(df.Age == 2) & (df.index == 'Jane'), 'New_col'] = 'Found'
print (df)
The output will be:
Points Age New_col
Bob 2 2 NaN
Mike 4 3 NaN
Steve 8 4 NaN
Kate 3 1 NaN
Jane 2 2 Found
Jill 5 4 NaN
Jane 6 3 NaN
None of these required iterating through the DataFrame with df.iterrows(); a lot of this kind of data manipulation can be done without it. If you have a specific use case, share it and we can work out a solution that does not involve iteration.
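Applied to the original question, a vectorized sketch might look like the following. It assumes the goal is to copy the index label into new_col wherever col_2 == 1 and leave it empty otherwise; the sample data and condition are hypothetical. numpy.where picks from df.index where the mask is True and uses None elsewhere, with no Python-level loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the condition col_2 == 1 is an assumption.
df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 1]},
                  index=['a', 'b', 'c'])

# Comprehension version, kept for comparison
loop_result = [idx if val == 1 else None for idx, val in df['col_2'].items()]

# Vectorized version: index label where the mask is True, else None
df['new_col'] = np.where(df['col_2'] == 1, df.index, None)

print(df)
```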