Add values in new column of df based on a condition
I have the following df, sorted by date and then by name:
        date name  valor
2 2018-03-01  ACC     75
0 2018-03-01  ACE     50
0 2018-03-20  ACE     50
1 2018-03-01  BBV     20
1 2018-03-14  BBV     20
5 2018-04-16  BBV     58
6 2018-04-20  BBV    -58
I would like to generate a new column (called result) in the df such that, whenever a value in name is the same as the one in the next row, the corresponding valor values are added together in the new column.
The desired output would look something like this:
        date name  valor  result
2 2018-03-01  ACC     75      75
0 2018-03-01  ACE     50      50
0 2018-03-20  ACE     50     100
1 2018-03-01  BBV     20      20
1 2018-03-14  BBV     20      40
5 2018-04-16  BBV     58      98
6 2018-04-20  BBV    -58      40
This is what I am trying:
for index,row in df.iterrows():
    for i in range(1,len(df)+1):
        if (row['name'][i]==row['name'][i+1]) and ( row['name'][i-1]!=row['name'][i]):
            df["result"]=df["valor"][i]+df["valor"][i+1]
        elif (row['name'][i]==row['name'][i+1]) and (row['name'][i-1]==row['name'][i]):
            df["result"]=df["result"][i]+df["valor"][i+1]
This produces an indexing error (string index out of range). In any case, I am sure there should be a more efficient way to obtain the desired output.
Thank you for reading my post.
You should use groupby.cumsum for this. Using the vectorised functionality that comes with pandas is usually more efficient and cleaner than iterating over rows.
df['result'] = df.groupby('name')['valor'].cumsum()
print(df)
        date name  valor  result
2 2018-03-01  ACC     75      75
0 2018-03-01  ACE     50      50
0 2018-03-20  ACE     50     100
1 2018-03-01  BBV     20      20
1 2018-03-14  BBV     20      40
5 2018-04-16  BBV     58      98
6 2018-04-20  BBV    -58      40
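For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption rebuilt from the printed table in the question (including its repeated index labels), and the explicit sort_values call is an added precaution rather than part of the one-liner above: the running total follows row order, so it only matches the desired output if the rows within each name are already in date order.

```python
import pandas as pd

# Assumed reconstruction of the question's frame, index labels copied as printed.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2018-03-01", "2018-03-01", "2018-03-20", "2018-03-01",
                "2018-03-14", "2018-04-16", "2018-04-20",
            ]
        ),
        "name": ["ACC", "ACE", "ACE", "BBV", "BBV", "BBV", "BBV"],
        "valor": [75, 50, 50, 20, 20, 58, -58],
    },
    index=[2, 0, 0, 1, 1, 5, 6],
)

# Precaution (not in the original answer): make sure rows are ordered by
# name and then date, so the cumulative sum runs chronologically per name.
df = df.sort_values(["name", "date"])

# Running total of valor within each name group.
df["result"] = df.groupby("name")["valor"].cumsum()

print(df)
```

As for the original loop, row['name'] inside iterrows is a single string such as 'ACC', so row['name'][i] indexes characters of that string rather than rows of the frame, which is where the string index out of range error comes from.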