Python - Add a column to a dataframe containing a value from another row based on condition
My dataframe looks like this:
+-----+-------+----------+-------+
| No | Group | refGroup | Value |
+-----+-------+----------+-------+
| 123 | A1 | A1 | 5.0 |
| 123 | B1 | A1 | 7.3 |
| 123 | B2 | A1 | 8.9 |
| 123 | B3 | B1 | 7.9 |
| 465 | A1 | A1 | 1.4 |
| 465 | B1 | A1 | 4.5 |
| 465 | B2 | B1 | 7.3 |
+-----+-------+----------+-------+
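For anyone who wants to reproduce the problem, the table above can be built roughly like this. It is a minimal sketch that assumes No is an integer column and Value is a float column; the original post does not state the dtypes.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: No is stored as int and Value as float.
df = pd.DataFrame({
    "No": [123, 123, 123, 123, 465, 465, 465],
    "Group": ["A1", "B1", "B2", "B3", "A1", "B1", "B2"],
    "refGroup": ["A1", "A1", "A1", "B1", "A1", "A1", "B1"],
    "Value": [5.0, 7.3, 8.9, 7.9, 1.4, 4.5, 7.3],
})
print(df)
```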
Now I need to add another column that contains the difference between the value of column Value in the current row and the value of column Value in the row with the same number (No) and the group (Group) that is written in refGroup.
Exception: If refGroup equals Group, Value and refValue are the same.
So the result should be:
+-----+-------+----------+-------+----------+
| No | Group | refGroup | Value | refValue |
+-----+-------+----------+-------+----------+
| 123 | A1 | A1 | 5.0 | 5.0 |
| 123 | B1 | A1 | 7.3 | 2.3 |
| 123 | B2 | A1 | 8.9 | 3.9 |
| 123 | B3 | B1 | 7.9 | 0.6 |
| 465 | A1 | A1 | 1.4 | 1.4 |
| 465 | B1 | A1 | 4.5 | 3.1 |
| 465 | B2 | B1 | 7.3 | 2.8 |
+-----+-------+----------+-------+----------+
Explanation for the first two rows:
First row: refGroup equals Group -> refValue = Value
Second row: search for the row with the same No (123) whose Group equals the current row's refGroup (A1), then calculate Value of the current row minus Value of the referenced row (7.3 - 5.0 = 2.3).
I thought I might need to use groupby() and apply(), but how?
Hope my example is detailed enough. If you need any further information, please ask :)
One way is to use a SQL-like database technique: a 'self-join' with merge. You merge/join the dataframe to itself, using left_on and right_on to line up 'refGroup' with 'Group', then subtract the referenced row's value from the current row's value:
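import numpy as np

# Self-join: match each row's (No, refGroup) to the referenced row's (No, Group).
# how='left' keeps one row per original row, in the same order as df.
df_out = df.merge(df,
                  how='left',
                  left_on=['No','refGroup'],
                  right_on=['No','Group'],
                  suffixes=('','_ref'))
# Rows that reference themselves keep their own Value; others get the difference.
df['refValue'] = np.where(df_out['Group'] == df_out['refGroup'],
                          df_out['Value'],
                          df_out['Value'] - df_out['Value_ref'])
df
Output:
    No Group refGroup  Value  refValue
0  123    A1       A1    5.0       5.0
1  123    B1       A1    7.3       2.3
2  123    B2       A1    8.9       3.9
3  123    B3       B1    7.9       0.6
4  465    A1       A1    1.4       1.4
5  465    B1       A1    4.5       3.1
6  465    B2       B1    7.3       2.8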
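The self-join described above can also be wrapped in a small function. The following is only a sketch: the helper name add_ref_value and the Value_ref column name are made up for illustration, and it assumes each (No, Group) pair occurs at most once. It uses how="left" so the result stays aligned with the input rows, and validate="many_to_one" so pandas raises an error if that uniqueness assumption does not hold.

```python
import numpy as np
import pandas as pd

def add_ref_value(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a refValue column (hypothetical helper)."""
    # Lookup table keyed by (No, refGroup) holding the referenced row's Value.
    ref = df[["No", "Group", "Value"]].rename(
        columns={"Group": "refGroup", "Value": "Value_ref"}
    )
    # how="left" keeps one output row per input row, in the original order.
    # validate="many_to_one" raises if (No, Group) is duplicated in the lookup.
    merged = df.merge(ref, on=["No", "refGroup"], how="left",
                      validate="many_to_one")
    out = df.copy()
    # Self-referencing rows keep their own Value; the rest get the difference.
    # A row whose reference is missing ends up with NaN.
    out["refValue"] = np.where(
        merged["Group"] == merged["refGroup"],
        merged["Value"],
        merged["Value"] - merged["Value_ref"],
    )
    return out

# Sample data from the question.
df = pd.DataFrame({
    "No": [123, 123, 123, 123, 465, 465, 465],
    "Group": ["A1", "B1", "B2", "B3", "A1", "B1", "B2"],
    "refGroup": ["A1", "A1", "A1", "B1", "A1", "A1", "B1"],
    "Value": [5.0, 7.3, 8.9, 7.9, 1.4, 4.5, 7.3],
})
result = add_ref_value(df)
print(result)
```

Because np.where returns a plain array, the assignment is positional, which is why keeping the merge result in the same row order as the input matters here.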
Using a list comprehension, you can do the following:
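# For each row, look up the Value of the row with the same No whose Group
# equals this row's refGroup; self-referencing rows keep their own Value.
df['refValue'] = [
    row['Value'] - df.loc[(df['No'] == row['No']) & (df['Group'] == row['refGroup']), 'Value'].iloc[0]
    if row['refGroup'] != row['Group'] else row['Value']
    for index, row in df.iterrows()
]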
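The row-by-row lookup above scans the dataframe once per row, which can get slow on large inputs. A vectorized alternative, not part of the original answers and sketched here under the assumption that each (No, Group) pair is unique, is to index Value by (No, Group) and fetch all referenced values in one reindex call:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "No": [123, 123, 123, 123, 465, 465, 465],
    "Group": ["A1", "B1", "B2", "B3", "A1", "B1", "B2"],
    "refGroup": ["A1", "A1", "A1", "B1", "A1", "A1", "B1"],
    "Value": [5.0, 7.3, 8.9, 7.9, 1.4, 4.5, 7.3],
})

# Value indexed by (No, Group); reindex requires these pairs to be unique.
lookup = df.set_index(["No", "Group"])["Value"]

# For every row, fetch the Value stored under (No, refGroup).
# Pairs with no matching row come back as NaN.
keys = pd.MultiIndex.from_frame(df[["No", "refGroup"]])
ref = lookup.reindex(keys).to_numpy()

# Self-referencing rows keep their own Value; others get the difference.
is_self = df["Group"] == df["refGroup"]
df["refValue"] = df["Value"].where(is_self, df["Value"] - ref)
print(df)
```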