Calculate specific sums in dataframe, based on unique values in two other columns, and write to new column
I have a DataFrame with 3 columns, including some duplicate rows:
import pandas as pd
dict1 = {'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'], 'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'], 'data': [6, 5, 5, 4, 4, 4, 4]}
df1 = pd.DataFrame(data=dict1)
print(df1)
  experiment  run  data
0          A  A-1     6
1          B  B-1     5
2          B  B-2     5
3          C  C-1     4
4          C  C-1     4
5          C  C-2     4
6          C  C-2     4
I am trying to create a new column that, for each row, contains the sum of column 'data' over the unique runs of that row's experiment. The duplicate rows should stay intact. So my expected outcome is:
  experiment  run  data  exp-sum
0          A  A-1     6        6
1          B  B-1     5       10
2          B  B-2     5       10
3          C  C-1     4        8
4          C  C-1     4        8
5          C  C-2     4        8
6          C  C-2     4        8
I have tried combining .groupby and .unique, but so far I only get the unique values per run, which would still need to be summed per experiment and then written back into the original df.
print(df1.groupby('run')['data'].unique())
run
A-1    [6]
B-1    [5]
B-2    [5]
C-1    [4]
C-2    [4]
Any input very welcome!
If I understand the objective correctly, the code below should do the job:
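# Keep one row per (experiment, run) so each run is counted once
unique = df1.drop_duplicates(subset=['experiment', 'run'], keep='first')
# Sum per experiment; rename so the merge does not produce data_x / data_y
sums = unique.groupby('experiment')['data'].sum().rename('exp-sum').reset_index()
df1 = df1.merge(sums, on=['experiment'], how='inner')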
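For reference, here is a self-contained sketch of this drop-duplicates-then-merge idea on the question's sample data. The names unique_runs and result, the rename to 'exp-sum' (taken from the expected output in the question), and the choice of a left merge are my own assumptions, not necessarily what the answerer intended.

```python
import pandas as pd

df1 = pd.DataFrame({
    'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'],
    'data': [6, 5, 5, 4, 4, 4, 4],
})

# Keep one row per (experiment, run) pair so repeated runs count once.
unique_runs = df1.drop_duplicates(subset=['experiment', 'run'])

# Sum per experiment; name the result so it does not collide with 'data'.
sums = (
    unique_runs.groupby('experiment')['data']
    .sum()
    .rename('exp-sum')
    .reset_index()
)

# A left merge keeps every original row, duplicates included.
result = df1.merge(sums, on='experiment', how='left')
print(result)
```

Without the rename, both frames would carry a column called 'data' and the merge would return 'data_x' and 'data_y' rather than a column named 'exp-sum'.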
Another solution, using .pivot_table:
df1 = df1.set_index("experiment")
x = df1.pivot_table(
index=pd.Grouper(level=0),
columns="run",
values="data",
aggfunc=lambda x: x.unique().sum(),
).sum(axis=1)
df1["exp-sum"] = x
print(df1.reset_index())
Prints:
  experiment  run  data  exp-sum
0          A  A-1     6      6.0
1          B  B-1     5     10.0
2          B  B-2     5     10.0
3          C  C-1     4      8.0
4          C  C-1     4      8.0
5          C  C-2     4      8.0
6          C  C-2     4      8.0
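The sums come out as floats (6.0 rather than 6) because the pivot table has one column per run, and each experiment only has values for its own runs, so the other cells are NaN. A minimal sketch of the same idea that leaves df1's index alone and casts back to integers, assuming every experiment has at least one run (the names table and exp_sum are mine):

```python
import pandas as pd

df1 = pd.DataFrame({
    'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'],
    'data': [6, 5, 5, 4, 4, 4, 4],
})

# One row per experiment, one column per run; cells for runs that do not
# belong to an experiment are NaN, which makes the whole table float.
table = df1.pivot_table(
    index='experiment',
    columns='run',
    values='data',
    aggfunc=lambda s: s.unique().sum(),
)

# Row-wise sum skips NaN by default, giving one total per experiment.
exp_sum = table.sum(axis=1)

# Map the totals back onto the rows and cast to int for display.
df1['exp-sum'] = df1['experiment'].map(exp_sum).astype(int)
print(df1)
```

The cast is only safe when no experiment ends up with a missing total; otherwise astype(int) raises on NaN.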
You can build a mapping with each experiment as the key and the sum of data over its unique runs as the value, then use Series.map to map those values onto the experiment column:
mapper = df1.drop_duplicates('run').groupby('experiment')['data'].sum()
print(mapper)
experiment
A     6
B    10
C     8
df1['exp-sum'] = df1['experiment'].map(mapper)
print(df1)
  experiment  run  data  exp-sum
0          A  A-1     6        6
1          B  B-1     5       10
2          B  B-2     5       10
3          C  C-1     4        8
4          C  C-1     4        8
5          C  C-2     4        8
6          C  C-2     4        8
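One caveat: drop_duplicates('run') relies on run labels being unique across experiments, which holds for the sample data ('A-1', 'B-1', ...). If the labels could repeat between experiments, deduplicating on the (experiment, run) pair is safer. A small sketch on made-up data (df2 and its values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical variant of the data: run labels repeat across experiments.
df2 = pd.DataFrame({
    'experiment': ['A', 'B', 'B', 'C', 'C', 'C'],
    'run': [1, 1, 2, 1, 1, 2],
    'data': [6, 5, 5, 4, 4, 4],
})

# Deduplicating on 'run' alone keeps only the first row for run 1 and the
# first row for run 2, so experiment C disappears from the mapping.
naive_mapper = df2.drop_duplicates('run').groupby('experiment')['data'].sum()

# Deduplicating on the pair keeps one row per run within each experiment.
mapper = (
    df2.drop_duplicates(['experiment', 'run'])
    .groupby('experiment')['data']
    .sum()
)

df2['exp-sum'] = df2['experiment'].map(mapper)
print(df2)
```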