Calculate specific sums in dataframe, based on unique values in two other columns, and write to new column

I have a data frame with 3 columns, including some duplicate rows:

import pandas as pd

dict1 = {'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'], 'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'], 'data': [6, 5, 5, 4, 4, 4, 4]}
df1 = pd.DataFrame(data=dict1)
print(df1)

  experiment  run  data
0          A  A-1     6
1          B  B-1     5
2          B  B-2     5
3          C  C-1     4
4          C  C-1     4
5          C  C-2     4
6          C  C-2     4

I am trying to create a new column that, for each row, contains the sum of column 'data' over the unique runs of that row's experiment. The duplicate rows should stay intact. So my expected outcome is:

  experiment  run  data  exp-sum
0          A  A-1     6        6
1          B  B-1     5       10
2          B  B-2     5       10
3          C  C-1     4        8
4          C  C-1     4        8
5          C  C-2     4        8
6          C  C-2     4        8

I have tried combining .groupby and .unique, but so far I only get the correct values per run, which would still need to be summed per experiment and then written back into the original df.

print(df1.groupby('run')['data'].unique())

run
A-1    [6]
B-1    [5]
B-2    [5]
C-1    [4]
C-2    [4]

Any input is very welcome!

If I understand the objective correctly, the code below should do the job:

  1. Get the unique runs of each experiment
unique = df1.drop_duplicates(subset=['experiment', 'run'], keep='first')
  2. Get the sum of data per experiment over those unique runs
sums = unique.groupby('experiment')['data'].sum().rename('exp-sum').reset_index()
  3. Add the new computed column to the original df through a join
df1 = df1.merge(sums, on='experiment', how='inner')
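
The three steps above can be sketched end to end on the sample data from the question. This is only a sketch: the names unique_runs and result, the left join, and the validate='many_to_one' check are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'],
    'data': [6, 5, 5, 4, 4, 4, 4],
})

# Keep one row per (experiment, run) pair so a repeated run is counted once.
unique_runs = df1.drop_duplicates(subset=['experiment', 'run'])

# Sum 'data' per experiment; rename so the column does not clash with 'data'.
sums = (
    unique_runs.groupby('experiment')['data']
    .sum()
    .rename('exp-sum')
    .reset_index()
)

# A left join keeps every original row, duplicates included.
# validate raises if 'sums' ever holds more than one row per experiment.
result = df1.merge(sums, on='experiment', how='left', validate='many_to_one')
print(result)
```

Naming the summed column before the merge matters: if both frames carry a column called data, pandas keeps both and suffixes them as data_x and data_y.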

Another solution, using .pivot_table:

df1 = df1.set_index("experiment")
x = df1.pivot_table(
    index=pd.Grouper(level=0),
    columns="run",
    values="data",
    aggfunc=lambda x: x.unique().sum(),
).sum(axis=1)
df1["exp-sum"] = x
print(df1.reset_index())

Prints:

  experiment  run  data  exp-sum
0          A  A-1     6      6.0
1          B  B-1     5     10.0
2          B  B-2     5     10.0
3          C  C-1     4      8.0
4          C  C-1     4      8.0
5          C  C-2     4      8.0
6          C  C-2     4      8.0
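
The float values in exp-sum most likely come from the intermediate wide table: each run becomes its own column, so every experiment has NaN in the columns of runs it does not own. The sketch below makes that table visible and casts the result back to integers. The names wide and exp_sum, and the use of Series.map instead of index alignment, are choices made here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'experiment': ['A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'run': ['A-1', 'B-1', 'B-2', 'C-1', 'C-1', 'C-2', 'C-2'],
    'data': [6, 5, 5, 4, 4, 4, 4],
})

# One column per run; cells for runs of other experiments are NaN.
wide = df1.pivot_table(
    index='experiment',
    columns='run',
    values='data',
    aggfunc=lambda s: s.unique().sum(),
)
print(wide)

# The row-wise sum skips NaN; cast to int once no NaN is left.
exp_sum = wide.sum(axis=1).astype(int)

# Look up each row's experiment in the per-experiment totals.
df1['exp-sum'] = df1['experiment'].map(exp_sum)
print(df1)
```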

You can build a mapping with each experiment as the key and the sum of data over its unique runs as the value. Then use Series.map to map those sums onto the experiment column:

mapper = df1.drop_duplicates('run').groupby('experiment')['data'].sum()

print(mapper)

experiment
A     6
B    10
C     8

df1['exp-sum'] = df1['experiment'].map(mapper)

print(df1)

    experiment  run   data  exp-sum
0   A           A-1   6     6
1   B           B-1   5     10
2   B           B-2   5     10
3   C           C-1   4     8
4   C           C-1   4     8
5   C           C-2   4     8
6   C           C-2   4     8
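
One caveat: drop_duplicates('run') works here because every run label already contains its experiment ('A-1', 'B-1', ...). If run labels were reused across experiments, deduplicating on both columns would be safer. A small sketch with made-up data (the frame below is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: run labels '1' and '2' are reused by both experiments.
df = pd.DataFrame({
    'experiment': ['A', 'A', 'B', 'B', 'B'],
    'run': ['1', '2', '1', '1', '2'],
    'data': [6, 5, 4, 4, 3],
})

# Deduplicating on 'run' alone keeps only A's rows, so B has no sum at all.
by_run = df.drop_duplicates('run').groupby('experiment')['data'].sum()
print(by_run)

# Deduplicating on the (experiment, run) pair keeps one row per run per experiment.
by_pair = df.drop_duplicates(['experiment', 'run']).groupby('experiment')['data'].sum()
print(by_pair)

df['exp-sum'] = df['experiment'].map(by_pair)
print(df)
```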
