
How to create a summary row from a pandas DataFrame and add it back to the same DataFrame for only specific columns

I have the pandas DataFrame below.

import numpy as np
import pandas as pd

d = {
    'id1': ['85643', '85644', '8564312', '8564314', '85645', '8564316',
            '85646', '8564318', '85647', '85648', '85649', '85655'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00002', 'G-00001', 'G-00002',
           'G-00001', 'G-00002', 'G-00001', 'G-00001', 'G-00001', 'G-00001'],
    'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4],
    'Goal': [np.nan, 56, np.nan, 89, 73, np.nan,
             np.nan, np.nan, np.nan, np.nan, 34, np.nan],
    'col2': [3, 4, 32, 43, 55, 610, 0, 0, 16, 23, 72, 48],
    'col3': [1, 22, 33, 44, 55, 60, 1, 5, 6, 3, 2, 4],
    'Name': ['aasd', 'aasd', 'aabsd', 'aabsd', 'aasd', 'aabsd',
             'aasd', 'aabsd', 'aasd', 'aasd', 'aasd', 'aasd'],
    'Date': ['2021-06-13', '2021-06-13', '2021-06-13', '2021-06-14',
             '2021-06-15', '2021-06-15', '2021-06-13', '2021-06-16',
             '2021-06-13', '2021-06-13', '2021-06-13', '2021-06-16'],
}

dff = pd.DataFrame(data=d)
dff
     id1     ID     col1 Goal   col2    col3   Name      Date
0   85643   G-00001 1   NaN     3       1     aasd      2021-06-13
1   85644   G-00001 2   56.0000 4       22    aasd      2021-06-13
2   8564312 G-00002 3   NaN     32      33    aabsd     2021-06-13
3   8564314 G-00002 4   89.0000 43      44    aabsd     2021-06-14
4   85645   G-00001 5   73.0000 55      55    aasd      2021-06-15
5   8564316 G-00002 60  NaN     610     60    aabsd     2021-06-15
6   85646   G-00001 0   NaN     0       1     aasd      2021-06-13
7   8564318 G-00002 0   NaN     0       5     aabsd     2021-06-16
8   85647   G-00001 6   NaN     16      6     aasd      2021-06-13
9   85648   G-00001 3   NaN     23      3     aasd      2021-06-13
10  85649   G-00001 2   34.0000 72      2     aasd      2021-06-13
11  85655   G-00001 4   NaN     48      4     aasd      2021-06-16

I want to summarize some of the columns and add the result back to the same dataframe as a new row, based on some ids in the "id1" column. I also want to give that new row its own value in the "ID" column. For example, I have these lists of "id1" values.

# Based on the "id1" values below, I want to summarize only the "col1", "col2", "col3", and "Name" columns.
# Then I want to add that row back to the same dataframe and give it a new id in the "ID" column.
b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']
# I want to aggregate col1 and col2 with sum and, if possible, col3 with the average; otherwise col3 with sum as well.
# The final dataframe should look like this:
     id1     ID     col1 Goal   col2    col3   Name      Date
0   85643   G-00001 1   NaN     3       1     aasd      2021-06-13
1   85644   G-00001 2   56.0000 4       22    aasd      2021-06-13
2   8564312 G-00002 3   NaN     32      33    aabsd     2021-06-13
3   8564314 G-00002 4   89.0000 43      44    aabsd     2021-06-14
4   85645   G-00001 5   73.0000 55      55    aasd      2021-06-15
5   8564316 G-00002 60  NaN     610     60    aabsd     2021-06-15
6   85646   G-00001 0   NaN     0       1     aasd      2021-06-13
7   8564318 G-00002 0   NaN     0       5     aabsd     2021-06-16
8   85647   G-00001 6   NaN     16      6     aasd      2021-06-13
9   85648   G-00001 3   NaN     23      3     aasd      2021-06-13
10  85649   G-00001 2   34.0000 72      2     aasd      2021-06-13
11  85655   G-00001 4   NaN     48      4     aasd      2021-06-16
12          b65     10          106     61    aasd
13          b66     17          169     67    aasd
14          b67     67          685     142   aabsd   

I tried to do this with groupby and a pandas pivot table but could not get it to work. Any suggestions would be appreciated.
Thanks in advance!

I am not sure how you want to handle the Name column, but you could just add it to the agg function.

b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']

# create a dictionary
d_map = {'b65': b65, 'b66': b66, 'b67': b67}
# dictionary comprehension
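# string names are used because recent pandas versions warn about builtins such as sum and min
df = pd.DataFrame({k: dff[dff['id1'].isin(v)].agg({'col1': 'sum', 'col2': 'sum',
                                                   'col3': 'mean', 'Name': 'min'})
                   for k, v in d_map.items()}).T.reset_index()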
# rename the columns
df = df.rename(columns={'index': 'ID'})
# concat the two frames
pd.concat([dff, df]).reset_index(drop=True)

        id1       ID col1  Goal col2       col3   Name        Date
0     85643  G-00001    1   NaN    3          1   aasd  2021-06-13
1     85644  G-00001    2  56.0    4         22   aasd  2021-06-13
2   8564312  G-00002    3   NaN   32         33  aabsd  2021-06-13
3   8564314  G-00002    4  89.0   43         44  aabsd  2021-06-14
4     85645  G-00001    5  73.0   55         55   aasd  2021-06-15
5   8564316  G-00002   60   NaN  610         60  aabsd  2021-06-15
6     85646  G-00001    0   NaN    0          1   aasd  2021-06-13
7   8564318  G-00002    0   NaN    0          5  aabsd  2021-06-16
8     85647  G-00001    6   NaN   16          6   aasd  2021-06-13
9     85648  G-00001    3   NaN   23          3   aasd  2021-06-13
10    85649  G-00001    2  34.0   72          2   aasd  2021-06-13
11    85655  G-00001    4   NaN   48          4   aasd  2021-06-16
12      NaN      b65   10   NaN  106      15.25   aasd         NaN
13      NaN      b66   19   NaN  173  14.833333   aasd         NaN
14      NaN      b67   67   NaN  685       35.5  aabsd         NaN
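If the layout from the question is wanted instead (col3 summed rather than averaged, and blank cells instead of NaN in id1 and Date), something along these lines might work. This is only a sketch: the names rows, summary and out are made up for illustration, and taking the first Name of each list assumes every list maps to a single Name. With the lists as given, b66 contains 85644, so its sums come out as 19, 173 and 89.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({
    'id1': ['85643', '85644', '8564312', '8564314', '85645', '8564316',
            '85646', '8564318', '85647', '85648', '85649', '85655'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00002', 'G-00001', 'G-00002',
           'G-00001', 'G-00002', 'G-00001', 'G-00001', 'G-00001', 'G-00001'],
    'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4],
    'Goal': [np.nan, 56, np.nan, 89, 73, np.nan,
             np.nan, np.nan, np.nan, np.nan, 34, np.nan],
    'col2': [3, 4, 32, 43, 55, 610, 0, 0, 16, 23, 72, 48],
    'col3': [1, 22, 33, 44, 55, 60, 1, 5, 6, 3, 2, 4],
    'Name': ['aasd', 'aasd', 'aabsd', 'aabsd', 'aasd', 'aabsd',
             'aasd', 'aabsd', 'aasd', 'aasd', 'aasd', 'aasd'],
    'Date': ['2021-06-13', '2021-06-13', '2021-06-13', '2021-06-14',
             '2021-06-15', '2021-06-15', '2021-06-13', '2021-06-16',
             '2021-06-13', '2021-06-13', '2021-06-13', '2021-06-16'],
})

d_map = {
    'b65': ['85643', '85645', '85655', '85646'],
    'b66': ['85643', '85645', '85647', '85648', '85649', '85644'],
    'b67': ['8564312', '8564314', '8564316', '8564318'],
}

rows = []
for label, ids in d_map.items():
    part = dff[dff['id1'].isin(ids)]   # rows belonging to this list
    rows.append({
        'ID': label,                   # the dict key becomes the new ID
        'col1': part['col1'].sum(),
        'col2': part['col2'].sum(),
        'col3': part['col3'].sum(),    # sum here instead of mean
        'Name': part['Name'].iloc[0],  # assumption: one Name per list
    })

summary = pd.DataFrame(rows)
out = pd.concat([dff, summary], ignore_index=True)
# the summary rows have no id1 or Date, so blank those cells
out[['id1', 'Date']] = out[['id1', 'Date']].fillna('')
print(out.tail(3))
```

Goal is left as NaN in the summary rows so that the column stays numeric.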

This is where the magic happens:

df = pd.DataFrame({k: dff[dff['id1'].isin(v)].agg({'col1': 'sum', 'col2': 'sum',
                                                   'col3': 'mean', 'Name': 'min'})
                   for k, v in d_map.items()}).T.reset_index()

dff[dff['id1'].isin(v)] is called boolean indexing. It filters your frame down to the rows where id1 is in v, the list stored under each key of the dict. The dictionary comprehension iterates through the d_map dictionary's keys (k) and values (v).
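To see the filtering step on its own, here is a small sketch on a toy frame. The frame toy and the list wanted are made-up names and values, not the data from the question.

```python
import pandas as pd

# toy data, hypothetical values for illustration only
toy = pd.DataFrame({'id1': ['a', 'b', 'c', 'd'], 'col1': [1, 2, 3, 4]})
wanted = ['a', 'c']

# isin returns one boolean per row
mask = toy['id1'].isin(wanted)
# indexing with the mask keeps only the rows where it is True
subset = toy[mask]

print(mask.tolist())
print(subset)
```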

.agg is the method used to aggregate data. Given a dict, it applies the named function to each listed column.
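A minimal sketch of .agg with a dict, again on made-up toy data (toy and summary are illustrative names):

```python
import pandas as pd

# toy data, hypothetical values for illustration only
toy = pd.DataFrame({'col1': [1, 2, 3],
                    'col3': [10, 20, 60],
                    'Name': ['b', 'a', 'c']})

# one aggregation per column; the result is a Series indexed by column name
summary = toy.agg({'col1': 'sum', 'col3': 'mean', 'Name': 'min'})
print(summary)
```

Because each column gets a single function, the result is one value per column, which is why the answer can stack one such Series per list and transpose it into rows.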

You can do this:

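all_lists = [b65, b66, b67]

for item in all_lists:
    # rows whose id1 is in the current list
    x = dff[dff['id1'].isin(item)]
    # sum the numeric columns and leave every other column blank
    y = x[['col1', 'col2', 'col3']].sum().reindex(dff.columns, fill_value='')

    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    dff = pd.concat([dff, y.to_frame().T], ignore_index=True)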
    

And this is the result (shown as an image in the original post).
