
How can I combine the results of groupby.sum()?

I have some firewall traffic logs that I am analyzing.

I want to combine the results of two groupby.sum() calls.

This is my code:

    import pandas as pd

    def analysis(data_location, col_name):
        # Parse each whitespace-separated log line into a record
        rows = []
        with open(data_location, "r") as data_file:
            for line in data_file:
                fields = line.split()
                rows.append({"Firewall": fields[0], "Gatway": fields[1], "DATE": fields[2],
                             "Rule_name": fields[3], col_name: fields[4], "Count": int(fields[5])})

        df = pd.DataFrame(rows)
        df = df[["Firewall", "Gatway", "DATE", "Rule_name", col_name, "Count"]]
        # Sum the counts per (Firewall, Gatway, DATE, Rule_name, col_name) group
        result = df.groupby(["Firewall", "Gatway", "DATE", "Rule_name", col_name]).sum()
        print(result)
        # Return the grouped result so the caller can keep working with it
        return result

And this is the result:

    DST = analysis("united_temp_fw_dst_log.txt", "dst")

    """the result
                                                      Count
    Firewall   Gatway DATE    Rule_name  dst                   
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.255         34
                                         10.255.63.18        16
                                         103.226.213.30       4
                                         129.146.178.96     282
                                         183.177.72.201       4
                                         183.177.72.202       4
                                         220.133.209.243      4
                                         8.8.8.8            597"""


    SRC = analysis("united_temp_fw_src_log.txt", "src")
    """the result
                                                          Count
    Firewall   Gatway DATE    Rule_name  src               
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8
                                         10.1.81.11      12
                                         10.1.81.115     11
                                         10.1.81.118      3
                                         10.1.81.245    911"""

I want to use ["Firewall", "Gatway", "DATE", "Rule_name"] as the index, with the src and dst columns side by side, like this:

    Firewall   Gatway DATE    Rule_name  src          count     dst             count
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8    10.1.81.255         34
                                         10.1.81.11      12    10.255.63.18        16
                                         10.1.81.115     11    103.226.213.30       4
                                         10.1.81.118      3    129.146.178.96     282
                                         10.1.81.245    911    183.177.72.201       4
                                                               183.177.72.202       4
                                                               220.133.209.243      4 
                                                               8.8.8.8            597

How can I do this? I tried reset_index() and groupby(), but neither gave the result I want.

A simple join will do the trick:

DST.join(SRC)
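
One caveat: DataFrame.join aligns rows on the index, and it raises a ValueError when both frames share a column name (here both have Count) unless suffixes are supplied. The sketch below uses two tiny stand-in frames; the index labels `rule_a` and `rule_b` and the suffixes `_src` and `_dst` are assumptions made for illustration, not values from the logs.

```python
import pandas as pd

# Hypothetical shared index; the real frames are indexed by the groupby keys.
idx = pd.Index(["rule_a", "rule_b"], name="key")
src_counts = pd.DataFrame({"Count": [8, 12]}, index=idx)
dst_counts = pd.DataFrame({"Count": [34, 16]}, index=idx)

# Without suffixes, the overlapping "Count" column makes join raise ValueError.
try:
    src_counts.join(dst_counts)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# With suffixes, the two Count columns are kept apart.
joined = src_counts.join(dst_counts, lsuffix="_src", rsuffix="_dst")
print(joined)
```

Renaming Count in each frame before joining would work just as well as passing suffixes.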

Can you rename the columns so that no column name is repeated (Count, in your case)? If so, I would use the pandas concat function:

import pandas as pd

# Generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18', '103.226.213.30'],
                   'count_dst': [34, 16, 4]})
df.set_index(['Firewall', 'Gatway'], inplace=True)
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
df2.set_index(['Firewall', 'Gatway'], inplace=True)

# Concatenate the dataframes along columns
df3 = pd.concat([df, df2], axis=1)

Using pd.concat, I get the following output:

                              dst  count_dst          src  count_src
Firewall   Gatway                                                   
10_1_81_34 vsys1      10.1.81.255         34   10.1.81.10          8
           vsys1     10.255.63.18         16   10.1.81.11         12
           vsys1   103.226.213.30          4  10.1.81.115         11

Edit: to work with dataframes of different lengths:

import pandas as pd

# Generate a simpler version of your dataframes, with different lengths
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18'],
                   'count_dst': [34, 16]})
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})

keys = ['Firewall', 'Gatway']
# Build the key columns once, taking each row's value from whichever frame has it
key_values = df2[keys].combine_first(df[keys])[keys]
# Concatenate along columns, leaving out the duplicated key columns
df3 = pd.concat([key_values, df.drop(columns=keys), df2.drop(columns=keys)], axis=1)

# Set the index
df3.set_index(keys, inplace=True)

This is the output:

                            dst  count_dst          src  count_src
Firewall   Gatway                                                 
10_1_81_34 vsys1    10.1.81.255       34.0   10.1.81.10          8
           vsys1   10.255.63.18       16.0   10.1.81.11         12
           vsys1            NaN        NaN  10.1.81.115         11
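
When the two grouped results have different lengths, another way to sketch the side-by-side layout is to number the rows within each key group and align on that number. The helper `side_by_side`, the `KEYS` list, the `row` counter level, and the `count_<name>` column names are assumptions made for this sketch; the sample values are taken from the question's output.

```python
import pandas as pd

KEYS = ["Firewall", "Gatway", "DATE", "Rule_name"]

def side_by_side(left, right, keys=KEYS):
    """Place two grouped results next to each other, row by row within each key group."""
    parts = []
    for frame in (left, right):
        flat = frame.reset_index()
        # The one column that is neither a key nor the count (e.g. "src" or "dst").
        value_col = [c for c in flat.columns if c not in keys and c != "Count"][0]
        flat = flat.rename(columns={"Count": f"count_{value_col}"})
        # Number the rows inside each key group so the frames can align on it.
        flat["row"] = flat.groupby(keys).cumcount()
        parts.append(flat.set_index(keys + ["row"]))
    # Outer alignment on (keys, row); the shorter side is padded with NaN.
    return pd.concat(parts, axis=1).sort_index()

def grouped(col_name, addresses, counts):
    """Build a small grouped frame shaped like the output of analysis()."""
    n = len(addresses)
    df = pd.DataFrame({"Firewall": ["10_1_81_34"] * n, "Gatway": ["vsys1"] * n,
                       "DATE": ["2019104"] * n, "Rule_name": ["allow_Drop"] * n,
                       col_name: addresses, "Count": counts})
    return df.groupby(KEYS + [col_name]).sum()

SRC = grouped("src", ["10.1.81.10", "10.1.81.11"], [8, 12])
DST = grouped("dst", ["10.1.81.255", "10.255.63.18", "8.8.8.8"], [34, 16, 597])

combined = side_by_side(SRC, DST)
print(combined)
```

Because the shorter side is padded with NaN, its count column becomes floating point, as in the output above.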

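Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.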

 