Use IP column of 2 dataframes and date range to populate df1 dataframe with data from df2

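I am working with 2 dataframes. The first has incomplete information. The second has the complete information along with a first-seen and last-seen time range. I am trying to use sourceaddress and the time range from df2 to fill in sourcehostname and sourceusername wherever the datetime from df1 falls within that range.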

df1
        sourceaddress   sourcehostname  sourceusername  endtime         datetime
0       10.0.0.59       computer1       NaN             1564666638000   2019-08-01 09:37:18
1       10.0.0.59       NaN             NaN             1564666640000   2019-08-01 09:37:20
2       10.0.0.59       NaN             NaN             1564666642000   2019-08-01 09:37:22
3       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
4       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
5       10.0.0.59       NaN             NaN             1564666645000   2019-08-01 09:37:25
6       10.0.0.59       computer1       NaN             1564666646000   2019-08-01 09:37:26
7       10.0.0.59       NaN             NaN             1564666646000   2019-08-01 09:37:26
8       10.0.0.59       computer1       NaN             1564666649000   2019-08-01 09:37:29
9       10.0.0.59       computer1       NaN             1564666650000   2019-08-01 09:37:30
10      10.0.0.59       NaN             NaN             1564666850000   2019-08-01 09:40:50
...
43196   10.0.0.187      computer2       NaN             1564718395000   2019-08-01 23:59:55
43197   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43198   10.0.0.187      computer2       NaN             1564718397000   2019-08-01 23:59:57
43199   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43200   10.0.0.187      NaN             NaN             1564718398000   2019-08-01 23:59:58
43201   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
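
The datetime column appears to be derived from endtime, which looks like a Unix timestamp in milliseconds. The sample values are consistent with a fixed UTC-4 offset; that offset is inferred from the numbers shown, not stated anywhere in the question. A minimal sketch of the conversion under that assumption:

```python
import pandas as pd

# Two endtime values copied from the df1 sample (epoch milliseconds).
df1 = pd.DataFrame({"endtime": [1564666638000, 1564718398000]})

# Assumption: local time is UTC-4 for the whole data set. A named time zone
# with tz_convert would be needed if the data crossed a DST change.
utc_offset = pd.Timedelta(hours=-4)

# Interpret the integers as milliseconds since the epoch (UTC), then shift.
df1["datetime"] = pd.to_datetime(df1["endtime"], unit="ms") + utc_offset

print(df1)
```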

df2
        sourceaddress   sourcehostname  sourceusername  firstseen             lastseen
0       10.0.0.59       computer1       user1           2019-08-01 09:37:59   2019-08-01 09:46:08
1       10.0.0.187      computer2       user1           2019-08-01 00:00:03   2019-08-01 23:59:58

Expected result:

df3
        sourceaddress   sourcehostname  sourceusername  endtime         datetime
0       10.0.0.59       computer1       NaN             1564666638000   2019-08-01 09:37:18
1       10.0.0.59       NaN             NaN             1564666640000   2019-08-01 09:37:20
2       10.0.0.59       NaN             NaN             1564666642000   2019-08-01 09:37:22
3       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
4       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
5       10.0.0.59       NaN             NaN             1564666645000   2019-08-01 09:37:25
6       10.0.0.59       computer1       NaN             1564666646000   2019-08-01 09:37:26
7       10.0.0.59       NaN             NaN             1564666646000   2019-08-01 09:37:26
8       10.0.0.59       computer1       NaN             1564666649000   2019-08-01 09:37:29
9       10.0.0.59       computer1       NaN             1564666650000   2019-08-01 09:37:30
10      10.0.0.59       computer1       user1           1564666850000   2019-08-01 09:40:50
...
43196   10.0.0.187      computer2       user1           1564718395000   2019-08-01 23:59:55
43197   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43198   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43199   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43200   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43201   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58

As per the example below:

df3[-5:]
        sourceaddress   sourcehostname  sourceusername  endtime          datetime               firstseen              lastseen
43197   10.99.0.187     computer2       user1           1564718397000    2019-08-01 23:59:57    2019-08-01 00:00:03    2019-08-01 23:59:58
43198   10.99.0.187     computer2       NaN             1564718397000    2019-08-01 23:59:57    2019-08-01 00:00:03    2019-08-01 23:59:58
43199   10.99.0.187     computer2       NaN             1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58
43200   10.99.0.187     computer2       user1           1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58
43201   10.99.0.187     computer2       user1           1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58

This looks like a merge problem:

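# left merge keeps every df1 row; clashing df2 columns get the '_df2' suffix
df3 = df1.merge(df2,
                on='sourceaddress', how='left',
                suffixes=('', '_df2')
               )
# mark the valid time (both ends inclusive, since the expected result
# also fills rows where datetime equals lastseen):
mask = df3['datetime'].ge(df3['firstseen']) & df3['datetime'].le(df3['lastseen'])

# update the info on the rows inside the window
df3.loc[mask, 'sourcehostname'] = df3.loc[mask, 'sourcehostname_df2']
df3.loc[mask, 'sourceusername'] = df3.loc[mask, 'sourceusername_df2']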

You can then drop the sourcehostname_df2 and sourceusername_df2 columns.
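
Putting the steps together on a small sample may make the behaviour easier to check. The sketch below uses a handful of rows modelled on the tables in the question, builds the mask with Series.between (inclusive at both ends by default, so a row whose datetime equals lastseen is filled, as in the expected result), and drops the helper columns at the end. The sample rows and the `helper_cols` name are illustrative only, and the sketch assumes df2 has at most one row per sourceaddress; with several windows per address the left merge would duplicate df1 rows.

```python
import numpy as np
import pandas as pd

# A few rows modelled on df1 (endtime omitted for brevity).
df1 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.59", "10.0.0.187", "10.0.0.187"],
    "sourcehostname": ["computer1", np.nan, "computer2", np.nan],
    "sourceusername": [np.nan, np.nan, "user1", np.nan],
    "datetime": pd.to_datetime([
        "2019-08-01 09:37:18",  # before firstseen: stays as it is
        "2019-08-01 09:40:50",  # inside the window: gets filled
        "2019-08-01 23:59:57",  # inside the window
        "2019-08-01 23:59:58",  # equal to lastseen: gets filled
    ]),
})

df2 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.187"],
    "sourcehostname": ["computer1", "computer2"],
    "sourceusername": ["user1", "user1"],
    "firstseen": pd.to_datetime(["2019-08-01 09:37:59", "2019-08-01 00:00:03"]),
    "lastseen": pd.to_datetime(["2019-08-01 09:46:08", "2019-08-01 23:59:58"]),
})

# Left merge keeps every df1 row; clashing df2 columns get the "_df2" suffix.
df3 = df1.merge(df2, on="sourceaddress", how="left", suffixes=("", "_df2"))

# between() includes both endpoints by default.
mask = df3["datetime"].between(df3["firstseen"], df3["lastseen"])

# Copy the df2 values into rows whose datetime falls inside the window.
for col in ["sourcehostname", "sourceusername"]:
    df3.loc[mask, col] = df3.loc[mask, col + "_df2"]

# Drop the helper columns so df3 ends up with the same columns as df1.
helper_cols = ["sourcehostname_df2", "sourceusername_df2", "firstseen", "lastseen"]
df3 = df3.drop(columns=helper_cols)

print(df3)
```

Rows outside the window keep whatever df1 already had, which is why the first row still has no username.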
