
BigQuery multiple join using clause


I need to get the BGP AS details for the IP addresses in a table. The table contains SrcAddr and DstAddr columns, as shown in table1.

table1

SrcAddr   DstAddr   Bytes
1.1.1.1   8.8.8.8   1005

Table2 contains the BGP AS number details.

Table2

IPaddr       Organization   network_bin   mask
1.1.1.0/24   Cloudflare     asdjqowiq     24
8.8.8.0/24   Google         asdqwrqsd     24

I want to build a final table like the one below.

Table3

SrcAddr   SrcAS        DstAddr   DstAS    Bytes
1.1.1.1   Cloudflare   8.8.8.8   Google   1005

I used the query below, based on https://cloudplatform.googleblog.com/2014/03/geoip-geolocation-with-google-bigquery.html, and was able to get the src_as field, but I could not resolve dst_as. Can someone help me with this?

WITH source_of_ip_addresses AS (
  SELECT  SamplerAddress, REGEXP_REPLACE(SrcAddr, 'xxx', '0')  srcip, REGEXP_REPLACE(DstAddr, 'xxx', '0')  dstip
  FROM `fluentd.netflow_message`
  WHERE SrcAddr IS NOT null 
  GROUP BY 1,2,3
)

SELECT *, srcip, src_as,
FROM (
  SELECT srcip, network_bin, mask, autonomous_system_organization as src_as
  FROM (
    SELECT *, NET.SAFE_IP_FROM_STRING(source_of_ip_addresses.srcip) & NET.IP_NET_MASK(4, mask) network_bin ,
    FROM source_of_ip_addresses, UNNEST(GENERATE_ARRAY(9,32)) mask
    WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(srcip)) = 4
  )
  JOIN `fluentd.asn_block_processed`  USING (network_bin, mask)
)

Just repeat the same process for the destination address. It is also more convenient to use WITH clauses instead of nested queries, because that makes the code simpler to repeat. Something like the query below. I don't have access to your tables, so I cannot check the syntax; if you select with *, you will likely get duplicate columns, which you need to remove by using explicit column names rather than *.

WITH source_of_ip_addresses AS (
  -- distinct source/destination pairs
  SELECT SamplerAddress,
         REGEXP_REPLACE(SrcAddr, 'xxx', '0') AS srcip,
         REGEXP_REPLACE(DstAddr, 'xxx', '0') AS dstip
  FROM `fluentd.netflow_message`
  WHERE SrcAddr IS NOT NULL
  GROUP BY 1, 2, 3
), source_with_masks AS (
  -- one candidate network per prefix length for the source address
  SELECT SamplerAddress, srcip, dstip, mask,
         NET.SAFE_IP_FROM_STRING(srcip) & NET.IP_NET_MASK(4, mask) AS network_bin
  FROM source_of_ip_addresses, UNNEST(GENERATE_ARRAY(9, 32)) AS mask
  WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(srcip)) = 4
), source_processed AS (
  -- network_bin and mask are dropped here so they can be recomputed for dstip
  SELECT SamplerAddress, srcip, dstip, autonomous_system_organization AS src_as
  FROM source_with_masks
  JOIN `fluentd.asn_block_processed` USING (network_bin, mask)
), dest_with_masks AS (
  -- same as above, with dstip instead of srcip
  SELECT SamplerAddress, srcip, src_as, dstip, mask,
         NET.SAFE_IP_FROM_STRING(dstip) & NET.IP_NET_MASK(4, mask) AS network_bin
  FROM source_processed, UNNEST(GENERATE_ARRAY(9, 32)) AS mask
  WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(dstip)) = 4
), dest_processed AS (
  SELECT SamplerAddress, srcip, src_as, dstip, autonomous_system_organization AS dst_as
  FROM dest_with_masks
  JOIN `fluentd.asn_block_processed` USING (network_bin, mask)
)
SELECT * FROM dest_processed
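
An alternative to chaining CTEs is to join the ASN table twice under different aliases, once for the source address and once for the destination. The sketch below is self-contained so it can be pasted into the BigQuery console as-is: flows and asn_blocks are hypothetical inline stand-ins for the real tables, filled with the sample rows from the question, and the column names organization, prefix and flow_bytes are assumptions made for illustration. It handles IPv4 only.

```sql
-- Hypothetical inline stand-ins for the real tables (sample rows from the question).
WITH flows AS (
  SELECT '1.1.1.1' AS srcip, '8.8.8.8' AS dstip, 1005 AS flow_bytes
), asn_blocks AS (
  -- network_bin is derived from the prefix here; a real table would store it precomputed.
  SELECT organization, mask,
         NET.IP_FROM_STRING(prefix) & NET.IP_NET_MASK(4, mask) AS network_bin
  FROM UNNEST([
    STRUCT('1.1.1.0' AS prefix, 24 AS mask, 'Cloudflare' AS organization),
    STRUCT('8.8.8.0', 24, 'Google')
  ])
)
SELECT f.srcip, s.organization AS src_as,
       f.dstip, d.organization AS dst_as,
       f.flow_bytes
FROM flows AS f
-- Try every prefix length for the source address, keep the one that matches a block.
CROSS JOIN UNNEST(GENERATE_ARRAY(9, 32)) AS src_mask
JOIN asn_blocks AS s
  ON s.mask = src_mask
 AND s.network_bin = NET.SAFE_IP_FROM_STRING(f.srcip) & NET.IP_NET_MASK(4, src_mask)
-- Repeat for the destination address against a second alias of the same table.
CROSS JOIN UNNEST(GENERATE_ARRAY(9, 32)) AS dst_mask
JOIN asn_blocks AS d
  ON d.mask = dst_mask
 AND d.network_bin = NET.SAFE_IP_FROM_STRING(f.dstip) & NET.IP_NET_MASK(4, dst_mask)
```

ON is used instead of USING because both joins target the same table, so the mask and network_bin columns would be ambiguous under USING. With the sample data this should return a single row pairing 1.1.1.1 with Cloudflare and 8.8.8.8 with Google. Note that the inner joins drop any flow whose source or destination has no matching block; LEFT JOIN would keep those rows with a NULL AS name.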
