

Using BigQuery to analyze IIS logs

Is there a preferred way or example for loading and analyzing IIS logs (in the Extended Log File Format) using BigQuery? We will also need to auto-partition the data. We can fetch the log files periodically.

We want to analyze two things: usage of a particular feature, which can be identified by a particular URL pattern, and the conversion funnel of the most popular flows visitors take through the website, to identify where they enter and leave. Visitors can be identified by a unique ID in a cookie (stored in the logs), and pages can be linked together via the referer (also stored in the logs).

Thanks in advance.

It's easy to load CSV-format files into BigQuery; both CSV and JSON source formats are supported.

I am not an expert in IIS, but the quickest way to load flat log data into BigQuery is to start with CSV. The IIS log format is fairly straightforward to work with, but you may want to save yourself a step and export it to CSV directly. A quick search shows that many people use LogParser (note: I have never used it myself) to convert IIS logs into CSV. Perhaps give that or a similar tool a try.
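If LogParser isn't an option, a minimal shell sketch can do a rough conversion: drop the `#` directive lines, turn the `#Fields:` directive into a CSV header, and comma-join the data fields. This assumes standard W3C extended logs, where spaces inside field values are encoded (e.g. as `+`), so fields are strictly space-delimited; the sample log below is made up for illustration:

```shell
# Create a tiny sample W3C Extended Log Format file (illustrative only).
cat > sample.log <<'EOF'
#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-uri-stem sc-status
2012-10-29 14:02:31 /index.html 200
2012-10-29 14:02:45 /feature/report 200
EOF

# Convert to CSV: the "#Fields:" directive becomes the header row,
# other "#" directives are skipped, and data rows are comma-joined.
awk '
  /^#Fields:/ { sub(/^#Fields: /, ""); gsub(/ /, ","); print; next }
  /^#/        { next }
  { gsub(/ /, ","); print }
' sample.log > sample.csv

cat sample.csv
```

Note that this sketch falls over if your log fields can contain literal spaces or commas; for production use, a real parser such as LogParser is the safer choice.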

As for "auto-partitioning" your BigQuery dataset tables: BigQuery doesn't do this automatically, but it's fairly easy to create a new table for each batch of IIS logs you export.

Depending on the volume of data you are analyzing, you could create a new BigQuery table per day or per hour.

Scripting this is easy with the BigQuery command-line tool. Create a new BigQuery load job, with a new table name, for each timeslice of log data you have.
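A sketch of what that script might look like, generating one `bq load` invocation per daily CSV batch. The dataset name, GCS bucket path, and schema here are assumptions for illustration; the loop echoes the commands into a file as a dry run, so you can review them before executing:

```shell
# Sketch: one "bq load" command per daily batch of converted IIS logs.
# Dataset name, bucket path, and schema are assumptions; adjust for
# your project. Remove the dry-run indirection to actually run bq.
DATASET="mydataset"
SCHEMA="date:STRING,time:STRING,cs_uri_stem:STRING,sc_status:INTEGER"

for day in 2012_10_29 2012_10_30 2012_10_31; do
  echo "bq load --source_format=CSV --skip_leading_rows=1 ${DATASET}.logs_${day} gs://my-log-bucket/iis/${day}.csv ${SCHEMA}"
done > load_commands.txt

cat load_commands.txt
```

After reviewing, the generated commands could be executed with `sh load_commands.txt`.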

In other words, your BigQuery tables should look something like this:

mydataset.logs_2012_10_29
mydataset.logs_2012_10_30
mydataset.logs_2012_10_31
etc...
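Once the daily tables exist, a single query can span several of them: in BigQuery's legacy SQL dialect, listing comma-separated table names in the FROM clause unions them. A hedged sketch of a feature-usage query (table and column names are assumptions, and the `bq` call is written to a file as a dry run rather than executed):

```shell
# Sketch: count hits on a feature URL pattern across two daily tables.
# In BigQuery legacy SQL, comma-separated tables in FROM are unioned.
# Table and column names are assumptions for illustration.
QUERY="SELECT cs_uri_stem, COUNT(*) AS hits
  FROM mydataset.logs_2012_10_29, mydataset.logs_2012_10_30
  WHERE cs_uri_stem CONTAINS '/feature/'
  GROUP BY cs_uri_stem
  ORDER BY hits DESC"

# Dry run: print the bq invocation instead of executing it.
printf 'bq query "%s"\n' "$QUERY" > query_command.txt
cat query_command.txt
```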

For more information, make sure you read through the BigQuery documentation on importing data.
