Best way to convert log files (*.txt) to web-friendly files (*.html, *.jsp, etc)?

I have a bunch of log files which are pure text. Here is an example of one...

Overall Failures Log
SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings
HW Failures - 03.09.2010 - /logs/hwfailures.txt - 42 errors - 25 warnings
SW Failures - 03.10.2010 - /logs/swfailures.txt - 32 errors - 27 warnings
HW Failures - 03.10.2010 - /logs/hwfailures.txt - 11 errors - 31 warnings

These files can get quite large and contain a lot of other information. I'd like to produce an HTML file from this log that adds links to key portions, so the user can open the other log files from it. For example:

SW Failures - 03.09.2010 - <a href="/logs/swfailures.txt">/logs/swfailures.txt</a> - 23 errors - 24 warnings

This is greatly simplified, as I would like to add many more links and other HTML elements. My question is: what is the best way to do this? If the files are large, should I generate the HTML before serving it to the user, or will JSP do? Should I use Perl or another scripting language? What are your thoughts and experiences?

Here is a simple example using Perl's HTML::Template:

#!/usr/bin/perl

use strict; use warnings;
use HTML::Template;

my $tmpl = HTML::Template->new(scalarref => \ <<EOTMPL
<!DOCTYPE HTML>
<html><head><title>HTMLized Log</title>
<style type="text/css">
#log li { font-family: "Courier New" }
.errors { background:yellow; color:red }
.warnings { background:#3cf; color:blue }
</style>
</head><body>
<ol id="log">
<TMPL_LOOP LOG>
<li><span class="type"><TMPL_VAR TYPE></span>
<span class="date"><TMPL_VAR DATE></span>
<a href="<TMPL_VAR FILE>"><TMPL_VAR FILE></a>
<span class="errors"><TMPL_VAR ERRORS></span>
<span class="warnings"><TMPL_VAR WARNINGS></span>
</li>
</TMPL_LOOP>
</ol></body></html>
EOTMPL
);

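my @log;
# Column names, in the order they appear on each " - "-separated line.
my @fields = qw( TYPE DATE FILE ERRORS WARNINGS );

while ( my $entry = <DATA> ) {
    chomp $entry;
    last unless $entry =~ /\S/;    # stop at the first blank line
    my %entry;
    # Hash slice: assign the split pieces to the named fields in one step.
    @entry{ @fields } = split / - /, $entry;
    push @log, \%entry;
}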

$tmpl->param(LOG => \@log);
print $tmpl->output;

__DATA__
SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings
HW Failures - 03.09.2010 - /logs/hwfailures.txt - 42 errors - 25 warnings
SW Failures - 03.10.2010 - /logs/swfailures.txt - 32 errors - 27 warnings
HW Failures - 03.10.2010 - /logs/hwfailures.txt - 11 errors - 31 warnings

I like awk because of its automatic field parsing:

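# Only the summary lines contain a path ending in "failures.txt".
/failures\.txt/ {
        # With whitespace splitting, the path is the sixth field.
        $6 = "<a href=\"" $6 "\">" $6 "</a>"
}

{
        # Print every line, modified or not, followed by a line break.
        print $0 "<br>"
}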
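
One caveat with positional fields: the path is field 6 only because the label ("SW Failures") is always two words. As a rough sketch of the same idea in Python, splitting on the " - " delimiter instead makes the path the third field no matter how many words the label has. The function name `linkify_fields` is made up for illustration, and the five-field layout is assumed from the sample log in the question.

```python
import html

def linkify_fields(line):
    """Split a log line on ' - ' and wrap the third field (the path) in a link."""
    text = line.rstrip("\n")
    fields = text.split(" - ")
    if len(fields) != 5:
        # Not a summary line (assumed layout): only escape it for HTML.
        return html.escape(text)
    fields = [html.escape(f) for f in fields]
    fields[2] = '<a href="{0}">{0}</a>'.format(fields[2])
    return " - ".join(fields)

print(linkify_fields(
    "SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings"))
```

Escaping with `html.escape` matters once the logs contain characters such as `<` or `&`, which would otherwise be interpreted as markup.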

I'd use Python regular expressions.

>>> import re
>>> a = re.compile(r'[SH]W Failures - \d\d\.\d\d\.\d\d\d\d - (.*) - \d+ errors - \d+ warnings')
>>> line = 'SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings'
>>> b = a.match(line)
>>> b
<_sre.SRE_Match object at 0x7ff34160>
>>> b.groups()
('/logs/swfailures.txt',)
>>> line.replace(b.group(1), '<a href="%s">%s</a>' % (b.group(1), b.group(1)))
'SW Failures - 03.09.2010 - <a href="/logs/swfailures.txt">/logs/swfailures.txt</a> - 23 errors - 24 warnings'
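
Since the question mentions that the logs can get large, here is a minimal sketch of how the regex approach might be applied to a whole file while holding only one line in memory at a time. The names `SUMMARY` and `htmlize` are hypothetical, and the pattern assumes every summary line looks like the sample above; anything else is passed through with HTML escaping only.

```python
import html
import re

# Assumed layout of a summary line, taken from the sample log in the question.
SUMMARY = re.compile(
    r'^(?P<head>[SH]W Failures - \d\d\.\d\d\.\d{4} - )'
    r'(?P<path>\S+)'
    r'(?P<tail> - \d+ errors - \d+ warnings)$'
)

def htmlize(lines):
    """Yield one HTML fragment per input line, without loading the whole log."""
    for line in lines:
        line = line.rstrip("\n")
        m = SUMMARY.match(line)
        if m is None:
            # Unrecognised line: escape it and keep it as plain text.
            yield html.escape(line) + "<br>"
            continue
        path = html.escape(m.group("path"))
        yield '{}<a href="{}">{}</a>{}<br>'.format(
            html.escape(m.group("head")), path, path,
            html.escape(m.group("tail")))

sample = [
    "Overall Failures Log\n",
    "SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings\n",
]
for fragment in htmlize(sample):
    print(fragment)
```

Because `htmlize` is a generator over any iterable of lines, passing an open file object instead of `sample` should work the same way, so the HTML can be generated ahead of time or streamed as it is produced.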

pygmentize can handle some formats, but in most cases you will probably need to add a custom lexer for your own format.
