Parsing a log file and writing it to a CSV file efficiently

Question

I have a log file that I am parsing with a regex. Each match gives me 3 elements:

1) timestamp

2) numberid

3) objectvalue

I intend to write these to a CSV file efficiently (the log file could be huge).
This is what I have tried:

import csv
import re
from collections import defaultdict

def read_logs(input_file):
    data = defaultdict(list)  # missing keys start out as empty lists
    for each in input_file:
        # r'' is a placeholder; the real pattern needs three capture groups
        regex_match = re.match(r'', each)
        data['timestamp'].append(regex_match.group(1))
        data['numberid'].append(regex_match.group(2))
        data['objectvalue'].append(regex_match.group(3))
    return data

def main(inputname,outputname):
    with open(inputname) as input_file:
        data = read_logs(input_file)
    with open(outputname,'w',newline='') as out_file:
        write_file(out_file,data)

def write_file(out_file,data):
    out = csv.writer(out_file)
    out.writerow(['timestamp_val','numberid','objectvalue'])
    # writing the rows held in data is the part I am asking about

1) I thought a defaultdict would be the most efficient way to collect this data before writing it to a file. Here the defaultdict keys are timestamp, numberid and objectvalue, each with a list as its value. How do I write this to a CSV file?

Sample data value is:
data = {'timestamp_val':['10:10:54','13:02:07','03:02:10'],'numberid':['AA10','BB18','FF34'],'objectvalue':['NHAG','ABCD','YTAB']}

2) If this is not an efficient way, what would be a better way to accomplish this?

The other way I could think of is reading each line, matching it with the regex, and writing it to the CSV file right away. Is this a good approach?
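
On question 1, a dict of equal-length lists like the sample data can be written row by row by zipping the columns together. The following is only a minimal sketch: the helper name `write_columns` and the use of `io.StringIO` in place of a real output file are assumptions made for illustration, not part of the original code.

```python
import csv
import io

def write_columns(out_file, data, fieldnames):
    """Write a dict of equal-length lists as CSV rows (hypothetical helper)."""
    writer = csv.writer(out_file)
    writer.writerow(fieldnames)
    # zip(*columns) turns the per-column lists into per-row tuples.
    writer.writerows(zip(*(data[name] for name in fieldnames)))

# Sample data from the question, with the numberid values quoted as strings.
data = {
    'timestamp_val': ['10:10:54', '13:02:07', '03:02:10'],
    'numberid': ['AA10', 'BB18', 'FF34'],
    'objectvalue': ['NHAG', 'ABCD', 'YTAB'],
}

buffer = io.StringIO()  # stands in for a file opened with newline=''
write_columns(buffer, data, ['timestamp_val', 'numberid', 'objectvalue'])
csv_text = buffer.getvalue()
```

This still holds every parsed value in memory before anything is written, so it only suits logs that fit comfortably in RAM.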

Answer 1

I don't think you need to read the whole file into a dict of lists first: write each row as soon as it is read.

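import csv
import re

def main(inputname,outputname):
    # newline='' lets the csv module control the line endings
    with open(inputname) as input_file, open(outputname,'w',newline='') as out_file:
        out = csv.writer(out_file)
        out.writerow(['timestamp_val','numberid','objectvalue'])
        for each in input_file:
            # r'' is a placeholder; the real pattern needs three capture groups
            regex_match = re.match(r'', each)
            out.writerow([regex_match.group(1), regex_match.group(2), regex_match.group(3)])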
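
To make the streaming idea concrete, here is a small self-contained sketch. The question does not show the real log format or regex, so the pattern `LINE_PATTERN`, the sample lines, and the helper names `parse_lines` and `convert` are all assumptions made up for illustration. It also skips lines that do not match instead of failing on them, which may or may not be what you want.

```python
import csv
import io
import re

# Hypothetical line format such as "10:10:54 AA10 NHAG".
# The real pattern is not given in the question.
LINE_PATTERN = re.compile(r'(\d{2}:\d{2}:\d{2})\s+(\w+)\s+(\w+)')

def parse_lines(lines):
    """Yield (timestamp, numberid, objectvalue) for each matching line."""
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:  # lines that do not match are skipped
            yield match.groups()

def convert(input_file, out_file):
    writer = csv.writer(out_file)
    writer.writerow(['timestamp_val', 'numberid', 'objectvalue'])
    # The generator is consumed lazily, so only one row is held at a time.
    writer.writerows(parse_lines(input_file))

# In-memory stand-ins for the real input and output files.
sample_log = io.StringIO('10:10:54 AA10 NHAG\nnot a log line\n13:02:07 BB18 ABCD\n')
result = io.StringIO()
convert(sample_log, result)
rows = result.getvalue().splitlines()
```

With real files you would pass the handles from `open(inputname)` and `open(outputname, 'w', newline='')` to `convert`. Memory use then stays roughly constant regardless of the log size.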

Solution 1
0 2018-04-20 09:13:05
