Create CSV headers from a log file in Python
My log file contains some info in every row, like below:
Info1:NewOrder|key:123 |Info3:10|Info5:abc
Info3:10|Info1:OldOrder| key:456| Info6:xyz
Info1:NewOrder|key:007
I want to convert it to a CSV like the one below (if I give key, Info1, Info3 as the required headers):
key,Info1,Info3
123,NewOrder,10
456,OldOrder,10
007,NewOrder,
Earlier I used awk to get the field values, but the logging can change the order in which the info and key fields are printed in a row, so I cannot be sure that Info3 will always be in a particular column. Every time the logging changes, the script has to be changed.
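Because the fields can appear in any order, one way to make the script independent of column positions is to parse each line into a dictionary keyed by field name and then look the wanted fields up by name. A minimal sketch, assuming every field has the form name:value and fields are separated by |; `parse_fields` is a hypothetical helper name, not something from the question:

```python
def parse_fields(line):
    """Turn 'Info1:NewOrder|key:123' into {'Info1': 'NewOrder', 'key': '123'}.

    Assumes fields are separated by '|' and each field is 'name:value'.
    Only the first ':' splits a field, so a value such as '10:30' stays intact.
    Fields without any ':' are skipped.
    """
    fields = {}
    for part in line.split("|"):
        if ":" not in part:
            continue
        name, value = part.split(":", 1)
        fields[name.strip()] = value.strip()
    return fields


required = ["key", "Info1", "Info3"]
for line in ["Info1:NewOrder|key:123 |Info3:10|Info5:abc",
             "Info3:10|Info1:OldOrder| key:456| Info6:xyz"]:
    fields = parse_fields(line)
    # Lookup by name works no matter where the field sits in the line
    print([fields.get(name, "") for name in required])
```

Both lines yield their values in the requested order (`['123', 'NewOrder', '10']` and `['456', 'OldOrder', '10']`), even though Info3 is the third field in one line and the first in the other.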
I then intend to load the CSV into a pandas DataFrame, so a Python solution would be better. This is more of a data-cleaning task: generating a CSV from a log file.
This is what I used after reading the answers:
import csv
import sys
# Keep the newlines: wrangle() needs them to split the log into rows
with open(sys.argv[1], 'r') as myLogfile:
    log = myLogfile.read()
requested_columns = ["OrderID", "TimeStamp", "ErrorCode"]
def wrangle(string, requested_columns):
    # One dict per non-blank line; entries without ':' are skipped and
    # only the first ':' separates the field name from its value
    data = [dict(element.strip().split(":", 1) for element in row.split("|") if ":" in element)
            for row in string.splitlines() if row.strip()]
    # Missing fields come back as None, which csv.writer writes as an empty cell
    body = [[row.get(column) for column in requested_columns] for row in data]
    return [requested_columns] + body
outpath = sys.argv[2]
with open(outpath, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(wrangle(log, requested_columns))
Sample log file: https://ideone.com/cny805
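The script above reads the whole log into memory before parsing it. For large logs, a line-by-line variant may be preferable. The sketch below is an illustrative alternative, not the asker's script: `convert_log` is a hypothetical name, and it takes already-open file objects so it can be exercised with `io.StringIO` instead of real files.

```python
import csv
import io


def convert_log(log_file, csv_file, columns):
    """Write one CSV row per non-blank log line, keeping only `columns`.

    Reads `log_file` line by line, so the whole log never has to fit in memory.
    Assumes '|'-separated 'name:value' fields, as in the question's sample.
    """
    writer = csv.writer(csv_file)
    writer.writerow(columns)
    for line in log_file:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line at end of file
        fields = {}
        for part in line.split("|"):
            if ":" in part:
                name, value = part.split(":", 1)
                fields[name.strip()] = value.strip()
        writer.writerow([fields.get(name, "") for name in columns])


log = io.StringIO("Info1:NewOrder|key:123 |Info3:10|Info5:abc\n"
                  "Info3:10|Info1:OldOrder| key:456| Info6:xyz\n"
                  "Info1:NewOrder|key:007\n")
out = io.StringIO()
convert_log(log, out, ["key", "Info1", "Info3"])
print(out.getvalue())
```

With real files, the output file would be opened as `open(path, "w", newline="")`, as in the script above, so that the csv module controls the line endings.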
The bulk of it is just using useful string methods like strip and split, plus list comprehensions.
import csv
string = """Info1:NewOrder|key:123 |Info3:10|Info5:abc
Info3:10|Info1:OldOrder| key:456| Info6:xyz
Info1:NewOrder|key:007"""
requested_columns = ["key", "Info1", "Info3"]
def wrangle(string, requested_columns):
    # Parse each line into a {field name: value} dict; split on the first ':' only
    data = [dict([element.strip().split(":", 1) for element in row.split("|")]) for row in string.split("\n")]
    # Pick the requested fields in order; missing ones become None (written as an empty cell)
    body = [[row.get(column) for column in requested_columns] for row in data]
    return [requested_columns] + body
outpath = "whatever.csv"
with open(outpath, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(wrangle(string, requested_columns))
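A variation on the same idea, offered as a sketch rather than as part of the original answer: the standard library's `csv.DictWriter` can write the per-row dictionaries directly. With `fieldnames` set to the requested columns, `restval=""` fills in missing fields and `extrasaction="ignore"` drops fields that were not requested (the default, `"raise"`, would raise `ValueError` on Info5 and Info6). An in-memory buffer stands in for the output file here.

```python
import csv
import io

log = """Info1:NewOrder|key:123 |Info3:10|Info5:abc
Info3:10|Info1:OldOrder| key:456| Info6:xyz
Info1:NewOrder|key:007"""
requested_columns = ["key", "Info1", "Info3"]

# One dict per non-blank line; split on the first ':' only and trim stray spaces
rows = [
    {k.strip(): v.strip()
     for k, v in (field.split(":", 1) for field in line.split("|") if ":" in field)}
    for line in log.splitlines() if line.strip()
]

buffer = io.StringIO()
# restval fills missing fields; extrasaction="ignore" skips Info5/Info6
writer = csv.DictWriter(buffer, fieldnames=requested_columns,
                        restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The design choice is mostly about readability: the header and the column order live in one place (`fieldnames`), and no list of per-column lookups has to be built by hand.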
You could use a csv reader with a | delimiter to get you started, then split each entry on : to give you a per-row dictionary, as follows:
import csv
# Python 3: open CSV files in text mode with newline=''
with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    cols = ["OrderID", "TimeStamp", "ErrorCode"]
    csv_output.writerow(cols)
    for row in csv.reader(f_input, delimiter='|'):
        # Remove any entries that do not have a colon
        row = [c for c in row if ':' in c]
        # Convert remaining columns into a dictionary, splitting on the first colon only
        entries = {k.strip(): v.strip() for k, v in (c.split(':', 1) for c in row)}
        csv_output.writerow([entries.get(c, "") for c in cols])
Giving you an output file:
OrderID,TimeStamp,ErrorCode
3000000,1488948188555841641,
3000000,1488948188556444675,0
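The same `csv.reader` approach can be tried on the question's own sample without creating any files. This sketch is an illustration, not part of the answer: it wraps the sample in `io.StringIO` and requests key, Info1 and Info3. It also passes `quoting=csv.QUOTE_NONE`, an assumption on my part, so that any double quotes inside log values are kept as ordinary characters instead of being treated as CSV quoting.

```python
import csv
import io

sample = io.StringIO("Info1:NewOrder|key:123 |Info3:10|Info5:abc\n"
                     "Info3:10|Info1:OldOrder| key:456| Info6:xyz\n"
                     "Info1:NewOrder|key:007\n")
cols = ["key", "Info1", "Info3"]
result = [cols]

for row in csv.reader(sample, delimiter="|", quoting=csv.QUOTE_NONE):
    # Keep only entries of the form name:value
    row = [c for c in row if ":" in c]
    entries = {k.strip(): v.strip() for k, v in (c.split(":", 1) for c in row)}
    result.append([entries.get(c, "") for c in cols])

for line in result:
    print(",".join(line))
```

The printed rows match the CSV the question asks for, including the empty Info3 cell for key 007.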
To read the data directly into a pandas DataFrame:
import csv
import pandas as pd
cols = ["OrderID", "TimeStamp", "ErrorCode"]
data = []
with open('input.csv', newline='') as f_input:
    for row in csv.reader(f_input, delimiter='|'):
        # Remove any entries that do not have a colon
        row = [c for c in row if ':' in c]
        # Convert remaining columns into a dictionary, splitting on the first colon only
        entries = {k.strip(): v.strip() for k, v in (c.split(':', 1) for c in row)}
        data.append([entries.get(c, "") for c in cols])
df = pd.DataFrame(data, columns=cols)
print(df)
Giving you:
OrderID TimeStamp ErrorCode
0 3000000 1488948188555841641
1 3000000 1488948188556444675 0
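If the CSV written by one of the scripts above is loaded with `pd.read_csv` instead, pandas infers column types, so a key such as 007 would be read as the integer 7. (Building the DataFrame from the parsed strings, as above, keeps the values as text.) Passing `dtype=str` keeps every column as text, while empty cells still come back as NaN. A minimal sketch, with the question's expected CSV inlined through `io.StringIO` in place of a real file:

```python
import io

import pandas as pd

csv_text = "key,Info1,Info3\n123,NewOrder,10\n456,OldOrder,10\n007,NewOrder,\n"

# dtype=str stops pandas from converting "007" to 7
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df)
```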