How to analyze a .log file using Python and pandas and save it into a DataFrame?
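I am working with a sample log file from a vending machine (I am very new to pandas). Every day the machine generates one .log file.
Q: How can I use Python and pandas to extract the information from the .log file and save it into a DataFrame for further analysis? (Sample input and output are provided below.)
Below you can find my sample code and the sample.log file: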
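import os

filePath = os.path.expanduser("~/sample.log")  # open() does not expand "~" on its own
with open(filePath) as fp:
    lines = fp.read()  # read the whole file into a single string
print(lines)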
Sample log file, with part of it shown as text:
[1] Tues Jan2019 11:03:33 - (1000000) sample \FILES\testing1.LOG
TestID: ACCD2
001 2019/02/14 00:00:20 [EVENT_NONE(0)] test1Voltage=13.8V test1Current=2.1A
test2Voltage=11.8V test2Current=12.1A currentstate=NORMAL_RUN(0) BatteryHealth=100% Battylife(hr)=1hour Battery test speed (mph)=15 BatteryLoading=OFF
002 2019/02/14 00:00:23 [EVENT_NONE(0)] test1Voltage=13.8V test1Current=2.1A
test2Voltage=11.8V test2Current=12.1A currentstate=NORMAL_RUN(0) BatteryHealth=100% Battylife(hr)=1hour Battery test speed (mph)=15 BatteryLoading=OFF
003 2019/02/14 00:00:32 [EVENT_NONE(0)] test1Voltage=13.8V test1Current=2.1A
test2Voltage=11.8V test2Current=12.1A currentstate=NORMAL_RUN(0) BatteryHealth=100% Battylife(hr)=1hour Battery test speed (mph)=15 BatteryLoading=OFF
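The desired sample output is below.
From the example above, we can clearly see that the log file contains multiple records that follow a similar pattern. Also, here are some of my thoughts/notes:
I am not sure how to approach this case. Could someone share some code for processing the log file above? Thank you.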
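Welcome to Python!
You took the right first step by reading the whole file at once, but what I want to show is reading one line at a time with fp.readline(). From §7.2.1 of the docs:
If f.readline() returns an empty string, the end of the file has been reached.
We will implement that end-of-file check.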
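As a small standalone illustration of that rule, here is a sketch that uses an in-memory `io.StringIO` buffer as a stand-in for the real file (the buffer and its contents are assumptions made only for this demo). It shows that a blank line inside the file comes back as `'\n'`, so only the true end of file yields `''`.

```python
import io

# Hypothetical in-memory stand-in for an opened log file.
fp = io.StringIO("header\n\nlast line\n")

lines = []
while True:
    ln = fp.readline()
    if ln == '':  # empty string: end of file reached
        break
    lines.append(ln)

# The blank line in the middle was read as '\n', so it did not end the loop.
print(lines)
```

Iterating directly over the file object (`for ln in fp:`) handles the end of file implicitly and is the more common idiom when no lines need special treatment.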
import pandas as pd

with open(filePath, 'r', encoding='utf16') as fp:
    fp.readline()  # first line skipped
    fp.readline()  # second line skipped
    data = []  # make a list to collect data
    while True:
        ln = fp.readline()
        if ln == '':
            break  # end-of-file check
        # The line is split on the space character, so each line will end up with 12 entities
        entities = ln.rstrip('\n').split(' ')
        # Further split each entity on `=` and keep only the last string.
        # Check for yourself how split works on a string with or without `=`.
        entities = [entity.split('=')[-1] for entity in entities]
        data.append(entities)  # collected by the list
data_df = pd.DataFrame(data, columns=...) # put a list of length 12 to specify the column header. Remove `columns=
If you had pasted the data as text, I could have tested my code; as it stands, you will need to help with that part.
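One thing to watch for when splitting on single spaces: keys such as `Battery test speed (mph)` contain spaces themselves, so a plain `split(' ')` breaks them apart. Below is a minimal sketch of a regex-based alternative. It assumes every record looks like the sample in the question (a sequence number, a timestamp, a bracketed event, then `key=value` pairs whose values contain no spaces) and that a record may wrap onto a second physical line, as it is displayed there. The helper name `parse_log` and the `seq`, `timestamp` and `event` field names are made up for illustration.

```python
import re

# Two records copied from the sample in the question; each record is assumed
# to span two physical lines, as it is displayed there.
SAMPLE = r"""[1] Tues Jan2019 11:03:33 - (1000000) sample \FILES\testing1.LOG
TestID: ACCD2
001 2019/02/14 00:00:20 [EVENT_NONE(0)] test1Voltage=13.8V test1Current=2.1A
test2Voltage=11.8V test2Current=12.1A currentstate=NORMAL_RUN(0) BatteryHealth=100% Battylife(hr)=1hour Battery test speed (mph)=15 BatteryLoading=OFF
002 2019/02/14 00:00:23 [EVENT_NONE(0)] test1Voltage=13.8V test1Current=2.1A
test2Voltage=11.8V test2Current=12.1A currentstate=NORMAL_RUN(0) BatteryHealth=100% Battylife(hr)=1hour Battery test speed (mph)=15 BatteryLoading=OFF
"""

# Record start: sequence number, timestamp, [event], then the first pairs.
RECORD_START = re.compile(
    r'^(\d+) (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\]\s*(.*)$')
# key=value pair: the key may contain spaces, the value may not.
PAIR = re.compile(r'(\S[^=]*?)=(\S+)')


def parse_log(text):
    """Return one dict per record (hypothetical helper, not from the answers)."""
    records = []
    for line in text.splitlines():
        match = RECORD_START.match(line)
        if match:
            seq, stamp, event, rest = match.groups()
            records.append({'seq': int(seq), 'timestamp': stamp,
                            'event': event, 'pairs': rest})
        elif records and '=' in line:
            # Continuation line: append it to the record being built.
            records[-1]['pairs'] += ' ' + line.strip()
    rows = []
    for record in records:
        pairs = record.pop('pairs')
        record.update(PAIR.findall(pairs))  # add each key/value to the row
        rows.append(record)
    return rows


rows = parse_log(SAMPLE)
print(rows[0])
```

The two header lines are skipped automatically because they do not match the record pattern and no record has been started yet. The resulting list of dicts can be passed to `pandas.DataFrame(rows)`, which builds one column per key.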
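The question itself is full of problems and ambiguities. The line-number handling seems odd. Your question implies that the first line, 000, should be ignored. Still, this may help you get started: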
from collections import defaultdict
from pandas import DataFrame
import sys

DATA = defaultdict(dict)
SKIP = 2
# List of columns of interest
COLUMNS = ['test1Voltage', 'test1Current', 'test2Voltage', 'test2Current', 'currentstate',
           'BatteryHealth', 'Battylife(hr)', 'Battery test speed (mph)', 'BatteryLoading']

with open('testing1.log') as log:
    for _ in range(SKIP):
        next(log)
    for line in log:
        try:
            if (lineno := int(line.split()[0])) == 0 and len(DATA) != 0:
                break
            for c in COLUMNS:
                try:
                    i = line.index(c)
                    DATA[lineno][c] = line[i+len(c)+1:].split()[0]
                except ValueError:
                    pass
        except Exception as e:
            print(f'Unable to process:-\n{line}...due to {e}', file=sys.stderr)

df = DataFrame.from_dict(DATA, orient='index')
print(df)
Output:
     test1Voltage  test1Current  test2Voltage  test2Current   currentstate  BatteryHealth  Battylife(hr)  BatteryLoading
0           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
1           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
2           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
3           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
4           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
245         13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
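The extracted values are still strings with unit suffixes (`13.8V`, `2.1A`, `100%`), which gets in the way of any numeric analysis afterwards. A minimal sketch of one way to clean them up follows. The small hand-built frame stands in for the real `df`, and the `ON`/`OFF` mapping is an assumption, since only `OFF` appears in the sample.

```python
import pandas as pd

# Hand-built stand-in for the parsed frame, using values from the sample log.
df = pd.DataFrame({
    'test1Voltage': ['13.8V', '13.8V'],
    'test1Current': ['2.1A', '2.1A'],
    'BatteryHealth': ['100%', '100%'],
    'BatteryLoading': ['OFF', 'OFF'],
})

NUMERIC_COLUMNS = ['test1Voltage', 'test1Current', 'BatteryHealth']
for col in NUMERIC_COLUMNS:
    # Keep the leading number and drop the unit suffix, then convert to float.
    df[col] = df[col].str.extract(r'^(\d+(?:\.\d+)?)', expand=False).astype(float)

# Assumed encoding of the flag: 'ON' -> True, 'OFF' -> False.
df['BatteryLoading'] = df['BatteryLoading'].map({'ON': True, 'OFF': False})

print(df.dtypes)
```

With the columns converted to floats, the usual pandas operations such as `df.describe()` work on them directly.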