
How to read data from a large text file and write the lines containing a given string to another file using UFT?

I have a problem at hand. I need to read a log file and, if any line in it contains a specified log level, write that line to a result file. I have written the following code and it produces the correct output. My problem is the size of the log file: in my project the maximum size set for a log file is 100 MB. I tested this code with a 2 MB log file (15512 lines) and it took a ridiculously long time (about 1 hour 15 minutes). I am also not sure how it will behave with a larger file. Is there another approach? Quick help would be really appreciated.

Option Explicit

Public Function chekLogFile(sLogFileName, sLogLevelToCheck, sResultFile)
    Dim oFSO, oFile, oResultFileObj, oResultFile
    Dim sFileContent
    Dim arrFileContent
    Dim iNumberOfLinesInFile, iCounter

    ' Open the result file to write
    Set oResultFileObj = CreateObject("Scripting.FileSystemObject")
    Set oResultFile = oResultFileObj.OpenTextFile(sResultFile,8)

    ' Read content from log file
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(sLogFileName,1)
    sFileContent = oFile.ReadAll()
    ' Create an array with content of each line as its elements
    arrFileContent = Split(sFileContent,vbcrlf)
    ' Get the number of lines
    iNumberOfLinesInFile = UBound(arrFileContent)

    ' If the line contains the log level, write the line to the result file
    ' The lines we are concerned about start as follows

    ' 20150823135921 :::: ERROR :: 
    ' 20150823135929 :::: WARNING :: 
    ' 20150823135930 :::: INFO :: 

    ' Please note: any occurrence of one of these words elsewhere in a line is not counted.

    For iCounter = 0 To iNumberOfLinesInFile Step 1
        If Mid(arrFileContent(iCounter),21,Len(sLogLevelToCheck)) = sLogLevelToCheck Then
            oResultFile.WriteLine(arrFileContent(iCounter))
        End If
    Next

    ' Close the files
    oFile.Close
    oResultFile.Close

    ' Release the objects
    Set oResultFile = Nothing
    Set oFile = Nothing
    Set oFSO = Nothing
    Set oResultFileObj = Nothing
End Function


' Log level could be either ERROR OR WARNING OR INFO
Call chekLogFile("E:\UFTTrial\gmail.log", "ERROR", "E:\UFTTrial\ResultFile.txt")

A text file is a collection of strings. If you need to process it sequentially, line by line, slurping the file and then splitting the content into an array is a waste of time and memory. Use .ReadLine() instead.

Sample code applied to a 20 MB file took less than 2 min on my (slow) machine:

Option Explicit

Const ForAppending = 8
Const csSrcFile = "M:\lib\kurs0705\testdata\lines.txt"

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")

Dim dtStart : dtStart = Now()
checkLogFile csSrcFile, "This", "selected.txt"
Dim dtEnd   : dtEnd   = Now() - dtStart
WScript.Echo oFSO.GetFile(csSrcFile).Size / 10^6, "MB  ", FormatDateTime(dtEnd, vbShortTime)

Public Sub checkLogFile(sLogFileName, sLogLevelToCheck, sResultFile)
    Dim oInFile  : Set oInFile  = oFSO.OpenTextFile(sLogFileName)
    Dim oOutFile : Set oOutFile = oFSO.OpenTextFile(sResultFile, ForAppending, True)
    Do Until oInFile.AtEndOfStream
       Dim sLine : sLine = oInFile.ReadLine()
       If Mid(sLine, 1, Len(sLogLevelToCheck)) = sLogLevelToCheck Then
          oOutFile.WriteLine sLine
       End If
    Loop
    oInFile.Close
    oOutFile.Close
End Sub

output:

cscript readlog.vbs
20,888896 MB   00:01
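
The sample above matches at the start of each line because that is how its test data looks. A minimal sketch of the same ReadLine loop adapted to the log format in the question might look like the following. It assumes the level text always starts at column 21 (as in "20150823135921 :::: ERROR :: "); the function name FilterLogByLevel and the constant LEVEL_COLUMN are hypothetical and not part of either post.

```vbscript
Option Explicit

Const ForReading   = 1
Const ForAppending = 8
' Assumption: the level starts at column 21, after "20150823135921 :::: "
Const LEVEL_COLUMN = 21

' Hypothetical helper: copies every line whose level matches into the
' result file and returns the number of lines written.
Function FilterLogByLevel(sLogFileName, sLogLevelToCheck, sResultFile)
    Dim oFSO     : Set oFSO     = CreateObject("Scripting.FileSystemObject")
    Dim oInFile  : Set oInFile  = oFSO.OpenTextFile(sLogFileName, ForReading)
    ' True = create the result file if it does not exist yet
    Dim oOutFile : Set oOutFile = oFSO.OpenTextFile(sResultFile, ForAppending, True)
    Dim nLevelLen : nLevelLen = Len(sLogLevelToCheck)
    Dim nMatches  : nMatches  = 0
    Dim sLine

    ' Only one line is held in memory at a time
    Do Until oInFile.AtEndOfStream
        sLine = oInFile.ReadLine()
        ' Mid returns "" for lines shorter than LEVEL_COLUMN, so they never match
        If Mid(sLine, LEVEL_COLUMN, nLevelLen) = sLogLevelToCheck Then
            oOutFile.WriteLine sLine
            nMatches = nMatches + 1
        End If
    Loop

    oInFile.Close
    oOutFile.Close
    FilterLogByLevel = nMatches
End Function
```

It could be called with the paths from the question, for example FilterLogByLevel("E:\UFTTrial\gmail.log", "ERROR", "E:\UFTTrial\ResultFile.txt"). Returning the match count is a design choice of this sketch, not something the original code does.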

The physical size of the file isn't the important aspect; the number of lines in the file is. The higher the line count, the longer it's going to take to use .ReadLine().

Ekkehard's answer is pretty much verbatim what I was going to write. Keep in mind that a file with 2,000 lines and 200 characters/line will be read significantly faster than a file with 20,000 lines and 20 characters/line. How many lines are in the file you're trying to parse?
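
One way to answer that for a given log is to count the lines without keeping any of them in memory. The sketch below is only an illustration; the function name CountLines is hypothetical, and it relies on the TextStream methods SkipLine and AtEndOfStream.

```vbscript
' Hypothetical helper: returns the number of lines in a text file.
Function CountLines(sPath)
    Dim oFSO  : Set oFSO  = CreateObject("Scripting.FileSystemObject")
    Dim oFile : Set oFile = oFSO.OpenTextFile(sPath, 1)   ' 1 = ForReading
    Dim nLines : nLines = 0

    ' SkipLine advances past one line and discards its content
    Do Until oFile.AtEndOfStream
        oFile.SkipLine
        nLines = nLines + 1
    Loop

    oFile.Close
    CountLines = nLines
End Function
```

Dividing the file size reported by oFSO.GetFile(sPath).Size by this count gives the average line length, which is the figure the comparison above depends on.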
