What is the fastest way to process large .asc files?

I currently have .asc log files that were generated by CANoe, and I am using Python to analyze them. The files are pretty big (anywhere from 0.5 GB to 2 GB). To read and analyze the data I convert it to a DataFrame, using the following lines of code:

    log=can.ASCReader(filePath)
    log=[*log]
    df_data = [{'timestamp':m.timestamp, 'data':m.data} for m in log]
    df = pd.DataFrame(df_data)

From my analysis, the part that takes the longest is converting the iterator to a list. I am wondering if there is a more efficient way of doing that. I am also open to doing the entire process a whole new way if it is faster. Currently a 0.6 GB .asc file takes about 19 minutes to run. Any help or suggestions would be appreciated!

The most time-consuming part is most likely reading from disk. This cannot be avoided.

However, you can make sure that you do not load unnecessary data into memory or copy it around.

Try the following:

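    import operator
    import can
    import pandas as pd

    log = can.ASCReader(filePath)  # lazy iterator over the messages in the file
    df = pd.DataFrame(data=map(operator.attrgetter('timestamp', 'data'), log), columns=['timestamp', 'data'])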

ASCReader returns an iterator, i.e. it does not read any data until you iterate over log.

As you are only interested in the values behind timestamp and data, we create an attrgetter for these two attributes. That is a function that takes an object and returns just the two given attributes of that object, as a tuple.
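To illustrate what the attrgetter does, here is a minimal sketch. Msg is a hypothetical stand-in for can.Message that carries only a few attributes, so the snippet runs without python-can installed.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for can.Message; a real message has more attributes.
Msg = namedtuple("Msg", ["timestamp", "arbitration_id", "data"])

# Build a callable that extracts both attributes in one call.
get_fields = operator.attrgetter("timestamp", "data")

msg = Msg(timestamp=1.5, arbitration_id=0x123, data=bytearray(b"\x01\x02"))
row = get_fields(msg)  # a tuple: (timestamp, data)
print(row)  # (1.5, bytearray(b'\x01\x02'))
```

The arbitration_id is simply ignored, so it never ends up in the rows handed to pandas.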

To apply this attrgetter to the log we use map, which calls the attrgetter on each element of log. map also returns an iterator, i.e. it does not read or store any data until it is consumed.
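The lazy behaviour of map can be checked with a small sketch. fake_reader is a hypothetical generator standing in for can.ASCReader, and the consumed list only exists to record when a message is actually produced.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for can.Message.
Msg = namedtuple("Msg", ["timestamp", "data"])
consumed = []

def fake_reader():
    """Hypothetical stand-in for iterating over can.ASCReader."""
    for i in range(3):
        consumed.append(i)  # record the moment a message is produced
        yield Msg(timestamp=float(i), data=bytes([i]))

rows = map(operator.attrgetter("timestamp", "data"), fake_reader())
consumed_before = list(consumed)  # still empty: nothing has been read yet
first = next(rows)                # pulls exactly one message through the pipeline
consumed_after = list(consumed)   # now holds a single entry
```

Nothing is read when map is created; each message is fetched only when the consumer (here next, in the answer pandas) asks for the next row.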

Finally, we pass the map object to pandas as the source of data for constructing a DataFrame.

Doing it like this should be the approach that copies the least data around and avoids handling unnecessary data. YMMV.
