• I.D.: AN000015544
DESCRIPTION: 6 1/2 DIGIT DIGITAL MULTIMETER
MANUFACTURER: HEWLETT-PACKARDMODEL NUM.: 34401A CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY45027398
• I.D.: AN000016955
DESCRIPTION: TEMPERATURE CALIBRATOR
MANUFACTURER: FLUKE MODEL NUM.: 724 CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: 1189063
• I.D.: AN000017259
DESCRIPTION: TRUE RMS MULTIMETER
MANUFACTURER: AGILENT MODEL NUM.: U1253A CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY49420076
• I.D.: AN000032766
DESCRIPTION: TRUE RMS MULTIMETER
MANUFACTURER: AGILENT MODEL NUM.: U1253B CALIBRATION - DUE DATE:6/1/2016 SERIAL NUMBER: MY5048 9036
I'm looking for a more efficient way to parse the manufacturer name and model number, i.e. 'HEWLETT-PACKARDMODEL NUM.: 34401A', 'AGILENT MODEL NUM.: U1253B', etc., from the text file above.
```python
parts_data = {'Model_Number': []}
with open("textfile", 'r') as parts_info:
    linearray = parts_info.readlines()
    for line in linearray:
        model_number = ''
        model_name = ''
        # Field 1 of a ':'-split MANUFACTURER line is the name (plus "MODEL NUM.")
        if "MANUFACTURER:" in line:
            model_name = line.split(':')[1]
        # Field 2 starts with the model number, e.g. " 34401A CALIBRATION - DUE DATE"
        if "NUM.:" in line:
            model_number = line.split(':')[2]
            model_number = model_number.split()[0]
            model_number = model_name + ' ' + model_number
            parts_data['Model_Number'].append(model_number.rstrip())
```
My code does exactly what I want, but I think there is a faster or cleaner way to do it. Let's increase efficiency!
Your code already looks fine, and unless you're parsing gigabytes of data I'm not sure there's much point in optimizing it. Still, I thought of a few things.
If you remove the `linearray = parts_info.readlines()` line and loop over `parts_info` directly, Python reads the file one line at a time, so the whole thing streams even if your file is huge. As written, `readlines()` reads the entire file into memory at once rather than going line by line, so a file bigger than your available memory can exhaust it.
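A minimal sketch of the streaming version, keeping the question's logic unchanged except for the loop source. The function wrapper `collect_model_numbers` and its `path` parameter are hypothetical, added only so the code can be reused and tested; the only substantive change is iterating over the open file object instead of a `readlines()` list.

```python
def collect_model_numbers(path):
    """Hypothetical wrapper around the question's code, made streaming."""
    parts_data = {'Model_Number': []}
    with open(path, 'r') as parts_info:
        # No readlines(): the file object yields one line per iteration,
        # so only the current line is held in memory.
        for line in parts_info:
            model_number = ''
            model_name = ''
            if "MANUFACTURER:" in line:
                model_name = line.split(':')[1]
            if "NUM.:" in line:
                model_number = line.split(':')[2]
                model_number = model_number.split()[0]
                model_number = model_name + ' ' + model_number
                parts_data['Model_Number'].append(model_number.rstrip())
    return parts_data
```

Calling `collect_model_numbers("textfile")` should give the same dictionary as the original code, with memory use independent of the file size.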
You can also combine the two `if` statements into a single conditional, since you only seem to care about lines that have both fields. In the interest of cleaner code, you also don't need `model_number = ''; model_name = ''`.
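A sketch of the single-conditional version, written as a per-line function so it can be dropped into any loop. The name `parse_model` and the `None` return for non-matching lines are assumptions for illustration, not part of the question.

```python
def parse_model(line):
    """Return ' NAME NUM. NUMBER' for a manufacturer line, else None (hypothetical helper)."""
    # One combined test replaces the two separate ifs. Both names are
    # assigned inside the branch, so the '' initialisations are unnecessary.
    if "MANUFACTURER:" in line and "NUM.:" in line:
        model_name = line.split(':')[1]
        model_number = line.split(':')[2].split()[0]
        return (model_name + ' ' + model_number).rstrip()
    return None
```

In the main loop you would append the result only when it is not `None`.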
Saving the result of calls like `line.split(':')` in a variable, rather than splitting the same line twice, can also help.
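A sketch of splitting once and reusing the pieces, assuming the line layout shown above. `parse_model_split_once` is a hypothetical name. Passing `maxsplit=2` to `str.split` is an extra assumption on my part: only the first three pieces are used, so splitting can stop after the second colon and leave the rest of the line in the last element.

```python
def parse_model_split_once(line):
    """Hypothetical helper: split each line at most once, and only as far as needed."""
    if "MANUFACTURER:" in line and "NUM.:" in line:
        # One split, reused for both fields; maxsplit=2 yields at most 3 pieces.
        fields = line.split(':', 2)
        model_name = fields[1]                    # e.g. " FLUKE MODEL NUM."
        model_number = fields[2].split()[0]       # e.g. "724"
        return (model_name + ' ' + model_number).rstrip()
    return None
```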
Alternatively, you could try a regex. It's impossible to tell which approach will perform better without testing both, which brings me back to what I said at the beginning: optimizing code is tricky and really shouldn't be done unless necessary. If you really, really cared about efficiency, you would use a program like `awk`, which is written in C.
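Since only measurement settles the split-versus-regex question, here is a rough sketch of how you might time both with the standard `timeit` module. The sample line is taken from the question; `with_split`, `with_regex`, and the pattern are hypothetical. Timings depend on your machine and Python version, so no expected numbers are given.

```python
import re
import timeit

# One record in the format shown in the question.
SAMPLE = ("MANUFACTURER: AGILENT MODEL NUM.: U1253B CALIBRATION - "
          "DUE DATE:6/1/2016 SERIAL NUMBER: MY5048 9036\n")

# Compiled once so the timing measures matching, not compilation.
PATTERN = re.compile(r'MANUFACTURER:(.*?NUM\.):\s*(\S+)')

def with_split(line):
    fields = line.split(':', 2)
    return (fields[1] + ' ' + fields[2].split()[0]).rstrip()

def with_regex(line):
    m = PATTERN.search(line)
    return (m.group(1) + ' ' + m.group(2)).rstrip()

# Both variants must agree before their speed is worth comparing.
assert with_split(SAMPLE) == with_regex(SAMPLE)

for func in (with_split, with_regex):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```

A single repeated line is only a rough proxy; timing a run over your real file gives a fairer picture.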
One straightforward way is to use a regex:
```python
import re

with open("textfile", 'r') as parts_info:
    for line in parts_info:
        # Uppercase letters/spaces, then " NUM.: ", then the model number
        m = re.search(r'[A-Z ]+ NUM\.: [A-Z\d]+', line)
        if m:
            print(m.group(0))
```
Result:

```
PACKARDMODEL NUM.: 34401A
 FLUKE MODEL NUM.: 724
 AGILENT MODEL NUM.: U1253A
 AGILENT MODEL NUM.: U1253B
```
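The character class `[A-Z ]` does not include `-`, which is why the first match loses the `HEWLETT-` prefix. A hedged variant, assuming every relevant line starts with `MANUFACTURER:` and has the `... NUM.: NUMBER` layout shown: anchor on that label and capture everything up to and including the model number. `MODEL_RE` and `extract_models` are hypothetical names.

```python
import re

# Compiled once, outside the loop. The lazy (.+?) stops at the first
# "NUM.:", and hyphens are allowed, so "HEWLETT-" is kept.
MODEL_RE = re.compile(r'MANUFACTURER:\s*(.+?NUM\.:\s*\S+)')

def extract_models(lines):
    """Return strings like 'HEWLETT-PACKARDMODEL NUM.: 34401A' from an iterable of lines."""
    results = []
    for line in lines:
        m = MODEL_RE.search(line)
        if m:
            results.append(m.group(1))
    return results
```

Because it accepts any iterable of lines, you can pass the open file object directly and keep the line-by-line streaming behavior.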
A few things come to mind:

- Call `split(':')` once and reuse the result.
- If the line layout is always the same, throw away the `if`s and check the length of the split result once.

I ended up with something like this:
```python
parts_data = {'Model_Number': []}
with open("textfile.txt", 'r') as parts_info:
    linearray = parts_info.readlines()
    for line in linearray:
        # Split once and reuse the pieces
        linesp = line.split(':')
        # In the sample data, only the MANUFACTURER line has more than two fields
        if len(linesp) > 2:
            model_name = linesp[1]
            model_number = linesp[2]
            model_number = model_number.split()[0]
            model_number = model_name + ' ' + model_number
            parts_data['Model_Number'].append(model_number.rstrip())
```