Any recommendation on how I can grab data from several text files and process them (e.g. compute totals)? I have been trying to do it in Python but keep hitting dead ends.
A machine generates a summary file in text format each time an operation is performed — in this example, screening good apples from a batch. First you load the apples, the good ones are separated from the bad, and then you can reload the bad apples to retest them; some are recovered. So at least 2 summary files are generated per batch, depending on how many times you reload the apples to recover good ones.
This is an example of the text file:
file1:
general Info:
Batch No. : A2J3
Operation : Test
Fruit : Apple
Operation Number : A5500
Quantity In : 10
yield info:
S1 S2 Total Bin Name
5 2 7 good
1 2 3 bad
file2:
general Info:
Batch No. : A2J3
Operation : Test
Fruit : Apple
Operation Number : A5500
Quantity In : 3
yield info:
S1 S2 Total Bin Name
1 1 2 good
0 0 1 bad
I want to take a folder full of these txt files and merge the testing results using the following criteria:
process the same batch together by identifying which txt files come from the same Batch No. and the same operation (based on each file's content, not its filename)
merge the data from the 2 (or more) summary files into the following CSV format:
Lot:
Operation:
Bin    First Pass    Second Pass    Final Yield    %Yield
Good   7             2             9              90%
Bad    3             1             1              10%
The number of site columns (S1, S2, ...) is variable: it can range from 1 to 14, but is never less than 1. The bin types can also vary across text files (not limited to good and bad, but there will always be exactly one good bin):
Bins:
Good
Semi-bad
Bad
Worst
...
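To make the grouping criterion concrete, here is a minimal sketch (using made-up in-memory file contents rather than real files, and assuming the header lines always appear in the same positions as in the samples above) of keying summaries by the batch/operation pair read from their contents:

```python
from collections import defaultdict

# Two hypothetical summary-file contents for the same batch and operation.
samples = [
    "general Info:\nBatch No. : A2J3\nOperation : Test\n",
    "general Info:\nBatch No. : A2J3\nOperation : Test\n",
]

groups = defaultdict(list)
for text in samples:
    lines = [line.strip() for line in text.splitlines()]
    batch = lines[1].split(':')[1].strip()      # "Batch No. : A2J3" -> "A2J3"
    operation = lines[2].split(':')[1].strip()  # "Operation : Test" -> "Test"
    groups[(batch, operation)].append(text)
```

With real files, the same loop would read each path found by glob instead of iterating over strings.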
I'm new to Python; I only used this scripting language at school and know just the basics, nothing more. The task I want to do is a bit overwhelming to me, so I started by processing a single text file and extracting the data I wanted, e.g. the batch number:
with open('R0.txt') as fh_d10SumFile:
    fh_d10SumFile_perline = fh_d10SumFile.read().splitlines()

#print(fh_d10SumFile_perline)
TestProgramName_str = fh_d10SumFile_perline[CONST.TestProgram_field].split(':')[1]
LotNumber_str = fh_d10SumFile_perline[CONST.LotNumber_field].split(':')[1]
QtyIn_int = int(fh_d10SumFile_perline[CONST.UnitsIn_field].split(':')[1])
TestIteration_str = fh_d10SumFile_perline[CONST.TestIteration_field].split(':')[1]
TestType_str = fh_d10SumFile_perline[CONST.TestType_field].split(':')[1]
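As an alternative sketch (not the asker's CONST-based code), each "name : value" line can be split once and collected into a dictionary keyed by field name, so the parse does not depend on fixed line indices; the sample lines below are copied from the example file:

```python
# Header lines as they appear in a summary file.
lines = [
    'general Info:',
    'Batch No. : A2J3',
    'Operation : Test',
    'Fruit : Apple',
    'Operation Number : A5500',
    'Quantity In : 10',
]

header = {}
for line in lines:
    name, sep, value = line.partition(':')
    if sep and value.strip():        # skip section titles like "general Info:"
        header[name.strip()] = value.strip()

qty_in = int(header['Quantity In'])
```

`header['Batch No.']` then gives the batch number regardless of which line it appeared on.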
Then grab all the bins in that summary file:

SoftBins_str = [line for line in fh_d10SumFile_perline if re.search(r'bin', line)]
for soft_bin in SoftBins_str:
    SoftBins_data_str = [l.strip() for l in soft_bin.split(' ') if l.strip()]
    SoftBins_data_str.reverse()
    bin2bin[SoftBins_data_str[0]] = SoftBins_data_str[2]

(This needs import re and bin2bin = {} beforehand; the list comprehension replaces filter() so it also works in Python 3.)
Then I got stuck, because I'm not sure how to do this reading and parsing across n text files containing a variable number of sites (S1, S2, ...). How do I grab this information from n text files, process it in memory (is that even possible with Python), and then write the computed output to the CSV file?
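For the variable number of site columns, Python's star unpacking can absorb however many fields come before the last two (total and bin name); a small sketch with a made-up three-site row:

```python
# A hypothetical yield row with three site columns instead of two.
row = '5 2 1 8 good'

*site_counts, total, bin_name = row.split()
site_counts = [int(x) for x in site_counts]
total = int(total)
```

The same unpacking works unchanged for any site count from 1 to 14.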
The following should help get you started. As your text files have a fixed format, it is relatively simple to read and parse them. This script searches for all text files in the current folder, reads each file in, and stores the batches in a dictionary keyed by batch name so that all summaries for the same batch are grouped together.
After all files are processed, it creates a summary for each batch and writes them all to a single CSV output file.
from collections import defaultdict
import glob
import csv

batches = defaultdict(list)

for text_file in glob.glob('*.txt'):
    with open(text_file) as f_input:
        rows = [row.strip() for row in f_input]

    # Lines 1-5 hold the "name : value" header fields, in order:
    # Batch No., Operation, Fruit, Operation Number, Quantity In.
    header = [rows[x].split(':')[1].strip() for x in range(1, 6)]

    # The yield rows start after the "S1 S2 Total Bin Name" column header.
    bins = {}
    for yield_info in rows[8:]:
        s1, s2, total, bin_name = yield_info.split()
        bins[bin_name] = [int(s1), int(s2), int(total)]

    batches[header[0]].append(header + [bins])

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter='\t')

    for batch, passes in batches.items():
        bins_output = defaultdict(lambda: [[], 0])
        total_yield = 0

        for lot, operation, fruit, op_num, quantity, bins in passes:
            for bin_name, (s1, s2, total) in bins.items():
                bins_output[bin_name][0].append(total)
                bins_output[bin_name][1] += total
                total_yield += total

        csv_output.writerows([['Lot:', lot], ['Operation:', operation]])
        csv_header = (['Bin']
                      + ['Pass {}'.format(x) for x in range(1, 1 + len(passes))]
                      + ['Final Yield', '%Yield'])
        csv_output.writerow(csv_header)

        for bin_name in sorted(bins_output.keys()):
            entries, total = bins_output[bin_name]
            percentage_yield = '{:.1f}%'.format((100.0 * total) / total_yield)
            csv_output.writerow([bin_name] + entries + [total, percentage_yield])

        csv_output.writerow([])    # empty row to separate batches
This gives you a tab-delimited CSV file as follows:
Lot: A2J3
Operation: Test
Bin Pass 1 Pass 2 Final Yield %Yield
Bad 3 1 4 30.8%
Good 7 2 9 69.2%
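To sanity-check those percentages: the final yields summed across both bins come to 13, and each %Yield is that bin's final yield over the combined total:

```python
# Final yields per bin, summed across the two passes shown above.
final_yield = {'Good': 7 + 2, 'Bad': 3 + 1}
combined = sum(final_yield.values())    # 13

percentages = {name: '{:.1f}%'.format(100.0 * total / combined)
               for name, total in final_yield.items()}
# Good: 9/13 -> 69.2%, Bad: 4/13 -> 30.8%
```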
Note: the script has been updated to deal with any number of bin types.