As the title says, I don't quite understand how to extract the data I need for my table. The columns I need are Date, Time, Courtroom, File Number, Defendant Name, Attorney, Bond, Charge, etc. I think regex is what I need, but my class did not cover it, so I am confused about how to parse the file in order to extract the correct data and output it as an organized table.
I am supposed to turn my text file from this
and export it into a more readable format like this (example output is below):
Here is what I have so far.
```python
def readFile(court):
    csv_rows = []
    # read the txt file and split it into chunks of data by paragraph (blank line)
    with open(court, 'r') as file:
        data_chunks = file.read().split("\n\n")

    entry = None  # no record started yet
    for chunk in data_chunks:
        chunk = chunk.strip()  # .strip() removes leading/trailing whitespace
        if not chunk:
            continue  # skip empty chunks
        if chunk[:4].isnumeric():  # if the first 4 characters are digits, a new record starts
            if entry is not None:
                csv_rows.append(entry)  # save the previous record
            entry = {}  # initialize an empty dictionary for the new record
        else:
            # parse here? for now just show the chunk that still needs parsing
            print(chunk)
    if entry is not None:
        csv_rows.append(entry)  # save the last record
    return csv_rows

rows = readFile(exactfilepath)
```
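Note that csv_rows is a plain list, so it has no DictWriter method; csv.DictWriter is a class in the standard csv module that wraps an open file. A minimal sketch of writing the collected dictionaries, assuming each dict uses the column names from the question as keys (the COLUMNS list and the write_rows helper are hypothetical names for illustration):

```python
import csv
import io

# Hypothetical column order, taken from the columns listed in the question
COLUMNS = ["Date", "Time", "Courtroom", "File Number",
           "Defendant Name", "Attorney", "Bond", "Charge"]

def write_rows(rows, out_file):
    """Write a list of dicts to an open text file as CSV.

    Keys missing from a row are written as blanks (restval=""),
    and keys not in COLUMNS are ignored (extrasaction="ignore").
    """
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore", dialect="excel")
    writer.writeheader()
    writer.writerows(rows)

# In-memory demo; for a real file use open("court.csv", "w", newline="")
buffer = io.StringIO()
write_rows([{"File Number": "123", "Defendant Name": "DOE, JOHN"}], buffer)
```

Values containing commas (like "DOE, JOHN") are quoted automatically, so the file still opens correctly in Excel.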
It is quite a lot of work, but it is possible if you split it into a couple of sub-tasks. First, your input looks like a text file, so you could parse it line by line, for example with readlines(): https://www.w3schools.com/python/ref_file_readlines.asp
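A minimal sketch of the line-by-line step, using readlines() and stripping the trailing newline from each line (read_lines is a hypothetical helper name; the temporary file only stands in for the real court report):

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, "r") as file:
        # readlines() returns a list with one string per line, each ending in "\n"
        return [line.rstrip("\n") for line in file.readlines()]

# Demo with a small throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\n")
lines = read_lines(tmp.name)
os.remove(tmp.name)
```

For very large files, iterating over the file object directly (for line in file:) does the same thing without loading every line into memory at once.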
Then, I noticed that your data can be split into pages. You will need to prepare quite a few regular expressions, but you can start with one that identifies where each page starts. Your expressions may get complicated, so you may want to read this first: https://www.w3schools.com/python/python_regex.asp. The goal of this step is to collect all lines of a page in some container (a list, a dict, whatever you find suitable).
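A sketch of the page-splitting step, assuming (hypothetically, since the real file layout isn't shown) that every page begins with a header line such as "Page 3". Each page is collected as a list of lines:

```python
import re

# Hypothetical page marker: a line starting with "Page <number>".
# Replace this pattern with whatever actually marks a new page in your file
# (printed reports sometimes use a form-feed character "\f" instead).
PAGE_START = re.compile(r"^\s*Page\s+\d+", re.IGNORECASE)

def split_pages(lines):
    """Group lines into pages; a new page starts at each matching header line.

    Any lines before the first header end up in their own leading page.
    """
    pages = []
    for line in lines:
        if PAGE_START.match(line) or not pages:
            pages.append([])  # start a new page container
        pages[-1].append(line)
    return pages

pages = split_pages(["Page 1", "a", "b", "Page 2", "c"])
```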
Afterwards, write some code that parses the information page by page. For simplicity, I suggest starting with something easy, such as the columns "no", "file number" and "defendant".
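A sketch for those three columns using named groups. The line layout "number, file number, defendant name" (e.g. "  1   23CR012345   DOE, JOHN A") is an assumption for illustration; the real report may differ, and this loose pattern could also match other lines that happen to start with a number:

```python
import re

# Hypothetical layout: "<no> <file number> <defendant name>" on one line.
ENTRY_LINE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<file_number>\S+)\s+(?P<defendant>.+?)\s*$"
)

def parse_entries(page_lines):
    """Return one dict per line that matches ENTRY_LINE; other lines are skipped."""
    rows = []
    for line in page_lines:
        match = ENTRY_LINE.match(line)
        if match:
            rows.append({
                "No": int(match.group("no")),
                "File Number": match.group("file_number"),
                "Defendant Name": match.group("defendant"),
            })
    return rows

rows = parse_entries(["Page 1", "  1   23CR012345   DOE, JOHN A", "Bond: 500"])
```

Once this works, the other columns (Date, Bond, Charge, ...) can be added one pattern at a time.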
Once you can extract some data reliably, you can address the export part using pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
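A minimal sketch of the export step, assuming the parsed records are a list of dicts. Passing columns to the DataFrame constructor fixes the column order and fills missing values with NaN (build_table and export_table are hypothetical helper names):

```python
import pandas as pd

def build_table(rows, columns):
    """Turn a list of dicts into a DataFrame with a fixed column order.

    Columns missing from a row become NaN; keys not listed in `columns` are dropped.
    """
    return pd.DataFrame(rows, columns=columns)

def export_table(df, path):
    """Write the table to an .xlsx file (needs an Excel engine such as openpyxl)."""
    df.to_excel(path, index=False)

table = build_table(
    [{"No": 1, "File Number": "23CR012345", "Defendant Name": "DOE, JOHN A"}],
    columns=["No", "File Number", "Defendant Name", "Bond"],
)
# export_table(table, "court_calendar.xlsx")
```

index=False keeps pandas' row index out of the spreadsheet, so only your own columns appear.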