
How would I extract and organize data from a txt file using Python?

1st Time Post here! Love this site.

Situation: I have a flat file of data with various elements in it, and I need to extract specific portions. I am a beginner in Python and wrote my attempt using regular expressions and other functions. Here is a sample of the data from the txt file I receive:


ACCESSORID = FS01234 TYPE = USER SIZE = 1024 BYTES NAME = JOHN SMITH FACILITY = TSO
DEPT ACID = D12RGRD DEPARTMENT = TRAINING
DIV ACID = NR DIVISION = NRE
CREATED = 01/17/05 00:00 LAST MOD = 11/16/21 10:42
PROFILES = VPSNRE P11NR00A
LAST USED = 12/02/21 09:03 CPU(SYSB) FAC(SUPRSESS) COUNT(06051)
XA SSN = 123456789 OWNER(JB112)
XA TSOACCT = 123456789 OWNER(JB112 )
XA TSOAUTH = JCL OWNER(JB112 )
XA TSOAUTH = RECOVER OWNER(JB112 )
XA TSOPROC = NR005PROC OWNER(JB112 )
----------- SEGMENT TSO
TRBA = NON-DISPLAY FIELD
TSOCOMMAND =
TSODEFPRFG =
TSOLACCT = 111111111
TSOLPROC = NR9923PROC
TSOLSIZE = 0004096
TSOOPT = MAIL,NONOTICES,NOOIDCARD
TSOUDATA = 0000
TSOUNIT = SYSDD
TUPT = NON-DISPLAY FIELD
----------- SEGMENT USER EMAIL ADDR = john.smith@nre.ago.com

The portions I need to extract are bolded. I know I need to show what I have done so far, so without posting my entire script, here is how I am extracting the ACCESSORID = FS01234 and NAME = JOHN SMITH portions.

def RemoveSpace():
    # Collapse every run of whitespace (including newlines) into a single space.
    with open("PROJECTFILE.txt", "r") as f:
        data1 = f.read()
    word = data1.split()
    s = ' '.join(word)
    with open("RemoveSpace.txt", "w") as f1:
        f1.write(s)
    print("Data Written Successfully")


RemoveSpace()


import re

f = open(r"C:\Users\user\Desktop\HR\PROJECTFILE\RemoveSpace.txt", "r").read()

TSS = []

# Split the text into one chunk per record and drop whatever precedes the first ACCESSORID.
contents = re.split(r"ACCESSORID =", f)
contents.pop(0)

for item in contents:
    TSS_DICT = {}

    emplid = re.search(r"FS.*", item)

    if emplid is not None:
        s_emplid = re.search(r"FS\w*", emplid.group())
    else:
        s_emplid = None

    if s_emplid is not None:
        s_emplid = s_emplid.group()
    else:
        s_emplid = None

    TSS_DICT["EMPLOYEE ID"] = s_emplid

    name = re.search(r"NAME =.*", item)

    if name is not None:
        emp_name = re.search(r"[^NAME = ][^,]*", name.group())
    else:
        emp_name = None

    if emp_name is not None:
        emp_name = emp_name.group()
    else:
        emp_name = None

    TSS_DICT["EMPLOYEE NAME"] = emp_name

Question: OK, sorry for the lengthy post. I am having some difficulty extracting JOHN SMITH. My code keeps bringing in everything after JOHN SMITH, down to the very last line with the email address. My end goal is a CSV file with each bolded item as its own column. More directly, how would experts approach this data cleanup to simplify the process? If needed I can post the full code, but I didn't want to muddle this up any more than necessary.

I really appreciate any time and consideration that you could afford.

JB

For practising your regex, I recommend using a website like RegExr. There you can paste the text that you want to match and experiment with different expressions until you get the result you intend.

Assuming that you want to use this code for multiple files of the same organisation and that the data is formatted the same way in each, you can simplify your code a lot.

Let's say we wanted to extract NAME = JOHN SMITH from the text file. We could write the following Python code to do this:

import re
pattern = "NAME = \\w+ \\w+"
name = re.findall(pattern, text_to_search)[0][7:]
print(name)

pattern is our regex search expression. text_to_search is the contents of your text file, read into your Python script as a string. re.findall() returns a list of the matched strings, and [0] takes the first match. The slice [7:] then removes the seven-character prefix NAME = (including its trailing space). If nothing matches, the list is empty and [0] raises an IndexError.

The above code would output the following:

JOHN SMITH
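As for why the code in the question pulls in everything after JOHN SMITH: RemoveSpace() joins the whole file into a single line, so the .* in r"NAME =.*" has no newline to stop at and runs to the end of the text. One possible way around both that and the two-word limit of the pattern above is a capture group that stops at the next keyword. This is only a sketch: extract_name is a name I made up, and it assumes FACILITY = always follows the name, as it does in your sample.

```python
import re


def extract_name(text):
    """Return the value between 'NAME = ' and the following 'FACILITY =', or None."""
    # (.+?) is non-greedy, so it stops at the first 'FACILITY =' after the name.
    # Assumption: FACILITY = always comes right after the name, as in the sample.
    match = re.search(r"NAME = (.+?)\s+FACILITY =", text)
    return match.group(1) if match else None


line = "ACCESSORID = FS01234 TYPE = USER SIZE = 1024 BYTES NAME = JOHN SMITH FACILITY = TSO"
print(extract_name(line))  # JOHN SMITH
```

Because the group captures only the name, no slicing is needed, and a missing name gives None instead of an IndexError.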

You should be able to apply the same principles to the other bold sections of your text file.
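To make that concrete, here is a rough sketch of pulling several fields from every record in one pass. I can't tell from the post exactly which fields you need, so the four in FIELD_PATTERNS are placeholders I picked from the sample, and parse_records is a hypothetical helper name. It splits on ACCESSORID = the same way your code does.

```python
import re

# Hypothetical selection of fields; swap in the ones you actually need.
# Each pattern has one capture group holding the value to keep.
FIELD_PATTERNS = {
    "EMPLOYEE ID": re.compile(r"^\s*(\S+)"),  # first token after 'ACCESSORID ='
    "EMPLOYEE NAME": re.compile(r"NAME = (.+?)\s+FACILITY ="),
    "DEPARTMENT": re.compile(r"DEPARTMENT = (\S+)"),
    "EMAIL": re.compile(r"EMAIL ADDR = (\S+)"),
}


def parse_records(text):
    """Return one dict per ACCESSORID record; fields that are absent map to None."""
    records = []
    # Everything before the first 'ACCESSORID =' is not a record, so skip it.
    for chunk in text.split("ACCESSORID =")[1:]:
        record = {}
        for field, pattern in FIELD_PATTERNS.items():
            match = pattern.search(chunk)
            record[field] = match.group(1) if match else None
        records.append(record)
    return records


sample = (
    "ACCESSORID = FS01234 TYPE = USER SIZE = 1024 BYTES NAME = JOHN SMITH FACILITY = TSO\n"
    "DEPT ACID = D12RGRD DEPARTMENT = TRAINING\n"
    "----------- SEGMENT USER EMAIL ADDR = john.smith@nre.ago.com\n"
)
print(parse_records(sample))
```

Keeping the patterns in one dictionary means adding a column is a one-line change, and these patterns should work whether or not the whitespace has been collapsed first.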

For writing your extracted data out to a CSV file, it is probably worth reading a good tutorial, for example Reading and Writing CSV Files in Python. There are a few ways to store the information before writing it, such as lists or dictionaries, and you can write the CSV file either with Python's built-in csv module or manually.
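If you go with a list of dictionaries (one per employee), the built-in csv.DictWriter is probably the least effort. A minimal sketch, where write_records is a hypothetical helper and the column names are just the ones used in the question:

```python
import csv
import io


def write_records(records, handle):
    """Write a list of dicts to an open file-like object, one CSV row per dict."""
    # Assumption: every record has the same keys, so the first one defines the columns.
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


records = [{"EMPLOYEE ID": "FS01234", "EMPLOYEE NAME": "JOHN SMITH"}]

# An in-memory buffer keeps the example self-contained; a real file works the same way.
buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```

For a real file, open it with newline="" as the csv documentation recommends, e.g. with open("output.csv", "w", newline="") as f: write_records(records, f). The file name output.csv is only an example.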


 