Extracting positions from a list based on values in its items

I am relatively new to Python. I am trying to take a file in a standard format and eventually split it into smaller files based on an identifier that appears at the start of certain lines.

So far I have been able to open the file for reading and writing and break each line out into a list item. Now I am trying to find the position of each list item that starts with '03'. Everything from one '03' position up to the next will eventually become a separate file. I am stuck on extracting the list positions where the value starts with '03'. I have tried using:

for value in acct_locate:
    if value == '03':
        locations.append(acct_locate.index(value))

This seems to return nothing, and I have also tried some other variants using enumerate() and index().

Currently here is my code that I am working with:

import re
#need to look for file name
filename = 'examplebai2.txt'

#this list will store all locations where three record shows up
acct_locate = []
locations = []
acct_listing = []

with open(filename, 'r+') as file:
    line = [line.rstrip('\n') for line in file]
    for x in line:
        #locate all instances of locations starting with '03'
        look = re.findall('^03', x)
        acct_locate.append(look)
        #add those instances to a new list
    a = [i for i,x in enumerate(acct_locate) if x == '03']
    for value in a:
        print(value)
        locations.append(acct_locate.index(value))
    for y in line:
        namelist = re.findall('^03, (.*),', y)
        if len(namelist) > 0:
            acct_listing.append(namelist)

Running the above code adds nothing to the locations list that I am using to gather all of the positions.

Here is a skeleton of the file I am trying to manipulate.

01, Testfile
02, Grouptest
03, 11111111
16
88
49
03, 22222222,
16
88
49
03, 33333333,
16
88
49
03, 44444444,
16
88
49
98, Grouptestclose
99, Testfileclose

From this file I would want to end with four separate files that contain from one 03 record up to the next 03 record.

If you do not need to know the positions of the '03' markers, you could do:

with open('examplebai2.txt', 'r') as file:
    data = file.read().replace('\n', ' ')

data = data.split('03')

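Explanation: the first two statements read the file, replace all newline characters with spaces and put the result into a single string, data. The last statement splits the string on occurrences of your "special character" '03', returning a list of strings where each element is the text between two '03' markers. Note that split('03') splits on every occurrence of '03', not only those at the start of a line, so an account number or amount that contains '03' would also be split.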
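To illustrate the difference between splitting on the substring '03' and splitting on lines that start with '03', here is a minimal sketch. The helper name group_records and the sample text (including the made-up account number 11030311) are assumptions for illustration only and do not come from the question.

```python
def group_records(lines, marker="03"):
    """Group lines so that each line starting with `marker` opens a new group."""
    groups = []
    for line in lines:
        # Open a new group at every marker line, and also for the very first
        # line so that header lines before the first marker stay together.
        if line.startswith(marker) or not groups:
            groups.append([])
        groups[-1].append(line)
    return groups


# Hypothetical sample: the account number 11030311 is made up to show the pitfall.
text = "01, Testfile\n02, Grouptest\n03, 11030311,\n16\n03, 22222222,\n16\n98, Grouptestclose\n"

naive = text.replace("\n", " ").split("03")  # also splits inside 11030311
groups = group_records(text.splitlines())

print(len(naive))   # 5 pieces, although there are only two '03' records
print(len(groups))  # 3 groups: the header lines plus the two '03' records
```

In this sketch the trailer line ('98') ends up in the last group; whether it belongs there depends on how the output files should look.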

EDIT:

Given the example data above, you could try to loop over the file and put the read data into a buffer. Every time you find a '03', empty the buffer into a new file. Example:

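buffer = ""
new_file_counter = 0
with open(filename, 'r') as file:
    # loop over lines
    for x in file:
        # a new '03' record starts: write out what has been collected so far
        # (the first file written holds the header lines before the first '03')
        if x.split(',')[0] == '03' and buffer:
            # 'w' is required; the default mode is read-only
            with open('out_file_{}'.format(new_file_counter), 'w') as out:
                out.write(buffer)
            buffer = ""
            new_file_counter += 1
        buffer += x

# write whatever is left after the last '03' (includes the trailer lines)
if buffer:
    with open('out_file_{}'.format(new_file_counter), 'w') as out:
        out.write(buffer)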


If you want to "locate all instances of locations starting with '03'", you should check x.startswith("03") instead of x == "03".
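As a rough sketch of that suggestion, the positions can be collected with enumerate() and startswith(), and the list can then be sliced between consecutive positions. The lines list below reproduces the sample file from the question; the names ends and chunks are introduced here for illustration. It also shows why the original comparison never matched: re.findall() returns a list, so its result is never equal to the string '03'.

```python
import re

lines = [
    "01, Testfile", "02, Grouptest",
    "03, 11111111", "16", "88", "49",
    "03, 22222222,", "16", "88", "49",
    "03, 33333333,", "16", "88", "49",
    "03, 44444444,", "16", "88", "49",
    "98, Grouptestclose", "99, Testfileclose",
]

# re.findall returns a list such as ['03'] or [], never the string '03'.
print(re.findall("^03", lines[2]) == "03")  # False

# Positions of every line that starts with '03'.
locations = [i for i, line in enumerate(lines) if line.startswith("03")]
print(locations)  # [2, 6, 10, 14]

# Each chunk runs from one '03' position up to (not including) the next one;
# the last chunk runs to the end of the list.
ends = locations[1:] + [len(lines)]
chunks = [lines[start:end] for start, end in zip(locations, ends)]
print(len(chunks))  # 4
print(chunks[0])    # ['03, 11111111', '16', '88', '49']
```

Note that the last chunk here also contains the '98' and '99' trailer lines; they would need to be trimmed separately if they should not appear in the final file.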
