
Run a python function for each line in a text file

I am trying to bulk-reformat a large text document, converting

#{'000','001','002','003','004','005','006','007','008','009'}

into

#{'000':'001','002':'003','004':'005','006':'007','008':'009'}

using Python. My function works, but only if I run it on one line at a time.

How can I get it to run for each line of the input, so that it works on a multi-line document?

with open("input") as core:
    a = core.read()

K = '!'
N = 12

res = ''
for idx, ele in enumerate(a):

    if idx % N == 0 and idx != 0:
        res = res + K
    else:
        res = res + ele

b = (str(res).replace(",",":").replace("!",","))

l = len(b) 
c = b[:l-1]
d = c + "}"

print(d)

Here is the current result for a multi-line text file:

{'000':'001','002':'003','004':'005','006':'007','008':'009',
{'001':'00,':'003':'00,':'005':'00,':'007':'00,':'009':'00,'}
{'002':',03':'004':',05':'006':',07':'008':',09':'000':',01'}
{'003','004':'005','006':'007','008':'009','000':'001','002'}

So far I have tried:

with open('input', "r") as a:
    for line in a:

        K = '!'
        N = 12

        res = ''
        for idx, ele in enumerate(a):

            if idx % N == 0 and idx != 0:
                res = res + K
            else:
                res = res + ele

        b = (str(res))

        l = len(b) 
        c = b[:l-1]
        d = c + "}"

print(d)

but with no luck.

Update: I found a solution:

import re

with open("input") as core:
    coords = core.read()

sword = coords.replace("\n",",\n")

dung = re.sub('(,[^,]*),', r'\1 ', sword).replace(",",":").replace(" ",",").replace(",\n","\n")

print(dung)

I know my solution works, but I can't easily apply it to other situations where I need a different format. It is easy enough to work out how to format a single line of text, as there is plenty of documentation on that.

Does anybody know of a plugin or a Python feature that lets you write a formatting function and then apply it to every line, like a kind of applylines() counterpart to readlines()?

You can read the whole file, replace the target string, and write the result back out:

# Read in the file
with open('input.txt', 'r') as file:
    filedata = file.read()

# Replace the target string
filedata = filedata.replace(',', ':')

# Write the file out again
with open('output.txt', 'w') as file:
    file.write(filedata)

You can split the file contents into lines and then apply the text-processing steps to each line, appending each result to the output. The code would be:

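with open("input") as core:
    a = core.read()

K = '!'   # temporary marker for the commas that separate pairs
N = 12    # every 12th character is such a comma (assumes fixed-width items)
res = ''

# splitlines() avoids an empty trailing entry when the file ends with a newline
for line in a.splitlines():
    temp = ''
    for idx, ele in enumerate(line):
        if idx % N == 0 and idx != 0:
            temp = temp + K
        else:
            temp = temp + ele
    temp = temp.replace(",", ":").replace("!", ",")
    # for the example input the closing "}" also sits on a multiple of N
    # and was turned into ","; drop it and put the "}" back
    res = res + temp[:-1] + "}\n"
res = res[:-1]
print(res)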

For the following input

{'000','001','002','003','004','005','006','007','008','009'}
{'000','001','002','003','004','005','006','007','008','009'}

the output would be:

{'000':'001','002':'003','004':'005','006':'007','008':'009'}
{'000':'001','002':'003','004':'005','006':'007','008':'009'}

I would base an answer on turning the items of each line into an iterator, so that next() can be applied to it to take two items at a time.

def clean_line(line):
    items = iter(line.split(","))
    return ','.join(f'{item}:{next(items)}' for item in items)

With a function like clean_line() you can now use:

data = [
    "{'000','001','002','003','004','005','006','007','008','009'}",
    "{'000','001','002','003','004','005','006','007','008','009'}"
]
results = "\n".join(clean_line(line) for line in data)
print(results)

Or reading from a file as:

def clean_line(line):
    items = iter(line.strip("\n").split(","))
    return ','.join(f'{item}:{next(items)}' for item in items)

with open("data.txt", "r") as file_in:
    results = "\n".join(clean_line(line) for line in file_in.readlines())
print(results)
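
As a rough sketch of the applylines() idea from the question, a small generator can take any iterable of lines and any per-line function. The names apply_lines and pair_items are made up for this illustration and are not part of the standard library or of the answer above. pair_items uses zip() over a single iterator instead of next(), on the assumption that a trailing unpaired item may be dropped quietly.

```python
import io
from typing import Callable, Iterable, Iterator


def apply_lines(lines: Iterable[str], func: Callable[[str], str]) -> Iterator[str]:
    """Yield func(line) for every line, with the trailing newline removed first."""
    for line in lines:
        yield func(line.rstrip("\n"))


def pair_items(line: str) -> str:
    """Join comma-separated items two at a time as 'key:value' pairs."""
    items = iter(line.split(","))
    # zip(items, items) pulls two consecutive items per step from the same
    # iterator; a trailing unpaired item is silently dropped.
    return ",".join(f"{key}:{value}" for key, value in zip(items, items))


# In-memory stand-in for an open text file (hypothetical sample data);
# a file object returned by open() can be passed in the same way.
sample = io.StringIO(
    "{'000','001','002','003'}\n"
    "{'004','005','006','007'}\n"
)
result = "\n".join(apply_lines(sample, pair_items))
print(result)
```

Any other one-line formatting function can be passed in place of pair_items, which keeps the file handling separate from the formatting logic.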

For your given example input, you can read the whole file at once using .read() and use a pattern that matches the first comma of each pair and optionally captures, in group 1, everything up to and including the second comma.

In the replacement, use : followed by a backreference to what was captured in group 1, written \1

,([^,\n]*,)?

The pattern in parts matches:

  • , Match a comma
  • ( Capture group 1
    • [^,\n]*, Match zero or more characters other than a comma or a newline, then match a comma
  • )? Close the capture group and make it optional


For example:

import re

with open("input") as core:
    dung = re.sub(r",([^,\n]*,)?", r":\1", core.read())
    print(dung)

Output

#{'000':'001','002':'003','004':'005','006':'007','008':'009'}
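
To see how the pattern behaves on more than one line, here is a minimal sketch that runs the same substitution on an in-memory string rather than a file. The name PAIR_PATTERN and the sample text are assumptions made for this illustration.

```python
import re

# Same pattern as above: a comma, optionally followed by one more item and its comma.
PAIR_PATTERN = re.compile(r",([^,\n]*,)?")

# Hypothetical two-line sample standing in for the file contents.
text = "{'000','001','002','003'}\n{'004','005','006','007'}"

# The first comma of each pair becomes ":"; the captured item and its comma are kept.
# When group 1 does not take part in a match, \1 is replaced by an empty string (Python 3.5+).
converted = PAIR_PATTERN.sub(r":\1", text)
print(converted)
```

Because the character class excludes \n, the optional second item can never be taken from the following line, so each line is paired independently.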
