
How do you cut text from a line and append it to another line in a text file using Python?

Imagine you have some text that you want to split into chunks and write to separate files, using Son Huang's solution (which is based on l'mahdi's solution).

Suppose the text is modified so that the lines starting with note:: have some additional text before a comma, and each chunk gets an extra line starting with highlight::. For example:

INPUT

company:: acme products
department:: sales
floor:: 1

name:: Joe Blogs 
phone:: 123456789
email:: joeblogs@email.com
address:: 123 Main Street
note:: highlight text, blah blah blah
timestamp::
highlight::

name:: Josephine Blogs 
phone:: 43217890
email:: josephineblogs@email.com
address:: 123 Main Street
note:: Another highlight here, More blah blah
timestamp::
highlight::

name:: John Smith 
phone:: 23498689
email:: johnsmith@email.com
address:: 1 North Street
note:: Amazing text, Some more blah
timestamp::
highlight::

What needs to be added to Son Huang's solution to get the following result? The text before the comma on the line starting with note:: now appears on the line starting with highlight:: (and the comma is gone).

DESIRED OUTPUT

# chunk_1.txt

name:: Joe Blogs
phone:: 123456789
email:: joeblogs@email.com
address:: 123 Main Street
note:: blah blah blah
timestamp:: 2022-08-07 (13h 10m 08s)
highlight:: highlight text
company:: acme products
department:: sales
floor:: 1

# chunk_2.txt

name:: Josephine Blogs
phone:: 43217890
email:: josephineblogs@email.com
address:: 123 Main Street
note:: More blah blah
timestamp:: 2022-08-07 (13h 10m 09s)
highlight:: Another highlight here
company:: acme products
department:: sales
floor:: 1

# chunk_3.txt

name:: John Smith
phone:: 23498689
email:: johnsmith@email.com
address:: 1 North Street
note:: Some more blah
timestamp:: 2022-08-07 (13h 10m 10s)
highlight:: Amazing text
company:: acme products
department:: sales
floor:: 1

Ideally, you would parse the data properly and edit it that way. If you just want a quick-and-dirty solution, though, this will work.

You could loop over each line, check whether it starts with 'note:: ', and if so split it on the first comma.
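For illustration, here is how splitting on the first comma behaves on one of the sample note:: lines. The sketch shows the split(', ', 1) call used in the code below next to str.partition, which also splits on the first occurrence but always returns three parts, so it does not raise when the separator is missing. The sample strings are taken from the question's input or made up for the demo.

```python
line = 'note:: highlight text, blah blah blah'

# str.split with maxsplit=1 splits on the first ", " only
highlight, remainder = line.split(', ', 1)
print(highlight)   # note:: highlight text
print(remainder)   # blah blah blah

# str.partition returns (before, separator, after) for the first occurrence
before, sep, after = line.partition(', ')
print(repr(before), repr(sep), repr(after))
# 'note:: highlight text' ', ' 'blah blah blah'

# With no separator, partition returns the whole string plus two empty strings
print('note:: no comma here'.partition(', '))
# ('note:: no comma here', '', '')
```

By contrast, unpacking 'note:: no comma here'.split(', ', 1) into two names raises ValueError, because the split returns a single element.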

I'm assuming the order of the properties doesn't matter, so it's OK if I output them in a slightly different order.

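for line in file:
    line = line.rstrip('\n')  # drop the newline so every output() call gets a bare line
    if line.startswith('note:: ') and ', ' in line:
        # Split on the first ", " only; the text before it becomes the highlight
        highlight, remainder = line.split(', ', 1)
        highlight = highlight.removeprefix('note:: ')  # str.removeprefix needs Python 3.9+
        # Write note and highlight as separate lines
        output(f'note:: {remainder}')
        output(f'highlight:: {highlight}')
    elif line.startswith('highlight::'):
        # Skip the original (empty) highlight lines
        continue
    else:
        output(line)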

Here, output is a placeholder: replace it with the function you're using to write each line to your output file.
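As a minimal, runnable sketch of that loop, the version below wraps it in a function and binds output to a writer that adds a newline to each line. The function name rewrite_notes and the use of io.StringIO in place of real files are illustrative assumptions; in practice you would pass open file objects. It needs Python 3.9+ for str.removeprefix.

```python
import io

def rewrite_notes(src, dst):
    """Hypothetical helper: move the text before the first ', ' on each
    note:: line to its own highlight:: line.

    src is any iterable of lines, dst any writable text stream.
    """
    def output(text):
        # One call writes one line
        dst.write(text + '\n')

    for line in src:
        line = line.rstrip('\n')
        if line.startswith('note:: ') and ', ' in line:
            highlight, remainder = line.split(', ', 1)
            highlight = highlight.removeprefix('note:: ')
            output(f'note:: {remainder}')
            output(f'highlight:: {highlight}')
        elif line.startswith('highlight::'):
            # Drop the original (empty) highlight:: line
            continue
        else:
            output(line)

chunk = io.StringIO(
    'name:: Joe Blogs\n'
    'note:: highlight text, blah blah blah\n'
    'timestamp::\n'
    'highlight::\n'
)
out = io.StringIO()
rewrite_notes(chunk, out)
print(out.getvalue(), end='')
```

The lines come out in the order note::, highlight::, timestamp::, which matches the answer's assumption that property order doesn't matter. A note:: line without ', ' is passed through unchanged.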

Keep in mind that this code isn't very robust; if you need it to be reliable, you should write a proper parser for this data.
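One way to parse the data properly is to turn each blank-line-separated block into a dict of key → value pairs, transform the dict, and serialize it back. The sketch below assumes every line has the form key:: value (the value may be empty) and that the first block is the shared header appended to every chunk, as in the question. The function names are hypothetical, and timestamps are left untouched here.

```python
def parse_block(block):
    """Parse 'key:: value' lines into a dict (dicts keep insertion order in Python 3.7+)."""
    record = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition('::')
        record[key.strip()] = value.strip()
    return record

def format_block(record):
    """Serialize a dict back to 'key:: value' lines (no trailing space on empty values)."""
    return '\n'.join(f'{key}:: {value}'.rstrip() for key, value in record.items())

def move_highlight(record):
    """Move the text before the first comma in 'note' into 'highlight'."""
    head, sep, tail = record.get('note', '').partition(',')
    if sep:  # only rewrite notes that actually contain a comma
        record['note'] = tail.strip()
        record['highlight'] = head.strip()
    return record

text = """company:: acme products
department:: sales
floor:: 1

name:: Joe Blogs
note:: highlight text, blah blah blah
timestamp::
highlight::"""

header, *chunks = text.split('\n\n')
for chunk in chunks:
    record = move_highlight(parse_block(chunk))
    print(format_block(record) + '\n' + header.strip())
```

Because highlight already exists as a key, assigning to it keeps its original position, so the field order of each chunk is preserved.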

This code snippet should work for you; feel free to optimize it as you see fit:

from datetime import datetime
import time
import re

with open('input.txt') as f:
    # The first block is the shared header; the remaining blocks are the chunks
    header, content = f.read().split('\n\n', maxsplit=1)
    for n, chunk in enumerate(content.split('\n\n'), start=1):
        timestamp = datetime.now().strftime('%Y-%m-%d (%Hh %Mm %Ss)')
        chunk = re.sub(r'(timestamp::)', fr'\1 {timestamp}', chunk)

        note_prefix, highlight_prefix = 'note:: ', 'highlight::'
        # Text of the note:: line, from after the prefix to the end of the line
        idx1 = chunk.find(note_prefix)
        idx2 = chunk.find('\n', idx1)
        text_chunk = chunk[idx1 + len(note_prefix):idx2]
        # Split on the first comma only: [text for highlight::, rest of the note]
        lst_chunk = text_chunk.split(',', 1)
        # Cut the old note text by slicing (avoids treating it as a regex pattern)
        chunk = chunk[:idx1 + len(note_prefix)] + chunk[idx2:]
        chunk = re.sub(r'(' + note_prefix + ')', fr'\1 {lst_chunk[1].strip()}', chunk)
        chunk = re.sub(r'(' + highlight_prefix + ')', fr'\1 {lst_chunk[0].strip()}', chunk)

        # Append the shared header to the end of each chunk
        chunk = chunk.strip() + '\n' + header
        with open(f'chunk_{n}.txt', 'w') as f_out:
            f_out.write(chunk)
        # Wait a second so each chunk gets a distinct timestamp
        time.sleep(1)

Output:

# chunk_1.txt
name:: Joe Blogs 
phone:: 123456789
email:: joeblogs@email.com
address:: 123 Main Street
note::  blah blah blah
timestamp:: 2022-08-07 (13h 56m 52s)
highlight:: highlight text
company:: acme products
department:: sales
floor:: 1

# chunk_2.txt
name:: Josephine Blogs 
phone:: 43217890
email:: josephineblogs@email.com
address:: 123 Main Street
note::  More blah blah
timestamp:: 2022-08-07 (13h 56m 53s)
highlight:: Another highlight here
company:: acme products
department:: sales
floor:: 1

# chunk_3.txt
name:: John Smith 
phone:: 23498689
email:: johnsmith@email.com
address:: 1 North Street
note::  Some more blah
timestamp:: 2022-08-07 (13h 56m 54s)
highlight:: Amazing text
company:: acme products
department:: sales
floor:: 1
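The timestamps in the output above come from the format string '%Y-%m-%d (%Hh %Mm %Ss)', where %H is the 24-hour clock hour, %M minutes and %S seconds. The sketch formats a fixed datetime to show the exact shape, and also shows one possible alternative to time.sleep(1): offsetting a single base time by n seconds per chunk with timedelta. That alternative is an assumption, not what the answers do, and only makes sense if the timestamps don't need to match the actual write time (in practice the base would be datetime.now()).

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d (%Hh %Mm %Ss)'  # same format string as the answers

sample = datetime(2022, 8, 7, 13, 56, 52)
print(sample.strftime(FMT))  # 2022-08-07 (13h 56m 52s)

# Alternative to sleeping: one base time plus n seconds for chunk n
base = sample
stamps = [(base + timedelta(seconds=n)).strftime(FMT) for n in range(3)]
print(stamps)
# ['2022-08-07 (13h 56m 52s)', '2022-08-07 (13h 56m 53s)', '2022-08-07 (13h 56m 54s)']
```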

Updating Son Hoang's solution:

from datetime import datetime
import time
import re

with open('test.txt') as f:
    header, content = f.read().split('\n\n', maxsplit=1)
    for n, chunk in enumerate(content.split('\n\n'), start=1):
        timestamp = datetime.now().strftime('%Y-%m-%d (%Hh %Mm %Ss)')
        chunk = re.sub(r'(timestamp::)', fr'\1 {timestamp}', chunk)

        # Regex with 3 groups for the note:: line: 1. (note::\s+), 2. ([^,]+) and 3. (.*)
        note = re.search(r'(note::\s+)([^,]+),(.*)', chunk)

        if note:  # skip chunks whose note:: line has no comma
            # Rewrite the note without the 2nd group or the comma (i.e. \1 and \3 only)
            chunk = re.sub(r'(note::\s+)([^,]+),(.*)', r'\1\3', chunk)

            # Append the 2nd group from note:: to highlight::
            chunk = re.sub(r'(highlight::)', fr'\1{note.group(2)}', chunk)

        chunk = chunk.strip() + '\n' + header
        print(chunk)
        print()
        with open(f'chunk_{n}.txt', 'w') as f_out:
            f_out.write(chunk)
        # Wait a second so each chunk gets a distinct timestamp
        time.sleep(1)

Output file: chunk_1.txt

name:: Joe Blogs 
phone:: 123456789
email:: joeblogs@email.com
address:: 123 Main Street
note::  blah blah blah
timestamp:: 2022-08-07 (04h 59m 56s)
highlight::highlight text
company:: acme products
department:: sales
floor:: 1

File: chunk_2.txt

name:: Josephine Blogs 
phone:: 43217890
email:: josephineblogs@email.com
address:: 123 Main Street
note::  More blah blah
timestamp:: 2022-08-07 (04h 59m 57s)
highlight::Another highlight here
company:: acme products
department:: sales
floor:: 1

File: chunk_3.txt

name:: John Smith 
phone:: 23498689
email:: johnsmith@email.com
address:: 1 North Street
note::  Some more blah
timestamp:: 2022-08-07 (04h 59m 58s)
highlight::Amazing text
company:: acme products
department:: sales
floor:: 1
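To see why the note:: lines above contain two spaces and the highlight:: lines none, it helps to inspect what the three capture groups match on one sample line: group 1 keeps the whitespace after note::, and group 3 starts with the space that followed the comma. The adjusted substitutions at the end are one possible tweak, a sketch rather than part of the original answer, that reproduces the question's desired spacing for this line.

```python
import re

chunk = 'note:: highlight text, blah blah blah\ntimestamp::\nhighlight::'
pattern = r'(note::\s+)([^,]+),(.*)'

note = re.search(pattern, chunk)
print(repr(note.group(1)))  # 'note:: '
print(repr(note.group(2)))  # 'highlight text'
print(repr(note.group(3)))  # ' blah blah blah'

# The answer's substitutions: \1 ends with a space and \3 starts with one,
# while nothing is inserted between highlight:: and group 2
original = re.sub(pattern, r'\1\3', chunk)
original = re.sub(r'(highlight::)', fr'\1{note.group(2)}', original)
print(original)

# Adjusted sketch: consume spaces after the comma outside group 3, and add a
# space after highlight:: (a function replacement avoids backslash/digit issues)
fixed_pattern = r'(note::\s+)([^,]+), *(.*)'
fixed = re.sub(fixed_pattern, r'\1\3', chunk)
fixed = re.sub(r'(highlight::)', lambda m: m.group(1) + ' ' + note.group(2), fixed)
print(fixed)
```

With a replacement template such as fr'\1{text}', a text starting with a digit would be read as part of a group reference (for example \12), which is why the adjusted version uses a function instead.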

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM