Remove Chapter number from list of strings in Python

The program I'm writing takes a list of strings as input and tries to remove the chapter numbers. I have written the function, but it currently does not work. I have provided my function and a sample of the output. Thank you!

def remove_chapter(chapter_header):
    for i in range(101):
        chapters="Chapter " + str(i)
        chapter_text=[my_string.replace(chapters,"") for my_string in chapter_header]
    return chapter_text

Here is the current output from the non-working function: Output

Since your strings share a similar pattern that you need to remove, with a few variations (the chapter number), it's better to use Python's re module. It gives you a lot of flexibility in pattern matching.

So, all you need to do is:

>>> import re
>>> [re.sub(r'Chapter \d+ ', '', string) for string in chapter_header]

# Driver values:

IN : chapter_header = ['Chapter 1 It is ...','However little ...','Chapter 12 Lorem Ipsum']

OUT : ['It is ...', 'However little ...', 'Lorem Ipsum']

Breaking it down, your pattern looks like:

'Chapter'<whitespace>[number/s]<whitespace>

So, whenever this pattern is found, the match is replaced with an empty string; strings without a match are left unchanged.
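
One likely reason the loop in the question seems to do nothing: every pass rebuilds chapter_text from the original chapter_header, so only the result of the last pass (for "Chapter 100") is returned. The regex approach avoids the loop entirely. Below is a minimal sketch that wraps it in the question's function name. Anchoring the pattern with ^ and allowing any amount of whitespace after the number are assumptions about the data, not something the question states.

```python
import re

# Assumption: the header sits at the very start of each string.
# \s* also covers headers followed by several spaces, or by nothing at all.
CHAPTER_PREFIX = re.compile(r"^Chapter \d+\s*")


def remove_chapter(chapter_header):
    """Return a new list with any leading 'Chapter <number>' removed."""
    return [CHAPTER_PREFIX.sub("", text) for text in chapter_header]


chapter_header = ["Chapter 1 It is ...", "However little ...", "Chapter 12 Lorem Ipsum"]
cleaned = remove_chapter(chapter_header)
print(cleaned)  # ['It is ...', 'However little ...', 'Lorem Ipsum']
```

Because of the ^ anchor, a mention such as "See Chapter 4 for more" in the middle of a string is left alone; drop the anchor if such mentions should be removed as well.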

Given a list of chapters, we can drop the word "Chapter" and the number that precede the first word of each chapter.

Given

import itertools as it


chapters = [
    "Chapter 1  It is a truth universally acknowledged ...",
    "Chapter 2  Mr. Bennet was among the earliest ...",
    "Chapter 3  Not all that Mrs. Bennet, however, with ...",
]

Code

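# True for the leading header words: the literal "Chapter" or a number
pred = lambda x: (x == "Chapter") or x.isdigit()
# Split each chapter into words, then drop the leading header words
results = [list(it.dropwhile(pred, chapter.split())) for chapter in chapters]
results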

Output

[['It', 'is', 'a', 'truth', 'universally', 'acknowledged', '...'],
 ['Mr.', 'Bennet', 'was', 'among', 'the', 'earliest', '...'],
 ['Not', 'all', 'that', 'Mrs.', 'Bennet,', 'however,', 'with', '...']]

Details

The list comprehension splits each chapter into a list of words. Equivalently:

for chapter in chapters:
    print([word for word in chapter.split()])

# ['Chapter', '1', 'It', 'is', 'a', 'truth', 'universally', 'acknowledged', '...']
# ['Chapter', '2', 'Mr.', 'Bennet', 'was', 'among', 'the', 'earliest', '...']
# ['Chapter', '3', 'Not', 'all', 'that', 'Mrs.', 'Bennet,', 'however,', 'with', '...']

Finally, itertools.dropwhile iterates over each list and drops items for as long as the predicate is true. In other words, it keeps dropping items until it reaches the first one that is neither "Chapter" nor a string of digits; that item and everything after it are kept.
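
To make the "stops at the first failure" behaviour concrete, here is a small sketch on a single, made-up word list (the words and the name is_header_word are hypothetical, chosen only for illustration). A number and the word "Chapter" that appear later in the text are kept, because dropwhile stops testing once the predicate has failed.

```python
import itertools as it

# Hypothetical word list: a number and the word "Chapter" also appear
# after the first ordinary word.
words = ["Chapter", "7", "In", "1813", "the", "Chapter", "ends"]


def is_header_word(word):
    """Same test as pred above: the literal 'Chapter' or an all-digit word."""
    return word == "Chapter" or word.isdigit()


kept = list(it.dropwhile(is_header_word, words))
print(kept)  # ['In', '1813', 'the', 'Chapter', 'ends']
```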

The resulting chapters can be rejoined as strings if desired.

[" ".join(chapter) for chapter in results]
# ['It is a truth universally acknowledged ...',
#  'Mr. Bennet was among the earliest ...',
#  'Not all that Mrs. Bennet, however, with ...']
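
The split, drop, and rejoin steps can be combined into one helper. This is only a sketch, and the name strip_chapter_prefix is hypothetical. Two caveats follow from the approach itself: splitting and rejoining collapses runs of whitespace into single spaces, and a chapter whose own text opens with a number loses that number too.

```python
import itertools as it


def strip_chapter_prefix(chapters):
    """Drop leading 'Chapter' and number words, then rejoin each chapter."""

    def is_header_word(word):
        return word == "Chapter" or word.isdigit()

    return [
        " ".join(it.dropwhile(is_header_word, chapter.split()))
        for chapter in chapters
    ]


sample = [
    "Chapter 1  It is a truth universally acknowledged ...",
    "No header on this line",
]
cleaned = strip_chapter_prefix(sample)
print(cleaned)
# ['It is a truth universally acknowledged ...', 'No header on this line']
```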
