The program I'm writing takes a list of strings as input and tries to remove the chapter numbers. I have written the function, but it currently does not work. I have provided my function and a sample of the output. Thank you!
def remove_chapter(chapter_header):
    for i in range(101):
        chapters = "Chapter " + str(i)
        chapter_text = [my_string.replace(chapters, "") for my_string in chapter_header]
    return chapter_text
Here is the current output of the non-working function: [Output]
Since your strings share a similar pattern that you need to remove, with only one variation (the chapter number), it's better to use Python's re module. It gives you a lot of flexibility in pattern matching.
So, all you need to do is:
>>> import re
>>> [re.sub(r'Chapter \d+ ', '', string) for string in chapter_header]
#driver values :
IN : chapter_header = ['Chapter 1 It is ...','However little ...','Chapter 12 Lorem Ipsum']
OUT : ['It is ...', 'However little ...', 'Lorem Ipsum']
Breaking it down, your pattern looks like:
'Chapter'<whitespace>[number/s]<whitespace>
So, wherever this pattern is found, the matched text is replaced with an empty string; strings that do not contain it are returned unchanged.
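As a minimal sketch, the same idea can be wrapped in the function from the question. The compiled pattern below is an assumption, not part of the original answer: it anchors the match at the start of the string with `^` and tolerates extra whitespace, on the assumption that headers always look like "Chapter <digits> <text>".

```python
import re

# Assumed pattern: "Chapter", whitespace, digits, optional trailing whitespace,
# matched only at the very start of the string.
CHAPTER_PREFIX = re.compile(r"^Chapter\s+\d+\s*")

def remove_chapter(chapter_header):
    """Return a new list with any leading 'Chapter N' prefix removed."""
    return [CHAPTER_PREFIX.sub("", line) for line in chapter_header]

headers = ["Chapter 1 It is ...", "However little ...", "Chapter 12 Lorem Ipsum"]
cleaned = remove_chapter(headers)
print(cleaned)  # ['It is ...', 'However little ...', 'Lorem Ipsum']
```

A single `\d+` also sidesteps a problem with replacing "Chapter 1", "Chapter 2", ... one at a time: "Chapter 1" is a prefix of "Chapter 12", so a plain `str.replace` would leave a stray "2" behind.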
Given a list of chapters, we can drop the word "Chapter" and the number that follows it, keeping everything from the first real word of each chapter onward.
Given
import itertools as it
chapters = [
    "Chapter 1 It is a truth universally acknowledged ...",
    "Chapter 2 Mr. Bennet was among the earliest ...",
    "Chapter 3 Not all that Mrs. Bennet, however, with ...",
]
Code
pred = lambda x: (x == "Chapter") or x.isdigit()
results = [list(it.dropwhile(pred, [word for word in chapter.split()])) for chapter in chapters]
results
Output
[['It', 'is', 'a', 'truth', 'universally', 'acknowledged', '...'],
['Mr.', 'Bennet', 'was', 'among', 'the', 'earliest', '...'],
['Not', 'all', 'that', 'Mrs.', 'Bennet,', 'however,', 'with', '...']]
Details
The list comprehension splits each chapter string into a list of words. Equivalently:
for chapter in chapters:
    print([word for word in chapter.split()])
# ['Chapter', '1', 'It', 'is', 'a', 'truth', 'universally', 'acknowledged', '...']
# ['Chapter', '2', 'Mr.', 'Bennet', 'was', 'among', 'the', 'earliest', '...']
# ['Chapter', '3', 'Not', 'all', 'that', 'Mrs.', 'Bennet,', 'however,', 'with', '...']
Finally, itertools.dropwhile iterates over each list and drops items until the predicate is no longer true. In other words, it keeps dropping items up to the first one that is neither "Chapter" nor a digit, and yields everything after that point unchanged.
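To illustrate that dropwhile only trims the leading run, here is a small sketch on a single made-up sentence (the sample text and the name `is_header_token` are illustrative, not from the original post). A number that appears later in the sentence is kept, because dropping stops at the first word that fails the test.

```python
from itertools import dropwhile

def is_header_token(word):
    # Same test as `pred` above: the literal word "Chapter" or an all-digit token.
    return word == "Chapter" or word.isdigit()

# Hypothetical sample line containing a number after the header.
words = "Chapter 1 In 1805 the story begins".split()
kept = list(dropwhile(is_header_token, words))
print(kept)  # ['In', '1805', 'the', 'story', 'begins']
```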
The resulting chapters can be rejoined as strings if desired.
[" ".join(chapter) for chapter in results]
# ['It is a truth universally acknowledged ...',
# 'Mr. Bennet was among the earliest ...',
# 'Not all that Mrs. Bennet, however, with ...']
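The split, drop, and rejoin steps can also be combined into one helper. This is only a sketch, and the function name `strip_chapter_prefix` is hypothetical. One side effect worth knowing: because the text is split on whitespace and rejoined with single spaces, any runs of spaces, tabs, or newlines in the original string are collapsed.

```python
from itertools import dropwhile

def strip_chapter_prefix(chapter):
    """Drop leading "Chapter" / digit tokens, then rejoin with single spaces."""
    words = dropwhile(lambda w: w == "Chapter" or w.isdigit(), chapter.split())
    return " ".join(words)

# Extra spaces in the input are normalized by the split/join round trip.
print(strip_chapter_prefix("Chapter 2  Mr. Bennet   was among ..."))
# Mr. Bennet was among ...
```

If the original spacing matters, the regular-expression approach leaves the rest of the string untouched.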