
Split text based on specific words using Python

I have a long text like the one below. I need to split it on certain words, say "In", "On" and "These".

Below is sample data:

On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains.

Can this be solved with code? I have 1000 rows in a CSV file.

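Following up on my comment, I think a good option is to use re.split() with a regular expression. The lookahead splits just before each keyword without removing it (splitting on such zero-width matches requires Python 3.7+):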

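import re
# Split just before each standalone "On", "In" or "These", but never at the very start of the string
parts = re.split(r'(?<!^)\b(?=(?:On|In|These)\b)', YourStringVariable)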
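To apply this to the 1000 rows in the CSV file, one option is to run the same pattern over every row with the standard csv module. This is a sketch under assumptions: the text is taken to be in the first column, the file is assumed to have no header row, and the function name split_rows is hypothetical, so adjust these to the real file.

```python
import csv
import io
import re

# Same pattern as above: a zero-width split point just before a standalone
# "On", "In" or "These", except at the very start of the string.
# Splitting on zero-width matches requires Python 3.7+.
SPLIT_PATTERN = re.compile(r'(?<!^)\b(?=(?:On|In|These)\b)')


def split_rows(csv_file, column=0):
    """Yield a list of text chunks for each row of csv_file (column index is an assumption)."""
    for row in csv.reader(csv_file):
        if len(row) <= column:
            continue  # skip blank or short rows
        # strip() removes the trailing space each chunk keeps before the next keyword
        yield [chunk.strip() for chunk in SPLIT_PATTERN.split(row[column])]


# Usage with an in-memory CSV; for a real file use
# open('data.csv', newline='', encoding='utf-8') instead (hypothetical path).
sample = io.StringIO('"On one side. These cases. In a free hour."\n')
for chunks in split_rows(sample):
    print(chunks)
# → ['On one side.', 'These cases.', 'In a free hour.']
```

Because the pattern is case-sensitive and requires a word boundary after the keyword, lowercase "in" inside a sentence and words such as "Only" are left alone.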

Yes, this can be done in Python. Load the text into a variable and use the built-in str.split() method. For example:

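filename = 'data.csv'  # hypothetical path to your file
with open(filename, 'r') as file:
    text = file.read()
# str.split() removes the separator and also matches it inside longer words
lines = text.split('These')
# lines is now a list of strings, split wherever the string 'These' was encountered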
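Since str.split() drops the separator, "These" disappears from every piece. If you want to keep the keyword at the start of each piece, a small helper can re-attach it. This is a sketch; split_keep is a hypothetical name, not a built-in.

```python
def split_keep(text, word):
    """Split text on every occurrence of word, keeping word at the start of each later piece."""
    pieces = text.split(word)
    # pieces[0] is the text before the first occurrence; drop it if empty
    # (that happens when text starts with word)
    head = [pieces[0]] if pieces[0] else []
    # every other piece originally followed an occurrence of word, so prepend it again
    return head + [word + piece for piece in pieces[1:]]


print(split_keep('Some pain. These cases are simple.', 'These'))
# → ['Some pain. ', 'These cases are simple.']
```

Like the plain str.split() call, it handles only one keyword and still matches it inside longer words; the regular-expression answer handles several words and word boundaries.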

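To find whole words that are not part of larger words, I like using the regular expression [^\w]word[^\w]. It requires a non-word character on each side, so it will not match a word at the very start or end of the text.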

Sample Python code, assuming the text is in a variable named text:

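import re
# Case-insensitive "in" with a non-word character on each side (the match includes those characters)
exp = re.compile(r'[^\w]in[^\w]', flags=re.IGNORECASE)
# Every match object; m.start() and m.end() give its position in text
all_occurrences = list(exp.finditer(text))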
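finditer() only locates the words; to actually split, one option is to cut the text at each match position. The sketch below adapts the same [^\w]word[^\w] idea to the three words from the question. It is case-sensitive, which is an assumption since lowercase "in" also appears mid-sentence in the sample, and it cuts one character after each match's start so the leading non-word character stays with the previous chunk. The name split_on_words is hypothetical.

```python
import re

# Same [^\w]word[^\w] idea, extended to the three keywords from the question.
# Case-sensitive on purpose (assumption): "in" also appears mid-sentence in the sample.
KEYWORDS = re.compile(r'[^\w](?:On|In|These)[^\w]')


def split_on_words(text):
    """Cut text just before each standalone keyword found by finditer()."""
    # Each match begins with the non-word character before the keyword,
    # so the keyword itself starts at m.start() + 1
    cuts = [m.start() + 1 for m in KEYWORDS.finditer(text)]
    bounds = [0] + cuts + [len(text)]
    # Slice between consecutive cut points, skipping empty slices
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if text[a:b]]


print(split_on_words('On the other hand. These cases. In a free hour.'))
# → ['On the other hand. ', 'These cases. ', 'In a free hour.']
```

A keyword at the very start of the text is never a cut point, because the pattern needs a character before it, which conveniently avoids an empty first chunk.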

