
Grab certain words and phrases from a text file in Python

I have this block of code that goes through a text file line by line and splits each line into separate words. That works, but my text file contains certain words and phrases that start and end with '-', for example '-foo-' or '-foo bar-'. Right now, a phrase such as '-foo bar-' is split into '-foo' and 'bar-'. I understand why this happens.

My plan is to grab the instances that start and end with '-', store them in a separate list, let the user change each of those phrases into something new, and then put them back into the list. How do I tell it to grab a phrase as a single item when it consists of two separate words?

def madLibIt(text_file):
    listOfWords = []  # creates a word list
    for eachLine in text_file:  # go through each line and split it into separate words
        listOfWords.extend(eachLine.split())
    print(listOfWords)

Calling str.split() without a separator splits the text on runs of whitespace, so '-' is never treated as a delimiter and '-foo bar-' is broken at the space.
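
To make the difference concrete, here is a small sketch comparing the two calls. The sample sentence is made up for illustration and is not from the asker's file.

```python
line = "The -adjective- dog chased a -plural noun- today"  # made-up sample line

# No argument: split on runs of whitespace, so '-plural noun-' breaks apart.
words = line.split()
print(words)
# ['The', '-adjective-', 'dog', 'chased', 'a', '-plural', 'noun-', 'today']

# Splitting on '-' keeps each phrase together but drops the dashes;
# with balanced dashes, the odd-indexed pieces are the dashed phrases.
parts = line.split("-")
print(parts)
# ['The ', 'adjective', ' dog chased a ', 'plural noun', ' today']
```

Splitting on '-' only works if every dash in the text belongs to a marker, which is one reason a regular expression may be the more robust choice.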

You can use re.findall() with the pattern (-.+?-), which lazily matches everything from one '-' to the next:

import re

matches = re.findall(r'(-.+?-)', 'This is a -string- with a -foo bar-')
print(matches)  # ['-string-', '-foo bar-']
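
For the plan described in the question (swap each dashed phrase for something the user supplies), one possible sketch uses re.sub() with a replacement function instead of collecting and re-inserting list items. The names fill_placeholders and answers are hypothetical, and the dictionary stands in for whatever the user would type.

```python
import re

# Same idea as (-.+?-), but the group captures only the text between the dashes.
PLACEHOLDER = re.compile(r"-(.+?)-")

def fill_placeholders(text, answers):
    """Replace each -phrase- with answers[phrase]; unknown phrases stay as-is."""
    def substitute(match):
        # group(1) is the inner phrase, group(0) is the whole '-phrase-' match
        return answers.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)

template = "This is a -string- with a -foo bar-"
answers = {"string": "sentence", "foo bar": "surprise"}  # hypothetical user input
result = fill_placeholders(template, answers)
print(result)  # This is a sentence with a surprise
```

Note that this simple pattern treats any two dashes on a line as a pair, so ordinary hyphenated words in the text would probably need a stricter pattern.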

This regular expression returns both the ordinary words and the dashed phrases, in order, with each dashed phrase kept as a single item:

import re

s = 'This is a string with -parts like this- and -normal- parts -as well-'

print(re.findall(r'((?:-\w[\w\s]*\w-)|(?:\b\w+\b))', s))

>>> 
['This', 'is', 'a', 'string', 'with', '-parts like this-', 'and', '-normal-', 'parts', '-as well-']
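
Building on that pattern, the original function could be reworked roughly as follows. This is a sketch, not the answerer's code: mad_lib_tokens and sample_lines are illustrative names, and the list of strings stands in for an open file object.

```python
import re

# Dashed phrases are tried first, then plain words (same pattern as above,
# written without the outer group so findall returns whole matches).
TOKEN = re.compile(r"-\w[\w\s]*\w-|\b\w+\b")

def mad_lib_tokens(lines):
    """Return all tokens from an iterable of lines, keeping -phrases- whole."""
    tokens = []
    for line in lines:
        tokens.extend(TOKEN.findall(line))
    return tokens

sample_lines = ["The -adjective- dog\n", "ate -plural noun- today\n"]  # stand-in for a file
tokens = mad_lib_tokens(sample_lines)
print(tokens)
# ['The', '-adjective-', 'dog', 'ate', '-plural noun-', 'today']

# The dashed entries can then be picked out for the user to replace.
placeholders = [t for t in tokens if t.startswith("-") and t.endswith("-")]
print(placeholders)
# ['-adjective-', '-plural noun-']
```

Two limitations of this pattern are worth knowing: punctuation is dropped from the token list, and a dashed phrase needs at least two word characters between the dashes to be kept whole.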

