I have this block of code that goes through a text file line by line and splits each line into separate words. This works well, but my text file contains certain words and phrases that start and end with '-', for example '-foo-' or '-foo bar-'. Right now the code splits '-foo bar-' into '-foo' and 'bar-'. I understand why this is happening.
My plan is to grab the instances that start and end with '-', store them in a separate list, let the user change each phrase into something new, and then put them back into the list. How do I tell it to grab a phrase when it consists of two separate words?
def madLibIt(text_file):
    listOfWords = []  # creates a word list
    for eachLine in text_file:  # go through each line and split it into separate words
        listOfWords.extend(eachLine.split())
    print(listOfWords)
Calling str.split() without a separator splits the text on runs of whitespace, so you are not using - as a delimiter. You can use re.findall() with the pattern (-.+?-):
import re
matches = re.findall(r'(-.+?-)', 'This is a -string- with a -foo bar-')
print(matches) # ['-string-', '-foo bar-']
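Applied to the line-by-line loop from the question, the same pattern could collect every placeholder in the file. This is only a minimal sketch: it assumes a placeholder never spans a line break, and find_placeholders is a hypothetical helper name, not part of the original code.

```python
import re

# Non-greedy: match from one '-' to the nearest following '-'.
PLACEHOLDER = re.compile(r'-.+?-')

def find_placeholders(lines):
    """Return every '-...-' phrase found in an iterable of lines (hypothetical helper)."""
    found = []
    for line in lines:
        found.extend(PLACEHOLDER.findall(line))
    return found

# Stand-in for an open text file, which also yields one line at a time.
sample = ['The -adjective- fox jumped over\n',
          'the -plural noun- near a -foo bar-.\n']
print(find_placeholders(sample))  # ['-adjective-', '-plural noun-', '-foo bar-']
```

Note that this simple pattern would also pick up text between two ordinary hyphens on the same line, so it probably only suits files where '-' is reserved for placeholders.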
This regular expression grabs exactly what you want.
import re
s = 'This is a string with -parts like this- and -normal- parts -as well-'
print(re.findall(r'((?:-\w[\w\s]*\w-)|(?:\b\w+\b))', s))
# ['This', 'is', 'a', 'string', 'with', '-parts like this-', 'and', '-normal-', 'parts', '-as well-']
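For the second half of the question, putting the user's replacements back, one possible approach is re.sub() with a callback rather than rebuilding the word list. The sketch below assumes the replacements have already been gathered into a dict keyed by the original phrase. The names fill_placeholders and replacements are illustrative, not from the original post.

```python
import re

# Same '-...-' shape as the dashed branch of the pattern above.
PLACEHOLDER = re.compile(r'-\w[\w\s]*\w-')

def fill_placeholders(text, replacements):
    """Swap each '-...-' phrase for its entry in replacements (hypothetical helper).

    Phrases with no entry are left unchanged.
    """
    def swap(match):
        phrase = match.group(0)
        return replacements.get(phrase, phrase)
    return PLACEHOLDER.sub(swap, text)

s = 'This is a string with -parts like this- and -normal- parts -as well-'
result = fill_placeholders(s, {'-normal-': 'ordinary', '-as well-': 'too'})
print(result)  # This is a string with -parts like this- and ordinary parts too
```

Substituting in the original string keeps the spacing and punctuation intact, which a split-and-rejoin of the word list would not guarantee.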