I need to write a function which replaces multiple format strings into downcase.
For example, a paragraph contains a word 'something' in different formats like 'Something', 'SomeThing', 'SOMETHING', 'SomeTHing' need to convert all format words into downcase 'something'.
How to write a function with replacing with downcase?
You can split your paragraph into different words, then use the slugify module to generate a slug of each word, compare it with "something" , and if there is a match, replace the word with "something".
In [1]: text = "This paragraph contains Something, SOMETHING, AND SomeTHing"
In [2]: from slugify import slugify
In [3]: for word in text.split(" "): # Split the text using space, and iterate through the words
...: if slugify(unicode(word)) == "something": # Compare the word slug with "something"
...: text = text.replace(word, word.lower())
In [4]: text
Out[4]: 'This paragraph contains something, something AND something'
Split the text into single words and check whether a word in written in lower case is "something". If yes, then change the case to lower
if word.lower() == "something":
text = text.replace(word, "something")
To know how to split a text into words, see this question .
Another way is to iterate through single letters and check whether a letter is the first letter of "something":
text = "Many words: SoMeThInG, SOMEthING, someTHing"
for n in range(len(text)-8):
if text[n:n+9].lower() == "something": # check whether "something" is here
text = text.replace(text[n:n+9], "something")
print text
You can also use re.findall
to search and split the paragraph into words and punctuation, and replace all the different cases of "Something"
with the lowercase version:
import re
text = "Something, Is: SoMeThInG, SOMEthING, someTHing."
to_replace = "something"
words_punct = re.findall(r"[\w']+|[.,!?;: ]", text)
new_text = "".join(to_replace if x.lower() == to_replace else x for x in words_punct)
print(new_text)
Which outputs:
something, Is: something, something, something.
Note: re.findall
requires a hardcoded regular expression to search for contents in a string. Your actual text may contain characters that are not in the regular expression above, you will need to add these as needed.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.