I am working with a text file, and I apply the following operations to each string:
def string_operations(string):
1) lowercase
2) remove integers from string
3) remove symbols
4) stemming
After this, I am still left with strings like:
durham 28x23
I see the flaw in my approach, but I would like to know whether there is a good, fast way to identify a term that has a numeric value attached to it.
So in the above example, I want the output to be:
durham
Another example:
21st amendment
Should give:
amendment
So how do I deal with this stuff?
If your requirement is "remove any terms that start with a digit", you could do something like this:
def removeNumerics(s):
    # Keep only the whitespace-separated terms whose first character is not a digit.
    return ' '.join([term for term in s.split() if not term[0].isdigit()])
This splits the string on whitespace, drops every term whose first character is a digit, and joins the remaining terms with a single space.
And it works like this:
>>> removeNumerics('21st amendment')
'amendment'
>>> removeNumerics('durham 28x23')
'durham'
If this isn't what you're looking for, please add some explicit examples to your question, showing both the initial string and your desired result.
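For instance, if the requirement turned out to be "remove any term that contains a digit anywhere" rather than only at the start, a minimal sketch might look like the following. The function name remove_terms_with_digits and the third test string are made up for illustration; they are not part of the question.

```python
def remove_terms_with_digits(s):
    # Hypothetical variant: drop every whitespace-separated term that
    # contains at least one digit, wherever the digit appears.
    return ' '.join(
        term for term in s.split()
        if not any(ch.isdigit() for ch in term)
    )


print(remove_terms_with_digits('durham 28x23'))     # durham
print(remove_terms_with_digits('21st amendment'))   # amendment
print(remove_terms_with_digits('route x66 north'))  # route north
```

The only difference from removeNumerics is the test applied to each term: a term such as 'x66' starts with a letter, so the first-character check would keep it, while this version drops it.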