Removing numbers from strings

I am working with a text file and apply the following operations to each string:

     def string_operations(string):

        1) lowercase
        2) remove integers from string
        3) remove symbols
        4) stemming
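A minimal sketch of steps 1 to 3 shows how a mixed token slips through. It assumes that step 2 drops only whole tokens made up entirely of digits. That is a guess, since the question does not show the actual implementation. Stemming is left out because it needs a third-party stemmer. The function name mirrors the outline above, but the body is hypothetical.

```python
import string as string_module


def string_operations(text):
    # 1) lowercase
    text = text.lower()
    # 2) remove integers: ASSUMPTION, only tokens that are all digits are dropped
    tokens = [tok for tok in text.split() if not tok.isdigit()]
    # 3) remove symbols: strip ASCII punctuation characters from each token
    table = str.maketrans('', '', string_module.punctuation)
    tokens = [tok.translate(table) for tok in tokens]
    # 4) stemming: omitted here (it would need a stemmer library)
    return ' '.join(tok for tok in tokens if tok)


print(string_operations('Durham 28x23'))     # durham 28x23
print(string_operations('21st Amendment!'))  # 21st amendment
```

Under this assumption, "28x23" and "21st" are not pure digit strings. They therefore survive step 2, and they contain no punctuation for step 3 to remove.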

After this, I am still left with strings like:

  durham 28x23

I can see the flaw in my approach, but I would like to know whether there is a good, fast way to detect a numeric value attached to a word.

So for the example above, I want the output to be:

  durham

Another example:

  21st amendment

Should give:

  amendment

How should I handle cases like these?

If your requirement is "remove any terms that start with a digit", you could do something like this:

def removeNumerics(s):
    # keep only the whitespace-separated terms whose first character is not a digit
    return ' '.join([term for term in s.split() if not term[0].isdigit()])

This splits the string on whitespace, keeps only the terms that do not start with a digit, and joins them back together with single spaces.

And it works like this:

>>> removeNumerics('21st amendment')
'amendment'
>>> removeNumerics('durham 28x23')
'durham'
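The rule above only checks the first character, so a term such as "a1" is kept. If the stricter requirement is "remove any term that contains a digit anywhere", the following variant uses a regular expression search instead. The function name remove_terms_with_digits is hypothetical, and this is a sketch rather than a drop-in replacement.

```python
import re

# matches any decimal digit anywhere in a term
_DIGIT = re.compile(r'\d')


def remove_terms_with_digits(s):
    """Drop every whitespace-separated term that contains at least one digit."""
    return ' '.join(term for term in s.split() if not _DIGIT.search(term))


print(remove_terms_with_digits('durham 28x23'))    # durham
print(remove_terms_with_digits('21st amendment'))  # amendment
print(remove_terms_with_digits('route a1 north'))  # route north
```

For the last input, removeNumerics would return 'route a1 north' unchanged, because "a1" starts with a letter.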

If this isn't what you're looking for, consider adding explicit examples to your question, showing both the input string and the result you want.
