
Python: split the input string on the elements of another list and remove digits from it

I have had some trouble with this problem and need your help. I have to write a Python function, mySplit(x), that takes an input list (containing only one string) and splits that string on the elements of another list and on digits. I am using Python 3.6. Here is an example:

l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
banned=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ' ', ';']

The returned lists should be like this:

mySplit(l)  == ['I', 'am', 'learning']
mySplit(l1) == ['This', 'ex', 'ample', 'aint', 'ea', 'sy']

I have tried the following, but I always get stuck:

def mySplit(x):

    l=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ';'] #Banned chars
    l2=[i for i in x if i not in l] #Meant to remove banned chars, but x is a list, so i is the whole string
    l2=",".join(l2)
    l3=[i for i in l2 if not i.isdigit()] #Removes all the digits
    l4=[i for i in l3 if i != ','] #Drops commas (compare strings with !=, not "is not")
    l5=[",".join(l4)] #Puts a comma between every remaining character
    l6=l5[0].split(' ')
    return l6

and

mySplit(l1)
mySplit(l)

returns:

['T,h,i,s,e,x,a,m,p,l,e,a,i,n,t,e,a,s,y']
['I,', ',a,m,', ',l,e,a,r,n,i,n,g']
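The commas in both outputs come from two things in the attempt above. First, x is a list, so `for i in x` yields the whole string rather than its characters, and the banned-character filter removes nothing. Second, `",".join()` on a string puts a comma between every character. The minimal sketch below isolates those two behaviours:

```python
l1 = ['____-----This4ex5ample---aint___ea5sy;782']
banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ';']

# Iterating over the list yields its only element, the whole string,
# which never equals a single banned character, so nothing is removed
over_list = [i for i in l1 if i not in banned]

# Iterating over l1[0] yields individual characters, which can be filtered
over_chars = [i for i in l1[0] if i not in banned]

# ",".join() on a string inserts a comma between every character
commas = ",".join("abc")

print(over_list)            # ['____-----This4ex5ample---aint___ea5sy;782']
print(''.join(over_chars))  # This4ex5ampleaintea5sy782
print(commas)               # a,b,c
```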

Use re.split() for this task:

import re
w_list = [i for i in re.split(r'[^a-zA-Z]', 
          '____-----This4ex5ample---aint___ea5sy;782') if i ]

Out[12]: ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
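`r'[^a-zA-Z]'` splits on anything that is not an ASCII letter, which is broader than the question's banned list. If you want to split only on the banned characters and on digits, one option is to build the character class from the list with `re.escape()`. The sketch below wraps this in the question's `mySplit` name. It assumes, as in the question, that the input list holds exactly one string, and it uses the banned list from the question (including the space):

```python
import re

banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ' ', ';']

# Character class of the escaped banned characters plus \d for digits;
# the trailing + treats a run of separators as a single split point
pattern = re.compile('[' + re.escape(''.join(banned)) + r'\d]+')

def mySplit(x):
    # x is a one-element list; split its string and drop empty pieces
    # (leading/trailing separators produce empty strings)
    return [piece for piece in pattern.split(x[0]) if piece]

print(mySplit(['I am learning']))
# ['I', 'am', 'learning']
print(mySplit(['____-----This4ex5ample---aint___ea5sy;782']))
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
```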

I would import the punctuation characters from the string module and then use a regular expression, as follows.

l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
import re
from string import punctuation
punctuation # to see the punctuation marks.

>>> '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

' '.join([re.sub('[' + re.escape(punctuation) + r'\d]', ' ', w) for w in l]).split()

Here is the output:

>>>   ['I', 'am', 'learning']

Note the \d appended after the punctuation characters; it removes digits as well.

Similarly,

' '.join([re.sub('[' + re.escape(punctuation) + r'\d]', ' ', w) for w in l1]).split()

Yields

>>> ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
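The same replace-then-split idea also works without regular expressions. `str.maketrans()` can map every punctuation character and digit to a space in a single table, and `str.translate()` applies it. The minimal sketch below uses `string.punctuation` and `string.digits`, so, like the regex above, it removes more characters than the question's banned list. `split_words` is a hypothetical helper name:

```python
from string import digits, punctuation

# Translation table mapping each punctuation character and digit to a space
unwanted = punctuation + digits
table = str.maketrans(unwanted, ' ' * len(unwanted))

def split_words(s):
    # Hypothetical helper: replace unwanted characters with spaces,
    # then split on runs of whitespace
    return s.translate(table).split()

l = ['I am learning']
l1 = ['____-----This4ex5ample---aint___ea5sy;782']
print([w for s in l for w in split_words(s)])
# ['I', 'am', 'learning']
print([w for s in l1 for w in split_words(s)])
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
```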

You can also modify your function as follows:

def mySplit(x):
    # Banned characters, plus all digits
    banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ';'] + list('0123456789')
    # Replace every banned character with a space, then split on whitespace
    return ''.join(' ' if ch in banned else ch for ch in x[0]).split()
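A hypothetical variant of the function above is sketched below. It takes the banned characters as a parameter held in a set, which makes membership tests constant time on average. It also detects digits with `str.isdigit()` instead of listing them. Note that `str.isdigit()` is also true for some non-ASCII digit characters (for example `'²'`), so it is slightly broader than `list('0123456789')`:

```python
def mySplit(x, banned=frozenset('-+,#.!?:_;')):
    # Hypothetical variant: banned characters are a parameter (a frozenset
    # for fast lookup); digits are detected with str.isdigit()
    text = x[0]
    cleaned = ''.join(' ' if ch in banned or ch.isdigit() else ch for ch in text)
    # str.split() with no argument splits on runs of whitespace and drops empties
    return cleaned.split()

print(mySplit(['I am learning']))
# ['I', 'am', 'learning']
print(mySplit(['____-----This4ex5ample---aint___ea5sy;782']))
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
```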
