
Python, Remove word start with specific character

How can I remove a word that starts with a specific char in python?

e.g.

string = 'Hello all please help #me'

I want to remove the word that starts with #.

The result I want is:

Hello all please help 

>>> a = "Hello all please help #me "
>>> list(filter(lambda x:x[0]!='#', a.split()))
['Hello', 'all', 'please', 'help']

You can join the words back together with spaces:

>>> " ".join(filter(lambda x:x[0]!='#', a.split()))
'Hello all please help'
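A minimal sketch of the same filter-and-join approach wrapped in a reusable function. The name `remove_hash_words` is hypothetical. The `x[0]` check is safe here because `str.split()` with no arguments never yields empty strings:

```python
def remove_hash_words(text: str) -> str:
    """Drop every whitespace-separated word whose first character is '#'."""
    # split() with no arguments splits on runs of whitespace and
    # never produces empty strings, so word[0] cannot raise IndexError.
    kept = filter(lambda word: word[0] != '#', text.split())
    # join() consumes the filter iterator and rebuilds the string
    # with a single space between the remaining words.
    return " ".join(kept)


print(remove_hash_words("Hello all please help #me "))  # Hello all please help
```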

Let me explain step by step:

>>> a = "Hello all please help #me "
>>> a.split()                          # split() splits the string on a delimiter; by default, runs of whitespace
['Hello', 'all', 'please', 'help', '#me']
>>> list(filter(lambda x:x[0]!='#', a.split()))
['Hello', 'all', 'please', 'help']

filter returns only those elements for which the condition is True.
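Note that in Python 3, `filter` returns a lazy iterator rather than a list, so you need `list()` to see its elements at the prompt, and the iterator can only be consumed once. A short sketch:

```python
words = "Hello all please help #me ".split()

# filter() yields only the items for which the predicate returns True.
kept = filter(lambda x: x[0] != '#', words)
print(type(kept).__name__)  # filter

# list() drains the iterator into a concrete list.
print(list(kept))  # ['Hello', 'all', 'please', 'help']

# The iterator is now exhausted; draining it again gives nothing.
print(list(kept))  # []
```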

One problem with using split() here is that it discards the original whitespace: runs of spaces, tabs, and newlines all act as a single separator. For example,

In [114]: 'a  b \tc\nd'.split()
Out[114]: ['a', 'b', 'c', 'd']

So joining it back together again with ' '.join alters the original string:

In [115]: ' '.join('a  b \tc\nd'.split())
Out[115]: 'a b c d'

If you want to preserve the original string and just remove words that begin with #, you could use a regex:

In [119]: import re

In [120]: re.sub(r'(\s)#\w+', r'\1', 'Hello all please help #me   but#notme')
Out[120]: 'Hello all please help    but#notme'
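A short sketch confirming that the regex approach leaves the surrounding whitespace untouched, reusing the whitespace-heavy sample from the split example with a made-up `#tag` word added:

```python
import re

text = 'a  b \t#tag\nc\nd'

# The regex removes only '#tag' and puts back the tab captured before it.
print(repr(re.sub(r'(\s)#\w+', r'\1', text)))  # 'a  b \t\nc\nd'

# Splitting and rejoining also drops '#tag' but turns every gap into one space.
print(repr(' '.join(w for w in text.split() if not w.startswith('#'))))  # 'a b c d'
```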

Explanation:

https://regex101.com has a handy tool to help you understand regular expressions. For example, here is its explanation of what (\s)#\w+ means:

1st Capturing group (\s)
    \s match any white space character [\r\n\t\f ]
# matches the character # literally
\w+ match any word character [a-zA-Z0-9_]
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
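One caveat: the ASCII class listed above for `\w` does not describe Python 3's default behavior. For `str` patterns, `\w` matches Unicode word characters unless you pass the `re.ASCII` flag, which restricts it to `[a-zA-Z0-9_]`. A small sketch showing the difference (the sample text is made up):

```python
import re

text = "Hello all please help #café now"
pattern = r'(\s)#\w+'

# Default for str patterns: \w is Unicode-aware, so 'é' counts as a word character.
print(repr(re.sub(pattern, r'\1', text)))  # 'Hello all please help  now'

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the match stops before 'é'.
print(repr(re.sub(pattern, r'\1', text, flags=re.ASCII)))  # 'Hello all please help é now'
```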

Since this regex pattern begins by matching whitespace, ' #me' matches, but 'but#notme' does not.

The second argument to re.sub, r'\1', is the replacement pattern. The \1 tells re.sub to replace the match with the contents of the first capturing group. So the match ' #me' is replaced with a single space ' '.
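To see what the capturing group holds, you can inspect a single match with `re.search`: `group(0)` is the whole match and `group(1)` is the captured whitespace that `\1` puts back. A quick sketch on the same sample string:

```python
import re

sample = 'Hello all please help #me   but#notme'
m = re.search(r'(\s)#\w+', sample)

print(repr(m.group(0)))  # ' #me'  (the whole match)
print(repr(m.group(1)))  # ' '  (capturing group 1, reinserted by \1)

# 'but#notme' never matches: the '#' there follows 't', not whitespace.
print(re.findall(r'\s#\w+', sample))  # [' #me']
```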

Using the most obvious solution:

txt = 'Hello all please help #me'
# better to not use 'string' as variable name

' '.join(word for word in txt.split(' ') if not word.startswith('#'))

Note that in this case it might be better to use split(' ') with an explicit space as the separator, rather than the more common parameterless split(). This way you won't lose newlines or multiple spaces.
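A quick comparison of the two splitting modes on a made-up string with a doubled space and a newline. With `split(' ')`, a doubled space comes back as an empty string, so joining with `' '` restores it, and the newline simply stays inside a token:

```python
txt = 'Hello  all\nplease help #me'

# split(' ') splits on each single space: the doubled space yields '',
# and the newline stays inside the token 'all\nplease'.
tokens = txt.split(' ')
print(tokens)  # ['Hello', '', 'all\nplease', 'help', '#me']

print(repr(' '.join(w for w in tokens if not w.startswith('#'))))
# 'Hello  all\nplease help'

# Parameterless split() collapses all whitespace, so the layout is lost.
print(repr(' '.join(w for w in txt.split() if not w.startswith('#'))))
# 'Hello all please help'
```

One limitation of `split(' ')`: a `#word` that directly follows a newline or tab stays glued to the preceding token (e.g. `'help\n#me'`), so it is not removed.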

I would do something like this:

' '.join(word for word in "help #me please".split() if word[0]!='#')
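The `word[0]` test is safe here only because parameterless `split()` never yields empty strings. If you combine it with `split(' ')`, a doubled space produces an empty token and indexing it raises `IndexError`, whereas `str.startswith` simply returns `False`. A small sketch (the sample string is made up):

```python
text = "help  #me please"  # note the doubled space

# Safe: split() drops empty tokens, so word[0] always exists.
print(' '.join(word for word in text.split() if word[0] != '#'))  # help please

# Unsafe: split(' ') yields '' for the doubled space, and ''[0] raises.
try:
    ' '.join(word for word in text.split(' ') if word[0] != '#')
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: string index out of range

# startswith() handles the empty token without raising.
print(repr(' '.join(word for word in text.split(' ') if not word.startswith('#'))))
# 'help  please'
```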

As an addition to unutbu's answer, to also catch an occurrence at the start of the string:

> re.sub(r'(\s)#\w+', r'\1', '#Hello all please help #me   but#notme')
 '#Hello all please help    but#notme'

> re.sub(r'(\s|^)#\w+', r'\1', '#Hello all please help #me   but#notme')
 ' all please help    but#notme'
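A minimal sketch of one way to also match a `#word` at the very start of the string: let the capturing group match either the start anchor `^` or a single whitespace character. The function name `remove_hashtags` is hypothetical, and this is only one possible pattern. When the first word is removed, the space that followed it remains, so strip the result if that matters:

```python
import re

# Group 1 is either the empty start-of-string match or one whitespace char.
HASH_WORD = re.compile(r'(^|\s)#\w+')


def remove_hashtags(text: str) -> str:
    """Remove '#word' tokens at the start of text or after whitespace."""
    # \1 puts back whatever group 1 matched: '' at the start, else the whitespace.
    return HASH_WORD.sub(r'\1', text)


result = remove_hashtags('#Hello all please help #me   but#notme')
print(repr(result))           # ' all please help    but#notme'
print(repr(result.lstrip()))  # 'all please help    but#notme'
```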

I didn't have enough reputation to post this as a comment.

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM