
Removing words with special characters "\" and "/"

While analyzing tweets, I run into "words" that contain either \ or / (possibly more than once in a single "word"). I would like to remove such words completely, but I can't quite nail it.

This is what I tried:

sen = 'this is \re\store and b\\fre'
sen1 = 'this i\s /re/store and b//fre/'

slash_back =  r'(?:[\w_]+\\[\w_]+)'
slash_fwd = r'(?:[\w_]+/+[\w_]+)'
slash_all = r'(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))'

strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
print strt
print strt1
print strt2

I would like to get:

this is and
this i\s and
this and

However, I get:

and 
this i\s / and /
i\s /re/store  b//fre/

To add: in this scenario a "word" is a string delimited by spaces or punctuation marks (as in regular text).

How's this? I added some punctuation examples:

import re

sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'
sen2 = r'this is \re\store, and b\\fre!'
sen3 = r'this i\s /re/store, and b//fre/!'

slash_back =  r'\s*(?:[\w_]*\\(?:[\w_]*\\)*[\w_]*)'
slash_fwd = r'\s*(?:[\w_]*/(?:[\w_]*/)*[\w_]*)'
slash_all = r'\s*(?:[\w_]*[/\\](?:[\w_]*[/\\])*[\w_]*)'

strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
strt3 = re.sub(slash_back,"",sen2)
strt4 = re.sub(slash_fwd,"",sen3)
strt5 = re.sub(slash_all,"",sen3)
print(strt)
print(strt1)
print(strt2)
print(strt3)
print(strt4)
print(strt5)

Output:

this is and
this i\s and
this and
this is, and!
this i\s, and!
this, and!
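To make the pattern easier to read, here is a minimal sketch of the same idea as slash_all, spelled out with re.VERBOSE and wrapped in a function. The names SLASH_WORD and remove_slash_words are made up for illustration, and [\w_] is shortened to \w because \w already matches the underscore. The final strip() is an addition of mine: since the pattern also consumes the whitespace before a removed word, a sentence that starts with such a word would otherwise keep a leading space.

```python
import re

# Hypothetical name: a commented spelling of the slash_all pattern above.
SLASH_WORD = re.compile(r"""
    \s*            # whitespace before the word, removed together with it
    \w*            # optional word characters before the first slash
    [/\\]          # the first slash, forward or backward
    (?:\w*[/\\])*  # any further runs of word characters ending in a slash
    \w*            # optional word characters after the last slash
""", re.VERBOSE)

def remove_slash_words(text):
    """Drop every token that contains a forward or backward slash."""
    return SLASH_WORD.sub("", text).strip()

print(remove_slash_words(r'this i\s /re/store, and b//fre/!'))  # this, and!
print(remove_slash_words(r'\re\store and'))                     # and
```

Punctuation such as "," and "!" is not matched by \w, so it stays in place after the word is removed.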

One way you could do it without re is with split, join and a generator expression.

# Raw strings keep the backslashes literal; in a plain literal, '\r' would be a carriage return.
sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'

remove_back = lambda s: ' '.join(i for i in s.split() if '\\' not in i)
remove_forward = lambda s: ' '.join(i for i in s.split() if '/' not in i)

>>> print(remove_back(sen))
this is and
>>> print(remove_forward(sen1))
this i\s and
>>> print(remove_back(remove_forward(sen1)))
this and
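One caveat with splitting on whitespace: punctuation attached to a removed word goes with it, so r'this is \re\store, and b\\fre!' becomes 'this is and'. If the punctuation should survive, a rough sketch along these lines might work. The function name and the punctuation set ',.!?:;' (borrowed from the pattern in the question) are assumptions for illustration, not part of the approach above.

```python
def remove_slash_words_keep_punct(text, slashes='/\\'):
    """Drop tokens containing any of `slashes`, keeping trailing punctuation."""
    kept = []
    for token in text.split():
        core = token.rstrip(',.!?:;')   # the word without trailing punctuation
        tail = token[len(core):]        # the trailing punctuation, if any
        if any(ch in core for ch in slashes):
            # Drop the word; re-attach its punctuation to the previous kept token.
            if tail and kept:
                kept[-1] += tail
            elif tail:
                kept.append(tail)
        else:
            kept.append(token)
    return ' '.join(kept)

print(remove_slash_words_keep_punct(r'this i\s /re/store, and b//fre/!'))  # this, and!
print(remove_slash_words_keep_punct(r'this is \re\store, and b\\fre!', slashes='\\'))  # this is, and!
```

Passing slashes='\\' or slashes='/' restricts the removal to one kind of slash, mirroring remove_back and remove_forward.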
