How to detect string suffixes and remove these suffixed elements from a list? - Python
I understand that this looks like an NLP stemming/lemmatization task, but it calls for a simpler function.
Given the list below, I need to remove the elements that have an "s" or "es" suffix if the unsuffixed item also exists in the list:
alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
I need to output:
alist = ['bar','barbar','foo','foofoos']
I've tried the following, but it doesn't work: it compares each item only with the previously kept one, and sorting alist gives
['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos']
not
['bar', 'bares', 'barbar', 'barbares', 'foo', 'foos', 'foofoos']
so a suffixed item does not always directly follow its unsuffixed form.
alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
prev = ""
no_s_list = []
for i in sorted(alist):
    if i[-2:] == "es" and i[:-2] == prev:
        continue
    elif i[-1:] == "s" and i[:-1] == prev:
        continue
    else:
        prev = i
        no_s_list.append(i)
Sorting the list gives:
>>> sorted(alist)
['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos']
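To make the failure concrete, here is a minimal sketch of the attempt above wrapped in a function. The name adjacent_filter is hypothetical and introduced only for illustration. Because each word is compared only with the previously kept word, 'bares' and 'foos' survive: in sorted order they are preceded by 'barbar' and 'foofoos', not by 'bar' and 'foo'.

```python
def adjacent_filter(words):
    """Drop a word only when its stem equals the previously kept word."""
    prev = ""
    kept = []
    for w in sorted(words):
        if w.endswith("es") and w[:-2] == prev:
            continue
        if w.endswith("s") and w[:-1] == prev:
            continue
        prev = w
        kept.append(w)
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
result = adjacent_filter(alist)
print(result)  # 'bares' and 'foos' are wrongly kept
```

Checking membership in the whole collection, rather than against a single neighbour, avoids this dependence on sort order.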
def rm_suffix(s, suffixes):
    # Strip the first suffix in `suffixes` that s ends with.
    for suf in suffixes:
        if s.endswith(suf):
            return s[:-len(suf)]
    return s

alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
salist = set(alist)  # set for fast membership tests
suffixes = ('es','s')
# Keep x unless it is suffixed and its stem is also in the list.
blist = [x for x in alist
         if (not x.endswith(suffixes)) or (rm_suffix(x, suffixes) not in salist)]
print(blist)  # ['bar', 'barbar', 'foo', 'foofoos']
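One limitation worth noting: rm_suffix strips only the first suffix that matches, so a word like 'cakes' is reduced to 'cak' and never compared against 'cake'. The sketch below is a possible variant, not part of the original answer, that tries every matching suffix. The function name drop_suffixed and the 'cake'/'cakes' input are assumptions made for illustration.

```python
def drop_suffixed(words, suffixes=("es", "s")):
    """Remove words whose stem, for any matching suffix, is also in words."""
    present = set(words)
    result = []
    for w in words:
        # Candidate stems: one per suffix that w actually ends with.
        stems = [w[:-len(suf)] for suf in suffixes if w.endswith(suf)]
        if any(stem in present for stem in stems):
            continue
        result.append(w)
    return result


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
filtered = drop_suffixed(alist)
cakes = drop_suffixed(['cake', 'cakes'])
print(filtered)
print(cakes)
```

For the list in the question this gives the same result as the list comprehension, and it preserves the input order.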
You can also use regex here.
re.split() will return something like:
barbar --> ['barbar']
foos --> ['foo', 's', '']
barbares --> ['barbar', 'es', '']
foofoos --> ['foofoo', 's', '']
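The results listed above can be reproduced with a short script. Since the pattern contains a capturing group, re.split() includes the matched suffix in the result, and the trailing empty string is what remains after the match at the end of the word. The variable names here are illustrative only.

```python
import re

# Same pattern as in the answer: a trailing "es" or "s", captured.
pattern = re.compile(r'(es|s)$')

words = ['barbar', 'foos', 'barbares', 'foofoos']
splits = {w: pattern.split(w) for w in words}

for w in words:
    print(w, '-->', splits[w])
```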
So, if the length of the returned list is greater than 1 and its first element is found in alist, you can remove that item.
Code:
In [105]: import re
In [106]: alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
In [107]: s = set(alist)
In [108]: for x in s.copy():
   .....:     sol = re.split(r'(es|s)$', x)
   .....:     if len(sol) > 1 and sol[0] in s:
   .....:         s.remove(x)
   .....:
In [109]: s
Out[109]: set(['bar', 'foofoos', 'barbar', 'foo'])
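The session above returns a set, so the original order of alist is lost (and in Python 3 the set is displayed with braces rather than set([...])). If order matters, the same regex idea can be applied while iterating over the list. This is a sketch under that assumption. The names SUFFIX_RE and remove_suffixed_regex are hypothetical, and stems are checked against an unchanging copy of the input so the result does not depend on iteration order.

```python
import re

SUFFIX_RE = re.compile(r'(es|s)$')


def remove_suffixed_regex(words):
    """Keep words in input order, dropping those whose stem is also present."""
    present = set(words)  # never mutated, so lookups are order-independent
    kept = []
    for w in words:
        parts = SUFFIX_RE.split(w)
        # parts has more than one element only when a suffix matched.
        if len(parts) > 1 and parts[0] in present:
            continue
        kept.append(w)
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
ordered = remove_suffixed_regex(alist)
print(ordered)
```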