Python - Tag all named entities with spaCy
I created a function to tag all named entities with spaCy:

    def tag_ne(content):
        doc = nlp(content)
        text = doc.text
        for ent in doc.ents:
            text = re.sub(ent.text, ent.label_, text)
        return text

When I apply this to a small pandas Series of unicode strings, it works. However, when I apply it to my whole dataset, I get an error that is triggered by one specific observation. I have no way of knowing what is causing the error, and I cannot share my dataset, but the error is as follows:

    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    <ipython-input-56-274bc594a3e7> in <module>()
    ----> 1 emails.content.apply(tag_ne)

    /vol1/home/ccostello/.conda/envs/chris_/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
       3190             else:
       3191                 values = self.astype(object).values
    -> 3192                 mapped = lib.map_infer(values, f, convert=convert_dtype)
       3193
       3194             if len(mapped) and isinstance(mapped[0], Series):

    pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

    <ipython-input-46-6900d0e291db> in tag_ne(content)
          3     text = doc.text
          4     for ent in doc.ents:
    ----> 5         text = re.sub(ent.text, ent.label_, text)
          6     return text

    /vol1/home/ccostello/.conda/envs/chris_/lib64/python2.7/re.pyc in sub(pattern, repl, string, count, flags)
        149     a callable, it's passed the match object and must return
        150     a replacement string to be used."""
    --> 151     return _compile(pattern, flags).sub(repl, string, count)
        152
        153 def subn(pattern, repl, string, count=0, flags=0):

    /vol1/home/ccostello/.conda/envs/chris_/lib64/python2.7/re.pyc in _compile(*key)
        240         p = sre_compile.compile(pattern, flags)
        241     except error, v:
    --> 242         raise error, v # invalid expression
        243     if len(_cache) >= _MAXCACHE:
        244         _cache.clear()

    error: unbalanced parenthesis

What's an alternative way to tag all of my named entities that might get me around this error? Otherwise, how can I resolve it?
Of course you can find out which row is causing the error. Just add a try/except statement:

    def tag_ne(content):
        doc = nlp(content)
        text = doc.text
        for ent in doc.ents:
            try:
                text = re.sub(ent.text, ent.label_, text)
            except Exception as e:
                print(ent.text, ent.label_, '\n', e)
        return text

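
The try/except above only reports the offending entity. The traceback suggests the underlying cause: `re.sub` treats `ent.text` as a regular expression, so an entity containing a character such as `(` or `)` is an invalid pattern. The following is a minimal sketch of two possible workarounds, not a definitive fix. The helper names `tag_with_escape` and `tag_by_offsets` are made up for illustration, and the entities are passed in as plain tuples so the example runs without a spaCy model. With spaCy, the tuples would presumably be built from `ent.text`, `ent.label_`, `ent.start_char` and `ent.end_char` for each `ent` in `doc.ents`.

```python
import re


def tag_with_escape(text, entities):
    """Replace every occurrence of each entity string with its label.

    `entities` is a list of (entity_text, label) pairs, standing in for
    (ent.text, ent.label_) from spaCy's doc.ents (an assumption of this sketch).
    """
    for ent_text, label in entities:
        # re.escape makes the entity text match literally, so characters
        # like "(" or "+" are no longer read as regex syntax.
        text = re.sub(re.escape(ent_text), label, text)
    return text


def tag_by_offsets(text, spans):
    """Replace only the recognised spans, using character offsets.

    `spans` is a list of non-overlapping (start_char, end_char, label)
    tuples, standing in for (ent.start_char, ent.end_char, ent.label_).
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(label)               # the label replaces the entity
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last entity
    return "".join(pieces)


text = "Call Acme (UK at noon"
print(tag_with_escape(text, [("Acme (UK", "ORG")]))
print(tag_by_offsets(text, [(5, 13, "ORG")]))
```

The two helpers behave differently. The escaped version, like the original function, replaces every occurrence of the entity string, including occurrences spaCy did not tag. The offset version touches only the spans that were recognised and does no pattern matching at all, so an inserted label can never be rewritten by a later, shorter entity.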