
Huggingface NER with custom data

I have CSV data like the following:

**token**      **label**
0.45"      length
1-12       size
2.6"       length
8-9-78     size
6mm        length

Whenever I receive text like this:

6mm 8-9-78 silver head

I should be able to say that length = 6mm and size = 8-9-78. I'm new to the NLP world and I'm trying to solve this using Huggingface NER. I have gone through various articles, but I can't work out how to train with my own data. Which model/tokeniser should I use? Or should I build my own? Any help would be appreciated.

I would start by looking at spaCy's pattern matching combined with NER. The pattern-matching rules spaCy provides are really powerful, especially when combined with its statistical NER models. You can even use the patterns you develop to create your own custom NER model. This will give you a good idea of where you still have gaps or complexity that might require something else, such as Huggingface.
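To make the pattern-first idea concrete, here is a minimal sketch in plain Python `re` (not spaCy's own Matcher). The regexes, `PATTERNS`, `label_tokens` and `extract` are all hypothetical names and guesses based only on the five sample rows in the question, so treat them as a starting point rather than a complete rule set.

```python
import re

# Hypothetical patterns inferred from the five sample rows in the question;
# real data will almost certainly need more units and formats.
PATTERNS = {
    "length": re.compile(r'\d+(?:\.\d+)?(?:mm|")'),  # 6mm, 0.45", 2.6"
    "size": re.compile(r"\d+(?:-\d+)+"),             # 1-12, 8-9-78
}


def label_tokens(text):
    """Split on whitespace and tag each token with the first pattern it fully matches."""
    labeled = []
    for token in text.split():
        tag = "O"  # "O" marks a token outside any entity
        for label, pattern in PATTERNS.items():
            if pattern.fullmatch(token):
                tag = label
                break
        labeled.append((token, tag))
    return labeled


def extract(text):
    """Group the matched tokens by label."""
    found = {}
    for token, tag in label_tokens(text):
        if tag != "O":
            found.setdefault(tag, []).append(token)
    return found


print(extract("6mm 8-9-78 silver head"))
# {'length': ['6mm'], 'size': ['8-9-78']}
```

The `(token, tag)` pairs from `label_tokens` have the same shape as the token/label CSV in the question, so rules like these could also be used to bootstrap labeled sentences for a statistical model later, whether in spaCy or Huggingface.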

If you are willing to pay, you can also use Prodigy, which provides a nice UI with human-in-the-loop interactions.

Adding REGEX entities to SpaCy's Matcher
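One common way to get regex matches into a spaCy `Doc` as entities is to run `re.finditer` over `doc.text` and convert each match with `Doc.char_span`. The sketch below assumes spaCy v3 is installed; the `LENGTH`/`SIZE` labels, the regexes and the helper `tag_with_regex` are illustrative assumptions based on the question's sample rows, not something taken from the linked post.

```python
import re

import spacy

nlp = spacy.blank("en")  # tokenizer only; no pretrained model download needed

# Hypothetical label names and regexes, guessed from the question's sample rows.
REGEX_ENTITIES = [
    ("LENGTH", r'\d+(?:\.\d+)?(?:mm|")'),
    ("SIZE", r"\d+(?:-\d+)+"),
]


def tag_with_regex(text):
    """Return a Doc whose entities are the regex matches found in the text."""
    doc = nlp(text)
    spans = []
    for label, pattern in REGEX_ENTITIES:
        for match in re.finditer(pattern, doc.text):
            # char_span returns None when a match does not line up with token boundaries
            span = doc.char_span(match.start(), match.end(), label=label)
            if span is not None:
                spans.append(span)
    # Assigning overlapping spans raises an error; these two patterns
    # do not overlap on the sample text.
    doc.ents = spans
    return doc


doc = tag_with_regex("6mm 8-9-78 silver head")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Once the entities live on the `Doc`, they can be inspected, corrected, and exported as training examples for a statistical NER model.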
