
Adding more custom entities to a pretrained custom NER model in spaCy 3

I have a huge amount of textual data and want to add around 50 different entities. When I first started working with it, I was getting a memory error. As I understand it, spaCy can handle roughly 100,000 tokens per GB of memory, up to a maximum of about 1,000,000. So I split my dataset into 5 chunks and used an annotator to create a JSON file for each chunk. I started with one JSON file and successfully trained a model. Now I want to add more data to it, so that I don't miss any tags and a good variety of data is used during training. Please guide me on how to proceed.

I mentioned some points of confusion in a comment, but assuming that your issue is how to load a large training set into spaCy, the solution is pretty simple.

First, save your training data as multiple .spacy files in one directory. You do not have to make JSON files; that was the standard in v2. For details, see the training data section of the docs. In your config you can point the training data source at this directory, and spaCy will use all the files in it.
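As a minimal sketch of that first step, each chunk of annotated data can be converted to a `DocBin` and written to its own .spacy file in a shared directory (the directory name, example text, and entity offsets below are illustrative):

```python
from pathlib import Path

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # only the tokenizer is needed to build training docs

# Illustrative annotated chunk: (text, list of (start_char, end_char, label))
chunk = [
    ("Apple is opening a store in Mumbai", [(0, 5, "ORG"), (28, 34, "GPE")]),
]

db = DocBin()
for text, ents in chunk:
    doc = nlp.make_doc(text)
    spans = [doc.char_span(start, end, label=label) for start, end, label in ents]
    doc.ents = [s for s in spans if s is not None]  # skip misaligned spans
    db.add(doc)

out_dir = Path("./train")  # hypothetical corpus directory
out_dir.mkdir(exist_ok=True)
db.to_disk(out_dir / "part1.spacy")  # repeat per chunk: part2.spacy, ...
```

With one such file per chunk, the config's `[paths] train` entry can then point at the `./train` directory rather than a single file.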

Next, to avoid keeping all the training data in memory, you can specify max_epochs = -1 (see the docs on streaming corpora). Using this feature means you will have to specify your labels ahead of time, as covered in the same docs. You will probably also want to shuffle your training data manually, since a streamed corpus is not shuffled for you.
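Sketched as a config fragment, the relevant pieces look roughly like this (the paths are placeholders; the labels file is the kind produced by `python -m spacy init labels`):

```ini
[paths]
train = "corpus/train"          # directory of .spacy files

[training]
max_epochs = -1                 # stream the corpus instead of loading it fully

[initialize.components.ner.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/ner.json" # labels declared ahead of time
```

With `max_epochs = -1`, spaCy treats the corpus as an endless stream, which is why the NER labels cannot be inferred from a full pass over the data and must be supplied up front.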

That's all you need to train with a lot of data.

The title of your question mentions adding entities to the pretrained model. It's usually better to train from scratch instead, to avoid catastrophic forgetting, but you can see a guide to doing it here.
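If you do go the route of extending an existing NER component, the core API calls are `add_label` and `update`. Here is a hedged sketch on a blank pipeline (the label, example text, and offsets are made up; with a loaded pretrained model you would call `nlp.resume_training()` instead of `nlp.initialize()`):

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("GADGET")  # hypothetical new entity type

# Illustrative training example with character offsets for the new label
train_data = [
    ("I bought a Phoneblaster yesterday", {"entities": [(11, 23, "GADGET")]}),
]
examples = [
    Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data
]

nlp.initialize(lambda: examples)  # use nlp.resume_training() on a pretrained model
for _ in range(5):
    nlp.update(examples)  # a few update steps on the new data
```

The catastrophic-forgetting risk the answer mentions comes from updating only on the new label: without mixing in examples of the original entity types, the model's performance on them tends to degrade.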
