Entity extraction on large documents

I need to extract entities from Word and PDF documents. The documents are typically 10 to 20 pages long. Are there scalable libraries or APIs that we can plug into our processing pipeline? Any comparative study of the different solutions would be helpful.

Take a look at Watson Natural Language Understanding (you'll need to get an IBM ID and then log in to see this content; don't worry, the cost is $0). With Watson Natural Language Understanding, you will want to look at the API Explorer to find the correct API syntax for the results you are looking for.

I also noticed that you mention Word/PDF documents. You will need to convert those using the Watson Discovery service, and then you can pass the converted documents to Watson Natural Language Understanding, which takes JSON, text, or HTML input.
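To make the second step concrete, here is a minimal Python sketch of sending already-converted text to the Natural Language Understanding analyze endpoint and reading back the entities. It is an illustration, not the official client: the helper names (`build_analyze_request`, `extract_entities`, `analyze`) are made up for this answer, the version date is an assumption, and the service URL and API key are placeholders you would copy from your own service credentials. Check the API Explorer for the exact request options.

```python
import base64
import json
import urllib.request

# Assumed version date; confirm the current value in the API reference.
API_VERSION = "2022-04-07"


def build_analyze_request(service_url, api_key, text, limit=50):
    """Build (but do not send) a POST request for the /v1/analyze endpoint."""
    # Ask only for the entities feature, capped at `limit` results.
    body = {"text": text, "features": {"entities": {"limit": limit}}}
    # Basic auth with the literal user name "apikey" and your key as password.
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        url=f"{service_url}/v1/analyze?version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def extract_entities(response_json):
    """Return (type, text) pairs from a parsed analyze response."""
    return [(e["type"], e["text"]) for e in response_json.get("entities", [])]


def analyze(service_url, api_key, text):
    """Send the request and return the entities (requires network access)."""
    req = build_analyze_request(service_url, api_key, text)
    with urllib.request.urlopen(req) as resp:
        return extract_entities(json.load(resp))
```

You would call `analyze(your_service_url, your_api_key, converted_text)` once per converted document. For 10 to 20 page documents it is worth checking the service's documented input size limit, and splitting the text into sections before calling if a document exceeds it.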

