
How does keyword extraction work?

I tested the keyword extraction from the Natural Language Understanding service from IBM with the following text:

Desarrollo PDA. Ajustes PDA. Nuevo modulo PDA. Ajustes modulo PDA. No sincroniza PDA. Error modulo PDA.

And I got the following response:

  • modulo pda with 98.31% relevance
  • ajustes modulo pda with 64.44% relevance
  • nuevo modulo pda with 64.34% relevance

Now my question is: why does the keyword "modulo pda" get 98.31% relevance, rather than just "PDA" with a higher relevance? I've been searching everywhere for how IBM's keyword extraction works, to no avail.

The actual algorithm used to extract and score keywords is most likely a proprietary corporate recipe, and I wouldn't expect them to make it public. You can find a lot of research papers on the topic, but final commercial products usually mix different techniques to get the best results.

You can run the same text through the NLU services of different providers, such as IBM, Google, and Amazon, and compare the results.

Specifically for your query: you are trying to extract keywords or topics from a single document, and PDA occurs in every sentence of it. If we apply a simple technique like TF-IDF, treating each sentence as a document, the TF-IDF score for the word PDA is 0. With the plain IDF of log(N/df), a term that appears in all N sentences gets log(N/N) = log(1) = 0. PDA therefore becomes irrelevant, since it adds no information to the overall topic or document importance.
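To make the TF-IDF argument concrete, here is a minimal sketch in plain Python that treats each sentence of the sample text as its own document. It is only an illustration of the textbook, unsmoothed TF-IDF, not IBM's actual scoring, which is not public. The helper names (`idf`, `tf_idf`, `scores`) and the choice to keep each term's best score across sentences are assumptions made for this example.

```python
import math
from collections import Counter

text = ("Desarrollo PDA. Ajustes PDA. Nuevo modulo PDA. "
        "Ajustes modulo PDA. No sincroniza PDA. Error modulo PDA.")

# Treat each sentence as its own "document" (lowercased, split on whitespace).
sentences = [s.strip().lower().split() for s in text.split(".") if s.strip()]


def idf(term, docs):
    """Plain (unsmoothed) inverse document frequency: log(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tf_idf(term, doc, docs):
    """Relative term frequency in one document times the plain IDF."""
    tf = Counter(doc)[term] / len(doc)
    return tf * idf(term, docs)


# Keep the best score each term reaches in any sentence (an arbitrary choice
# for this sketch; other aggregations are possible).
scores = {}
for doc in sentences:
    for term in set(doc):
        scores[term] = max(scores.get(term, 0.0), tf_idf(term, doc, sentences))

for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

Because "pda" appears in all six sentences, its IDF is log(6/6) = 0 and its score vanishes, while "modulo" (three of six sentences) keeps a positive score. Note that some libraries smooth the IDF term (scikit-learn's `TfidfVectorizer` does so by default), in which case a term that occurs everywhere gets a low but non-zero weight.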


 