
Reading PDF content with Logstash

Can Logstash read a PDF file from a location, extract the content inside it, and then send that content to a destination (Kafka)?

As far as I know, Logstash can read .txt, .log, or .csv files, but I am not sure whether it can read content from a PDF.

Any suggestions along these lines would be helpful.

If not, does Kafka have this capability? Is it possible to read PDF content with Apache Kafka?

Logstash does not have an input plugin for PDF files. Your best bet is to find a program that can extract the text from a PDF file. This question might help: How to extract text from a PDF?

You could then set up something that generates text versions of the PDFs, and index those into Elasticsearch using Logstash.
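As a rough illustration of the "generate text versions" step, here is a minimal Python sketch. It assumes the `pdftotext` command-line tool (from poppler-utils) is installed and on the PATH; the function names `extract_with_pdftotext` and `convert_pdfs` and the directory layout are made up for this example, not part of Logstash. Any other extractor can be swapped in through the `extract` parameter.

```python
import subprocess
from pathlib import Path


def extract_with_pdftotext(pdf_path):
    """Return the text of one PDF using the pdftotext CLI (assumed to be installed)."""
    # Passing "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def convert_pdfs(src_dir, dst_dir, extract=extract_with_pdftotext):
    """Write a .txt file into dst_dir for every *.pdf in src_dir.

    PDFs whose text file already exists and is at least as new as the PDF
    are skipped, so the function can be run repeatedly (for example from cron).
    Returns the list of text files written in this run.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / (pdf.stem + ".txt")
        if target.exists() and target.stat().st_mtime >= pdf.stat().st_mtime:
            continue  # text version is already up to date
        target.write_text(extract(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_pdfs("/data/pdfs", "/data/pdf-text")` (both paths are placeholders) on a schedule would keep a directory of `.txt` files that Logstash's `file` input can read like any other text file. From there the events can go to Elasticsearch as described above, or to Kafka through Logstash's `kafka` output plugin, which is what the question asks for.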
