Last line of text cut off when extracting text from PDF using MuPDF

I'm using MuPDF to extract text from a PDF file. Most of the time it works fine, but occasionally the last line on a page is either not extracted or not printed.

fz_text_sheet *sheet = fz_new_text_sheet(self.ctx);
fz_text_page *text = fz_new_text_page(self.ctx, &fz_empty_rect);
fz_device *dev = fz_new_text_device(self.ctx, sheet, text);

fz_page *page = fz_load_page(self.doc, pageNumber);
fz_run_page(self.doc, page, dev, &fz_identity, NULL);

fz_output *out = fz_new_output_file(ctx, stdout);
fz_print_text_page_html(ctx, out, text);

The first page of this PDF fails to print the last line of text on that page.

Am I doing something wrong or is this a bug?

Thanks!

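You need to free the text device before you can safely use the fz_text_page. The device may still be holding buffered text that is not flushed to the page until the device is freed, which would explain the missing last line. In MuPDF releases with this API, that means calling fz_free_device(dev) after fz_run_page and before fz_print_text_page_html.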
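To illustrate why the order matters, here is a minimal toy model in plain C of a device that buffers the line it is currently assembling and only commits it to the page when it is freed. Every name here (toy_page, toy_device, toy_add_line, toy_free_device, toy_demo) is hypothetical and made up for this sketch. It is not MuPDF's API or implementation, only an assumption about the general shape of the buffering described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT MuPDF types. */
#define TOY_MAX_LINES 8
#define TOY_LINE_LEN 64

typedef struct {
    char lines[TOY_MAX_LINES][TOY_LINE_LEN];
    int count;                      /* number of committed lines */
} toy_page;

typedef struct {
    toy_page *page;                 /* destination page, owned by the caller */
    char pending[TOY_LINE_LEN];     /* line still being assembled */
    int has_pending;
} toy_device;

static toy_device *toy_new_device(toy_page *page)
{
    toy_device *dev = calloc(1, sizeof *dev);
    if (dev)
        dev->page = page;
    return dev;
}

/* Commit the pending line, if any, to the page. */
static void toy_flush(toy_device *dev)
{
    if (dev->has_pending && dev->page->count < TOY_MAX_LINES)
        strcpy(dev->page->lines[dev->page->count++], dev->pending);
    dev->has_pending = 0;
}

/* Starting a new line commits the previous one; the newest stays buffered. */
static void toy_add_line(toy_device *dev, const char *text)
{
    assert(dev != NULL && text != NULL);
    toy_flush(dev);
    snprintf(dev->pending, sizeof dev->pending, "%s", text);
    dev->has_pending = 1;
}

/* Freeing the device is the only point where the final line gets flushed. */
static void toy_free_device(toy_device *dev)
{
    if (!dev)
        return;
    toy_flush(dev);
    free(dev);
}

/* Feed two lines and record how many the page holds before and after the free. */
static void toy_demo(int *before, int *after)
{
    toy_page page;
    memset(&page, 0, sizeof page);
    toy_device *dev = toy_new_device(&page);
    if (!dev) {
        *before = *after = -1;
        return;
    }
    toy_add_line(dev, "first line");
    toy_add_line(dev, "last line");
    *before = page.count;           /* reading the page here misses "last line" */
    toy_free_device(dev);
    *after = page.count;            /* now both lines are on the page */
}
```

In this sketch, toy_demo reports one committed line before the device is freed and two afterwards. Reading the page too early loses exactly the final line, which matches the symptom in the question.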
