Last line of text cut off when extracting text from PDF using MuPDF

Question

I'm using MuPDF to extract some text from a PDF file. Most of the time it works fine, but occasionally the last line of a page is missing from the output: it is either not extracted or not printed.

fz_text_sheet *sheet = fz_new_text_sheet(self.ctx);
fz_text_page *text = fz_new_text_page(self.ctx, &fz_empty_rect);
fz_device *dev = fz_new_text_device(self.ctx, sheet, text);

fz_page *page = fz_load_page(self.doc, pageNumber);
fz_run_page(self.doc, page, dev, &fz_identity, NULL);

fz_output *out = fz_new_output_file(ctx, stdout);
fz_print_text_page_html(ctx, out, text);

The first page of this PDF fails to print the last line of text on that page.

Am I doing something wrong or is this a bug?

Thanks!

Answer 1

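You need to free the text device (fz_free_device(dev) in the MuPDF API of this era) before you can use the fz_text_page safely. The device may still be holding buffered text, such as the last line of the page, that is not flushed into the fz_text_page until the device is freed. In your code, that means freeing dev after fz_run_page and before fz_print_text_page_html.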
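To illustrate why the order matters, here is a toy model in plain C. It is not MuPDF's actual implementation: toy_page, toy_device and the toy_* functions are hypothetical stand-ins invented for this sketch. It assumes a device that only commits a line to the page when the next line starts, so the final line stays buffered until the device is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not MuPDF types: a "page" that collects finished
   lines, and a "device" that holds the line currently being built. */
typedef struct { char lines[8][64]; int count; } toy_page;
typedef struct { toy_page *page; char pending[64]; } toy_device;

/* Move the pending line (if any) from the device into the page. */
static void toy_flush(toy_device *dev)
{
    if (dev->pending[0] == '\0')
        return;
    assert(dev->page->count < 8);
    strcpy(dev->page->lines[dev->page->count++], dev->pending);
    dev->pending[0] = '\0';
}

/* Start a new line: the previous line is only committed at this point. */
static void toy_begin_line(toy_device *dev, const char *text)
{
    toy_flush(dev);
    assert(strlen(text) < sizeof dev->pending);
    strcpy(dev->pending, text);
}

/* Freeing the device flushes whatever is still buffered. */
static void toy_free_device(toy_device *dev)
{
    toy_flush(dev);
    dev->page = NULL;
}

/* Print every line that has actually reached the page. */
static void toy_print_page(const toy_page *page)
{
    for (int i = 0; i < page->count; i++)
        printf("%s\n", page->lines[i]);
}
```

In this sketch, calling toy_begin_line twice and then toy_print_page shows only the first line, because the second is still sitting in the device. Calling toy_free_device before toy_print_page makes both lines appear, which mirrors the fix described above.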

solution1
0 2013-06-03 13:20:03