PDF Reader Cucumber Ruby

Question

I've been asked to write some tests to confirm that certain text is contained within a PDF file. I've come across the PDF Reader gem, which is good at extracting text from the file, except that the output isn't formatted well. For example, I have a piece of text that should read "Date of first registration of the product", but PDF Reader sees it as "Date offirstregistrationoftheproduct". Thus when I run my assertion, it fails because of the spacing in the text.

My code:

expected_text = 'Date of first registration of the product'

file = File.open(my_pdf, "rb")
PDF::Reader.open(file) do |reader|
  reader.pages.each do |page|
    expect(page).to have_text expected_text
  end
end

The result is an RSpec "expectation not met" error.

Is there a way I can get this text properly formatted so that my assertion can read it?

Answer 1

The page object of Reader is not text. If you want to get the text from a PDF, you can use page.text. Using a regex may solve your problem.

Try something like the following.

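expected_text = 'Date of first registration of the product'
# Allow optional whitespace between words, since the extracted text may drop spaces
pattern = Regexp.new(expected_text.split.map { |word| Regexp.escape(word) }.join('\s*'))

file = File.open(my_pdf, "rb")
PDF::Reader.open(file) do |reader|
  reader.pages.each do |page|
    # String#match returns MatchData or nil, never true, so use the match matcher
    expect(page.text).to match(pattern)
  end
end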
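
Another option, if a regex feels awkward, is to compare the two strings with all whitespace removed. The sketch below is only an illustration: strip_whitespace and includes_ignoring_whitespace? are hypothetical helper names, not part of the PDF Reader gem or RSpec, and the extracted string is the sample from the question rather than real output from page.text.

```ruby
# Hypothetical helpers (not part of pdf-reader or RSpec).

# Remove every run of whitespace so spacing differences no longer matter.
def strip_whitespace(text)
  text.gsub(/\s+/, "")
end

# True when `expected` occurs in `extracted` once whitespace is ignored in both.
def includes_ignoring_whitespace?(extracted, expected)
  strip_whitespace(extracted).include?(strip_whitespace(expected))
end

# Sample strings taken from the question; in a real spec `extracted`
# would presumably come from page.text.
extracted = "Date offirstregistrationoftheproduct"
expected  = "Date of first registration of the product"

puts includes_ignoring_whitespace?(extracted, expected)
```

Inside the spec this could be used as expect(includes_ignoring_whitespace?(page.text, expected_text)).to be true. Note that ignoring whitespace entirely is looser than the original assertion, so it can match text whose words are split differently.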

Answer 1 posted 2017-05-22 20:14:38
