Tika returning empty string

Question

I am using Apache Tika 1.14 and PDFBox 2.0.5. When I try to extract the content from a PDF document, it returns an empty string.

import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class Test {
    public static void main(String[] args) throws IOException, TikaException {
        String filePath = "sample.pdf";

        Tika tika = new Tika();
        String content = tika.parseToString(new File(filePath));

        System.out.println(content);
    }
}

These are the Maven dependencies I am using:

<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.14</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.5</version>
</dependency>

Answer 1

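You need to add the tika-parsers library to your project. tika-core provides only the facade and detection framework; the format-specific parsers, including the PDF parser, are in tika-parsers, so without it no text is extracted. Add the following dependency and retry.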

<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.14</version>
</dependency>
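
To confirm whether the parser is actually being picked up, a quick classpath check can help. The following is only a minimal sketch, not part of the original answer: the class name ParserCheck and the helper isOnClasspath are made up for illustration, and it assumes the PDF parser class is org.apache.tika.parser.pdf.PDFParser, as in the Tika 1.x tika-parsers artifact. It uses only the standard library, so it runs with or without Tika present.

```java
public class ParserCheck {

    // Hypothetical helper: returns true if the named class can be located
    // on the current classpath. The class is not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ParserCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class name of Tika's PDF parser in tika-parsers 1.x.
        String pdfParser = "org.apache.tika.parser.pdf.PDFParser";
        System.out.println("PDF parser on classpath: " + isOnClasspath(pdfParser));
    }
}
```

If this reports false in the same project that runs the extraction code, the tika-parsers dependency is probably not being resolved, which would match the empty-string symptom described in the question.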

Answer 1: 5, accepted, 2017-03-30 10:08:16
