
Attempting to utilize text captured from a PDF

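I have an array named myArray that contains the words of a PDF, from the first page to the last, split on spaces and trimmed. I wrote a simple printArray method that iterates over the array and prints each element one by one, and the output looks great!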

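Right after that, I run the array through another for loop over its length and check if (myArray[i].equals("(19)")) {//print something}. When I print the array to the console, it is obvious that the value (19) exists in the array.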
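A likely cause, assuming PDFTextStripper inserts a line break between lines of text (its default line separator is the platform's), is that split(" ") only splits on spaces. A token can then contain an embedded newline, such as "(19)\n400", and trim() only strips the ends, so that token never equals "(19)". The sketch below uses a made-up input string and a hypothetical tokenize helper to compare splitting on " " with splitting on any whitespace run ("\\s+"). It assumes Java 9+ for List.of.

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeDemo {

    // Hypothetical helper: split on any run of whitespace (spaces, \r, \n, \t)
    // instead of a single " ". trim() first so no empty leading token is produced.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of(); // "".split(...) would otherwise return [""]
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Made-up sample: two lines of stripped PDF text joined by a line break.
        String parsedText = "3000 Metres (19)\n400 Metres (8)";

        // split(" ") keeps "(19)\n400" together as one token, so equals("(19)") fails.
        List<String> bySpace = Arrays.asList(parsedText.split(" "));
        System.out.println(bySpace.contains("(19)"));              // false

        // Splitting on whitespace runs yields "(19)" as its own token.
        System.out.println(tokenize(parsedText).contains("(19)")); // true
    }
}
```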

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Scanner;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Main {

    static File file;
    static PDFTextStripper textStripper;
    static PDDocument pdDoc;
    static COSDocument cosDoc;
    static String parsedText;
    static int sum = 0;
    static String[] myArray;
    static String[] events = {"400", "800", "1500",
            "3000", "5000", "10000"};

    public static void main(String[] args) {

        //Read the PDF file into instance variable file
        readFile();

        try {
            parsePDF(file);
        } catch (IOException e) {
            e.printStackTrace();
        }

        myArray = parsedText.split(" ");
        removeWhiteSpace(myArray);
        printArray(myArray);
        //System.out.println();

        String currentEvent = "";
        for (int i = 0; i < myArray.length; i++) {

            if (contains(myArray[i])) {
                currentEvent = myArray[i];
            }

            if (!currentEvent.equals("")) {
                if (myArray[i].charAt(0) == '(' && (myArray[i].charAt(myArray[i].length() - 1) == ')')) {

                    String formatedRunners = "";

                    //It is possible to see some numbers such as (19)) or (19)
                    if (containsCharacter(myArray[i], ')') == 2) {
                        formatedRunners = myArray[i].substring(1, myArray[i].length() - 2);
                    } else {
                        formatedRunners = myArray[i].substring(1, myArray[i].length() - 1);
                    }

                    int numberOfRunners = Integer.parseInt(formatedRunners);
                    int distance = Integer.parseInt(currentEvent);


                    sum += numberOfRunners * distance;

                    //reset currentEvent
                    currentEvent = "";
                }
            }
        }
        //Print total distance in meters
        System.out.println(sum + " meters");

        //Convert meters to miles using the following equation: meters / 1609.344 
        System.out.println( Math.round((sum / 1609.344)) + " miles");
    }

    public static void readFile() {
        Scanner c = new Scanner(System.in);
        System.out.println("Enter a file path: ");
        String filePath = c.nextLine();
        file = new File(filePath);
    }

    public static void parsePDF(File file) throws IOException {

        textStripper = new PDFTextStripper();
        pdDoc = PDDocument.load(file);

        //Parse PDF
        textStripper.setStartPage(1);
        //textStripper.setEndPage();
        //Parsed String
        parsedText = textStripper.getText(pdDoc);

    }

    public static boolean contains(String s) {
        for (int i = 0; i < events.length; i++) {
            if (s.equals(events[i])) {
                return true;
            }
        }
        return false;
    }

    public static void printArray(String[] a) {
        for (int i = 0; i < a.length; i++) {
            System.out.println(a[i]);
        }

    }

    public static void removeWhiteSpace(String[] a) {
        for (int i = 0; i < myArray.length; i++) {

            if (myArray[i].equals("")) {

                //Use some filler to avoid crashes when checking characters
                myArray[i] = "NULL";
            }

            //Trim off all extra whitespace
            myArray[i] = myArray[i].trim();
        }
    }

    public static int containsCharacter(String str, char c) {

        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }
}

Here is what I want:

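  1. Parse, trim, etc. (OK)
  2. Iterate over myArray (in the main method) and detect events (OK)
  3. If there is an event, the next value must be (Any number), e.g. (19) (NOK)
  4. The number from step 3 is used to calculate another number
  5. Reset the current event so the process repeats again and again.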
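The five steps above can be sketched as a standalone method that works on an already-tokenized list, independent of PDFBox. EventTotals and totalMeters are hypothetical names, the sample tokens are made up, and the regex is one possible way to accept both "(19)" and "(19))"; it assumes Java 9+ for List.of and Set.of.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventTotals {

    // "(", optional non-digits, the digits we want, optional non-digits, ")".
    // find() searches anywhere in the token, so "(19)" and "(19))" both match.
    private static final Pattern RUNNERS = Pattern.compile("\\(\\D*(\\d+)\\D*\\)");

    // Steps 2-5: detect an event, take the next "(number)" token, add
    // distance * runners to the total, then reset and wait for the next event.
    static long totalMeters(List<String> tokens, Set<String> events) {
        long sum = 0;
        String currentEvent = null;                   // null means "no pending event"
        for (String token : tokens) {
            if (events.contains(token)) {
                currentEvent = token;                 // step 2: event detected
            } else if (currentEvent != null) {
                Matcher m = RUNNERS.matcher(token);
                if (m.find()) {                       // step 3: "(number)" after the event
                    int runners = Integer.parseInt(m.group(1));
                    int distance = Integer.parseInt(currentEvent);
                    sum += (long) runners * distance; // step 4
                    currentEvent = null;              // step 5: reset
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<String> events = Set.of("400", "800", "1500", "3000", "5000", "10000");
        List<String> tokens = List.of("Seeded", "3000", "random", "25", "(44)",
                "1500", "random", "(13)");
        System.out.println(totalMeters(tokens, events) + " meters"); // 151500 meters
    }
}
```

Tokens before the first event, and plain numbers such as "25" that are not in parentheses, are skipped.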

It seems to read every event correctly, but it only picks up (19)) and not (19).
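One way to check for hidden characters: printArray prints each element on its own line, so a newline inside a token looks exactly like the break between two elements. Printing each token in brackets with control characters escaped makes the difference visible. ShowTokens and visible are hypothetical names, and the tokens in main are made-up sample data.

```java
public class ShowTokens {

    // Wrap the token in [] and escape \r, \n and \t so embedded
    // line breaks are printed as text instead of as real line breaks.
    static String visible(String token) {
        return "[" + token.replace("\r", "\\r")
                          .replace("\n", "\\n")
                          .replace("\t", "\\t") + "]";
    }

    public static void main(String[] args) {
        // Made-up tokens, as split(" ") might produce from multi-line text.
        String[] myArray = {"3000", "(19)\n400", "(19))"};
        for (String token : myArray) {
            System.out.println(visible(token)); // e.g. [(19)\n400]
        }
    }
}
```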

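There are several problems in your code (no exception handling, everything is static, small bugs, etc.), but I will focus on the main issue. (I removed the code that is unchanged.)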

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

static File file;
static PDFTextStripper textStripper;
static PDDocument pdDoc;
static COSDocument cosDoc;
static String parsedText;
static int sum = 0;
// Sample tokens for testing; overwritten by the parsed PDF text in main()
static String[] myArray = {"Seeded", "3000", "random", "25", "(44)", "1500", "random", "(13)"};
static String[] events = {"400", "800", "1500", "3000", "5000", "10000", "200.000"};

public static void main(String[] args) {

    //Read the PDF file into instance variable file
    readFile();

    try {
        parsePDF(file);
    } catch (IOException e) {
        e.printStackTrace();
    }

    myArray = parsedText.split(" ");
    removeWhiteSpace(myArray);

    String currentEvent = "";
    for (int i = 0; i < myArray.length; i++) {

        if (contains(myArray[i])) {
            currentEvent = myArray[i];
        }
        else if (!currentEvent.isEmpty()) {

            Integer value = extractNumber(myArray[i]);

            if (!myArray[i].isEmpty() && value!=null) {

                int distance = Integer.parseInt(currentEvent);
                sum += value.intValue() * distance;

                //reset currentEvent
                currentEvent = "";
            }
        }
    }
    //Print total distance in meters
    System.out.println(sum + " meters");

    //Convert meters to miles using the following equation: meters / 1609.344 
    System.out.println( Math.round((sum / 1609.344)) + " miles");
}

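public static Integer extractNumber(String toCheck) {
    // Look for "(", optional non-digits, digits, optional non-digits, ")" anywhere in the token.
    // No ^...$ anchors: "." does not match line breaks, so the anchored version
    // rejected tokens such as "(19)\n400" that still contain a newline after split(" ").
    Pattern r = Pattern.compile("\\([^\\d]*(\\d+)[^\\d]*\\)");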

    Matcher m = r.matcher(toCheck);
    if(m.find()) {
        return Integer.valueOf(m.group(1));
    }
    return null;
}

public static void removeWhiteSpace(String[] a) {
    for (int i = 0; i < a.length; i++) {
        //Trim off all extra whitespace
        a[i] = a[i].trim();
    }
}

// readFile(), parsePDF() and contains() are unchanged from the question
}

The result is 151500 meters, or 94 miles.
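As a cross-check, the figure is consistent with applying the step rules to the sample myArray in the answer (event 3000 followed by (44), event 1500 followed by (13)), and with the meters-to-miles conversion used in main:

```latex
\begin{aligned}
3000 \times 44 + 1500 \times 13 &= 132000 + 19500 = 151500\ \text{m} \\
\frac{151500}{1609.344} &\approx 94.14 \;\xrightarrow{\ \texttt{Math.round}\ }\; 94\ \text{miles}
\end{aligned}
```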
