[英]Attempting to utilize text captured from a PDF
我有一個名為myArray
的數組,其中包含用空格分隔並從第一頁到最后一頁的PDF修剪的單詞。 我寫了一個簡單的print
數組方法,該方法迭代並逐個打印每個元素,看起來很棒!
在我得到它之后,它立即通過另一個for loop
來確定數組的長度,並檢查if (myArray[i].equals("(19)")) {//print something}
將數組打印到控制台時很明顯值(19)
存在於數組中。
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Scanner;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class Main {
static File file;
static PDFTextStripper textStripper;
static PDDocument pdDoc;
static COSDocument cosDoc;
static String parsedText;
static int sum = 0;
static String[] myArray;
static String[] events = {"400", "800", "1500",
"3000", "5000", "10000"};
public static void main(String[] args) {
//Read the PDF file into instance variable file
readFile();
try {
parsePDF(file);
} catch (IOException e) {
e.printStackTrace();
}
myArray = parsedText.split(" ");
removeWhiteSpace(myArray);
printArray(myArray);
//System.out.println();
String currentEvent = "";
for (int i = 0; i < myArray.length; i++) {
if (contains(myArray[i])) {
currentEvent = myArray[i];
}
if (!currentEvent.equals("")) {
if (myArray[i].charAt(0) == '(' && (myArray[i].charAt(myArray[i].length() - 1) == ')')) {
String formatedRunners = "";
//It is possible to see some numbers such as (19)) or (19)
if (containsCharacter(myArray[i], ')') == 2) {
formatedRunners = myArray[i].substring(1, myArray[i].length() - 2);
} else {
formatedRunners = myArray[i].substring(1, myArray[i].length() - 1);
}
int numberOfRunners = Integer.parseInt(formatedRunners);
int distance = Integer.parseInt(currentEvent);
sum += numberOfRunners * distance;
//reset currentEvent
currentEvent = "";
}
}
}
//Print total distance in meters
System.out.println(sum + " meters");
//Convert meters to miles using the following equation: meters / 1609.344
System.out.println( Math.round((sum / 1609.344)) + " miles");
}
public static void readFile() {
Scanner c = new Scanner(System.in);
System.out.println("Enter a file path: ");
String filePath = c.nextLine();
file = new File(filePath);
}
public static void parsePDF(File file) throws IOException {
textStripper = new PDFTextStripper();
pdDoc = PDDocument.load(file);
//Parse PDF
textStripper.setStartPage(1);
//textStripper.setEndPage();
//Parsed String
parsedText = textStripper.getText(pdDoc);
}
public static boolean contains(String s) {
for (int i = 0; i < events.length; i++) {
if (s.equals(events[i])) {
return true;
}
}
return false;
}
public static void printArray(String[] a) {
for (int i = 0; i < a.length; i++) {
System.out.println(a[i]);
}
}
public static void removeWhiteSpace(String[] a) {
for (int i = 0; i < myArray.length; i++) {
if (myArray[i].equals("")) {
//Use some filler to avoid crashes when checking characters
myArray[i] = "NULL";
}
//Trim off all extra whitespace
myArray[i] = myArray[i].trim();
}
}
public static int containsCharacter(String str, char c) {
int count = 0;
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == c) {
count++;
}
}
return count;
}
}
這是我想要的:
myArray
(在main方法中)並檢測事件(確定) (Any number)
如(19)
(NOK) 似乎它正在正確讀取每個事件,但僅拾取(19))而不是(19)。
您的代碼中存在幾個問題(無異常處理,所有靜態問題,小錯誤等),但我將重點關注主要問題。 (我刪除了未更改的代碼)
public class Main {
static File file;
static PDFTextStripper textStripper;
static PDDocument pdDoc;
static COSDocument cosDoc;
static String parsedText;
static int sum = 0;
static String[] myArray = {"Seeded", "3000", "random", 25, "(44)", "1500", "random", "(13)"};
static String[] events = {"400", "800", "1500", "3000", "5000", "10000", "200.000"};
public static void main(String[] args) {
//Read the PDF file into instance variable file
readFile();
try {
parsePDF(file);
} catch (IOException e) {
e.printStackTrace();
}
myArray = parsedText.split(" ");
removeWhiteSpace(myArray);
String currentEvent = "";
for (int i = 0; i < myArray.length; i++) {
if (contains(myArray[i])) {
currentEvent = myArray[i];
}
else if (!currentEvent.isEmpty()) {
Integer value = extractNumber(myArray[i]);
if (!myArray[i].isEmpty() && value!=null) {
int distance = Integer.parseInt(currentEvent);
sum += value.intValue() * distance;
//reset currentEvent
currentEvent = "";
}
}
}
//Print total distance in meters
System.out.println(sum + " meters");
//Convert meters to miles using the following equation: meters / 1609.344
System.out.println( Math.round((sum / 1609.344)) + " miles");
}
public static Integer extractNumber(String toCheck) {
Pattern r = Pattern.compile("^.*?\\([^\\d]*(\\d+)[^\\d]*\\).*$");
Matcher m = r.matcher(toCheck);
if(m.find()) {
return Integer.valueOf(m.group(1));
}
return null;
}
public static void removeWhiteSpace(String[] a) {
for (int i = 0; i < myArray.length; i++) {
//Trim off all extra whitespace
myArray[i] = myArray[i].trim();
}
}
其結果是151500 meters 94 miles
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.