How to read multi-line content between two words from a PDF file using Java?
I need to extract data from a PDF file: everything that comes after the word "IN:" and before the word "OUT:". There are many such occurrences across the file.
The problem is that the data can span multiple lines, and its format is not fixed.
I tried adding conditions such as lines starting or ending with specific characters, but that way I would have to write too many conditions, and the same format also appears after the "OUT:" word, so that content was getting fetched too.
Kindly let me know how I can solve the problem.
Below are sample data formats:
Format 1:
IN: {
"abc": "valueabc",
"def": "valuedef",
"ghi":
[
{"jkl": valuejkl, "mno": valuemno, "pqr":
"valuepqr"},
{"jkl": valuejkl, "mno": valuemno, "stu": "valuestu", "pqr":
"valuepqr"},
{"jkl": valuejkl, "mno": valuemno, "stu": "valuestu", "pqr":
"valuepqr"}
],
"id": "1"
}
OUT: {"abc": "valueabc", "id": "1", "def": {}}
Format 2:
IN: {"abc": "valueabc", "def": "valuedef", "id": "1"}
OUT: {"abc": "valueabc", "id": "1", "ghi": "valueghi"}
Format 3:
IN: {"abc": "valueabc", "def": "valuedef", "jkl":
["valuejkl"], "id": "1"}
OUT: {"abc": "valueabc", "id": "1", "ghi": {}}
Below is the core logic of the code I have tried. The if branch fetches a separate piece of data that I also need; the branches after it are the logic for fetching the data after "IN:" and before "OUT:".
for (String line : lines)
{
    String pattern = "^[0-9]+[\\.][0-9]+[\\.][0-9]+[\\.].*";
    boolean matches = Pattern.matches(pattern, line);
    if (matches)
    {
        String subString1 = line.split("\\.")[3].trim();
        String subString2 = line.split("\\.")[4].trim();
        String finalString = subString1 + "." + subString2 + ",";
        System.out.println();
        System.out.print(finalString);
    }
    else if (line.startsWith("IN:"))
    {
        String finalString = line.substring(3).trim();
        System.out.print(finalString);
    }
    else if (!(line.startsWith("IN:") || line.startsWith("OUT:")) && ((line.trim().length() > 1) && (line.endsWith("}"))))
    {
        String finalString = line.trim();
        System.out.print(finalString);
    }
    else if (!(line.startsWith("IN:") || line.startsWith("OUT:")) && ((line.trim().length() > 1) && (line.startsWith("\""))))
    {
        String finalString = line.trim();
        System.out.print(finalString);
    }
    else
    {
        continue;
    }
}
How about this? If you want the value between IN: and OUT:, could you try this code?
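// Assumes `lines` holds the text lines already extracted from the PDF.
StringBuilder sb = new StringBuilder();
boolean inBlock = false; // true while we are between "IN:" and "OUT:"
for (String line : lines) {
    if (line.startsWith("IN:")) {
        // Start of a new block: drop the marker and begin collecting.
        inBlock = true;
        sb.setLength(0);
        sb.append(line.substring(3).trim());
    } else if (line.startsWith("OUT:")) {
        // End of the block: print what was collected, then stop collecting
        // so that nothing after "OUT:" is picked up.
        if (inBlock) {
            System.out.println(sb);
        }
        inBlock = false;
    } else if (inBlock) {
        // Continuation line of the current IN block.
        sb.append(line.trim());
    }
}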
INPUT TEXT:
IN: {
"abc": "valueabc",
"def": "valuedef",
"ghi":
[
"valuepqr"},
{"jkl": valuejkl, "mno": valuemno, "stu": "valuestu", "pqr":
"valuepqr"}
],
"id": "1"
}
OUT: {"abc": "valueabc", "~"}
RESULT:
{"abc": "valueabc","def": "valuedef","ghi":["valuepqr"},{"jkl": valuejkl, "mno": valuemno, "stu": "valuestu", "pqr":"valuepqr"}],"id": "1"}
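If the whole extracted text is available as a single String rather than a list of lines, a regular expression may be a shorter alternative. The following is only a sketch under a few assumptions: the text has already been pulled out of the PDF by whatever library you use, "IN:" and "OUT:" always appear at the start of a line, and no value inside a block has a line beginning with "OUT:". The class name InOutExtractor and the method extractInBlocks are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class InOutExtractor {

    // "IN:" at the start of a line, then the shortest possible run of text
    // (DOTALL lets '.' cross line breaks) up to the next line starting with "OUT:".
    private static final Pattern IN_BLOCK =
            Pattern.compile("^IN:(.*?)^OUT:", Pattern.DOTALL | Pattern.MULTILINE);

    // Returns every IN block found in the text, each joined into a single line.
    static List<String> extractInBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = IN_BLOCK.matcher(text);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            // \R matches any line break (\n, \r\n, ...); trim each piece before joining.
            for (String line : m.group(1).split("\\R")) {
                sb.append(line.trim());
            }
            blocks.add(sb.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Sample text modelled on Format 3 and Format 2 from the question.
        String text = String.join("\n",
                "IN: {\"abc\": \"valueabc\", \"def\": \"valuedef\", \"jkl\":",
                "[\"valuejkl\"], \"id\": \"1\"}",
                "OUT: {\"abc\": \"valueabc\", \"id\": \"1\", \"ghi\": {}}",
                "IN: {\"abc\": \"valueabc\", \"def\": \"valuedef\", \"id\": \"1\"}",
                "OUT: {\"abc\": \"valueabc\", \"id\": \"1\", \"ghi\": \"valueghi\"}");
        for (String block : extractInBlocks(text)) {
            System.out.println(block);
        }
    }
}
```

The reluctant quantifier (.*?) stops at the first "OUT:" line after each "IN:", so content that follows "OUT:" is never included, which is the part the hand-written conditions struggled with. For very large files the line-by-line loop may still be preferable, since it does not need the whole text in memory at once.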