
Extracting text between multiple delimiters

I need to extract a specific string from a text file whose lines contain multiple delimiters, which may be the same or different. For example, let's say I have a text file that contains the lines below. Let's consider each piece of text between delimiters as a segment.

ABC#12#3#LINE1####1234678985$
DEF#XY#Z:1234:1234561230$
ABC#12#3#LINE TWO####1234678985$
DEF#XY#Z:1234:4564561230$
ABC#12#3#3RD LINE####1234678985$
DEF#XY#Z*1234:7894561230$

I need to write code that extracts the text after ABC#12#3# in all the lines of the text file, based on two inputs.

1) The segment to find (e.g., ABC)

2) The position of the segment from which I need to extract the text (e.g., 4)

So, an input of ABC and the 4th segment gives the result LINE1, and an input of DEF and the 5th segment gives the result 1234561230 (from the first DEF line). This is what I've got so far for the 1st input.

scanner = new Scanner(file);
while (scanner.hasNextLine()) {
    line = scanner.nextLine();
    if (line.contains(find)) {   // find is the 1st input (e.g., ABC)
        System.out.println("Line to be replaced - " + line);
        int ind1 = line.indexOf(findlastchar + "*") + 1;
        int ind2 = line.indexOf("*");
        System.out.println("Ind1 is " + ind1 + " and Ind2 is " + ind2);
        System.out.println("findlastchar is " + findlastchar + "#");
        remove = line.substring(line.indexOf(findlastchar) + 1, line.indexOf("#"));
        System.out.println("String to be replaced " + remove);
        content = content.replaceAll(remove, replace);
    }
}

I've got 2 problems with my code. I don't know how to use substring to separate text between identical delimiters, and I'm not sure how to write the code so that it treats all of the following special characters as delimiters - {#, $, :} - and thereby considers any text between any of these delimiters as a segment.

The answer to this question uses regex, which I want to avoid.

Simply split the line and use an index:

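public String GetItemFromLine(String s, String delimiter, String prefix, int index) {
    // split() takes a regex, so quote the delimiter to match it literally ("$" alone would be an anchor).
    String[] items = s.split(java.util.regex.Pattern.quote(delimiter));
    // Compare contents with equals(), not ==, and guard against a too-short line; index is zero-based.
    boolean matches = index >= 0 && index < items.length && items[0].equals(prefix);
    return matches ? items[index] : null;
}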

P.S. I have no experience with Java, so please treat this example as pseudo-code.
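Since the question asks for several delimiter characters at once and prefers to avoid regex, here is a minimal sketch of the same split-and-index idea done with a plain character scan. The class name SegmentExtractor, the method names, and the choice to match the first segment exactly (rather than using contains) are assumptions made for illustration, not part of the original answer. Positions are 1-based, as in the question, and empty segments between consecutive delimiters are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original answer.
class SegmentExtractor {
    // Delimiter set taken from the question: {#, $, :}. Add '*' here if it should count too.
    static final String DELIMITERS = "#$:";

    // Splits a line on any delimiter character, keeping empty segments.
    static List<String> segments(String line) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (DELIMITERS.indexOf(c) >= 0) {
                result.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        result.add(current.toString()); // text after the last delimiter (may be empty)
        return result;
    }

    // Returns the segment at the 1-based position if the first segment equals find, else null.
    static String extract(String line, String find, int position) {
        List<String> parts = segments(line);
        if (!parts.get(0).equals(find) || position < 1 || position > parts.size()) {
            return null;
        }
        return parts.get(position - 1);
    }

    public static void main(String[] args) {
        System.out.println(extract("ABC#12#3#LINE1####1234678985$", "ABC", 4)); // LINE1
        System.out.println(extract("DEF#XY#Z:1234:1234561230$", "DEF", 5));     // 1234561230
    }
}
```

Because empty segments are kept, the run "####" in the ABC lines counts as three empty segments, so 1234678985 sits at position 8 rather than 5.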

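Either use StringTokenizer, to which you can pass the delimiters as a String and then loop over the tokens (see this example), or even better use String.split with a regular expression: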

String[] words = line.split("[#$:]");  // inside a character class, $ is a literal rather than an end-of-line anchor
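The StringTokenizer option mentioned above might look like the sketch below. The class name TokenizerDemo and the helper tokens are made up for illustration. Note that StringTokenizer treats a run of consecutive delimiters as a single separator and never returns empty tokens, so segment positions can shift compared with a split that keeps empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not from the original answer.
class TokenizerDemo {
    // Returns the non-empty tokens of line, splitting on any of '#', '$' or ':'.
    static List<String> tokens(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line, "#$:");
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // "####" is skipped as one separator, so 1234678985 becomes the 5th token.
        System.out.println(tokens("ABC#12#3#LINE1####1234678985$"));
        // prints [ABC, 12, 3, LINE1, 1234678985]
    }
}
```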

It is probably not the most efficient way, but you can do it with regex, for example:

(ABC[#:*$]+(?:\w+[#:*$]+){2}|DEF[#:*$]+(?:\w+[#:*$]+){3})([^#:*$]+)(.+)


With {2} and {3} (the number of repetitions of the given pattern) you decide which part of the string should be replaced. In this case you change only the fragment between delimiters. Example in Java:

public class Test{
    public static void main(String[] args) {
        String[] lines = {"ABC#12#3#LINE1####1234678985$",
                "DEF#XY#Z:1234:1234561230$",
                "ABC#12#3#LINE TWO####1234678985$",
                "DEF#XY#Z:1234:4564561230$",
                "ABC#12#3#3RD LINE####1234678985$",
                "DEF#XY#Z*1234:7894561230$"};
        for (String line : lines) {
            String result = line.replaceAll("(ABC[#:*$]+(?:\\w+[#:*$]+){2}|DEF[#:*$]+(?:\\w+[#:*$]+){3})([^#:*$]+)(.+)","$1" + " replacement " + "$3");
            System.out.println(result);
        }
    }
}

with output:

ABC#12#3# replacement ####1234678985$
DEF#XY#Z:1234: replacement $
ABC#12#3# replacement ####1234678985$
DEF#XY#Z:1234: replacement $
ABC#12#3# replacement ####1234678985$
DEF#XY#Z*1234: replacement $
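If the goal is to read the fragment rather than replace it, the same pattern can be used with a Matcher and capture group 2. This is only a sketch; the class name GroupExtractDemo and the helper fragment are hypothetical names introduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original answer.
class GroupExtractDemo {
    // Same pattern as in the answer above; group 2 is the fragment between delimiters.
    static final Pattern PATTERN = Pattern.compile(
            "(ABC[#:*$]+(?:\\w+[#:*$]+){2}|DEF[#:*$]+(?:\\w+[#:*$]+){3})([^#:*$]+)(.+)");

    // Returns the captured fragment, or null if the line does not match.
    static String fragment(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(fragment("ABC#12#3#LINE TWO####1234678985$")); // LINE TWO
        System.out.println(fragment("DEF#XY#Z*1234:7894561230$"));        // 7894561230
    }
}
```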
