Given the following text, I'm trying to parse out the string "TestFile" after Address:
File: TestFile
Branch
OFFICE INFORMATION
Address: TestFile
City: L.A.
District.: 43
State: California
Zip Code: 90210
DISTRICT INFORMATION
Address: TestFile2
....
I understand that a lookbehind must have a bounded length, so unbounded quantifiers such as \s* are not allowed, meaning this won't work:
(?<=OFFICE INFORMATION\n\s*Address:).*(?=\n)
I could use this instead:
(?<=OFFICE INFORMATION\n Address:).*
but it depends on the exact spacing, which isn't flexible and thus not ideal.
How do I reliably parse out "TestFile" and not "TestFile2" from the example above? Note that Address appears twice, but I only need the first value.
Thank you
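For reference: in Java the restriction is on *unbounded* quantifiers. A lookbehind may still use a bounded repetition such as \s{1,20}. The sketch below assumes the gap between the header and Address: never exceeds a hypothetical limit of 20 whitespace characters; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Hypothetical limit: up to 20 whitespace characters (line break plus any
    // indentation) between the section header and "Address:".
    private static final Pattern OFFICE_ADDRESS = Pattern.compile(
            "(?<=OFFICE INFORMATION\\s{1,20}Address:)\\s*(\\S+)");

    static String officeAddress(String text) {
        Matcher m = OFFICE_ADDRESS.matcher(text);
        // find() stops at the first match, so the DISTRICT address is never reached
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "File: TestFile\nBranch\nOFFICE INFORMATION\n   Address: TestFile\n"
                + "City: L.A.\nDISTRICT INFORMATION\nAddress: TestFile2\n";
        System.out.println(officeAddress(text)); // TestFile
    }
}
```

If the indentation could ever exceed the chosen bound, the lookbehind silently stops matching, which is why a capturing group (as in the answers) is usually the more robust option.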
You don't really need to use a lookbehind here. Get the text you want with a capturing group:
(?:\bOFFICE INFORMATION\s+Address:\s*)(\S+)
Captured group #1 will have the value TestFile.
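The same capturing-group approach in Java, as a minimal sketch (class and method names are hypothetical). Because \s+ matches newlines and indentation alike, the spacing between the header and Address: no longer matters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // \s+ absorbs the line break and any indentation before "Address:";
    // group 1 captures the first run of non-whitespace characters after it.
    private static final Pattern OFFICE_ADDRESS =
            Pattern.compile("\\bOFFICE INFORMATION\\s+Address:\\s*(\\S+)");

    static String officeAddress(String text) {
        Matcher m = OFFICE_ADDRESS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "File: TestFile\nBranch\nOFFICE INFORMATION\nAddress: TestFile\n"
                + "City: L.A.\nDISTRICT INFORMATION\nAddress: TestFile2\n";
        System.out.println(officeAddress(text)); // TestFile
    }
}
```

Note that \S+ stops at the first space, so an address containing spaces would be truncated; capturing the rest of the line (for example [^\r\n]+) is an option in that case.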
JS Code:
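// input holds the text from the question (abridged here)
var input = "File: TestFile\nBranch\nOFFICE INFORMATION\nAddress: TestFile\nCity: L.A.\n" +
            "DISTRICT INFORMATION\nAddress: TestFile2\n";
var re = /(?:\bOFFICE INFORMATION\s+Address:\s*)(\S+)/;
var matches = [];
// re has no g flag, so exec() returns the first match only (or null)
var m = re.exec(input);
if (m !== null) {
    matches.push(m[1]); // captured group #1 holds the address
}
console.log(matches); // logs an array containing "TestFile"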
Working with an array:
// Requires: import java.util.ArrayList; import java.util.List;
// A sample String
String questions = "File: TestFile Branch OFFICE INFORMATION Address: TestFile City: L.A. District.: 43 State: California Zip Code: 90210 DISTRICT INFORMATION Address: TestFile2";
// A list to store the split elements
List<String> arr = new ArrayList<>();
// Split on colons and whitespace.
// Including whitespace also handles newlines in multi-line input
for (String x : questions.split(":|\\s")) {
    // Ignore blank elements, so we get a clean list
    if (!x.trim().isEmpty()) {
        arr.add(x);
    }
}
This will give you an array which is:
[File, TestFile, Branch, OFFICE, INFORMATION, Address, TestFile, City, L.A., District., 43, State, California, Zip, Code, 90210, DISTRICT, INFORMATION, Address, TestFile2]
Now let's analyze it. Suppose you want the value corresponding to Address. The element Address is at index 5 in the list, so element 6 is what you want.
So you would do this:
String address = arr.get(6);
This will return TestFile.
Similarly, for City, element 8 is what you want (the count starts from 0). You can of course modify my splitting pattern, or write a loop, to find better ways to do this task. This is just a hint.
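Rather than hard-coding index 6, one possible refinement is to look the label up with List.indexOf, which returns the *first* occurrence and therefore picks the OFFICE address over the DISTRICT one. A minimal sketch; the class and helper names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class LabelLookup {
    // Split on colons and whitespace, dropping blank tokens (same idea as above).
    static List<String> tokens(String text) {
        List<String> arr = new ArrayList<>();
        for (String x : text.split(":|\\s")) {
            if (!x.trim().isEmpty()) {
                arr.add(x);
            }
        }
        return arr;
    }

    // Returns the token right after the first occurrence of label, or null.
    static String valueAfter(List<String> arr, String label) {
        int i = arr.indexOf(label);
        return (i >= 0 && i + 1 < arr.size()) ? arr.get(i + 1) : null;
    }

    public static void main(String[] args) {
        List<String> arr = tokens("File: TestFile Branch OFFICE INFORMATION Address: TestFile "
                + "City: L.A. District.: 43 State: California Zip Code: 90210 "
                + "DISTRICT INFORMATION Address: TestFile2");
        System.out.println(valueAfter(arr, "Address")); // TestFile
        System.out.println(valueAfter(arr, "City"));    // L.A.
    }
}
```

Multi-word labels such as Zip Code are still split into two tokens, so valueAfter(arr, "Zip") would return Code rather than 90210.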
Here is one such example loop:
// Starting at index 6 (the value after the first "Address"), every second element
// is a value, with the labels in between. The upper bound only fits this sample.
for (int i = 6; i < (arr.size() / 2) + 6; i += 2) {
    System.out.println(arr.get(i));
}
This gives following output:
TestFile
L.A.
43
California
Code
Of course this loop is rough: "Zip Code" is split into two tokens, which breaks the label/value alternation, so the loop prints Code instead of 90210. Refine it a little and you will get every element correctly, even the last one. Or, better yet, write ZipCode instead of Zip Code (no space in between) and the loop works with little else to change.
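To illustrate the ZipCode suggestion, here is a sketch that (as a hypothetical preprocessing step) joins "Zip Code" into a single token before splitting, then runs the same loop. The class name is illustrative, and the loop bounds are still tied to this particular sample:

```java
import java.util.ArrayList;
import java.util.List;

public class ZipCodeLoop {
    static List<String> values(String text) {
        // Hypothetical preprocessing: make "Zip Code" one token so labels and values alternate.
        String joined = text.replace("Zip Code", "ZipCode");
        List<String> arr = new ArrayList<>();
        for (String x : joined.split(":|\\s")) {
            if (!x.trim().isEmpty()) {
                arr.add(x);
            }
        }
        List<String> out = new ArrayList<>();
        // Same bounds as the loop above: start at index 6, step by 2.
        for (int i = 6; i < (arr.size() / 2) + 6; i += 2) {
            out.add(arr.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String questions = "File: TestFile Branch OFFICE INFORMATION Address: TestFile City: L.A. "
                + "District.: 43 State: California Zip Code: 90210 DISTRICT INFORMATION Address: TestFile2";
        System.out.println(values(questions)); // [TestFile, L.A., 43, California, 90210]
    }
}
```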
The advantage over using a regex directly: you won't have to write a regex for every single field. Iteration makes it easier to extract everything automatically.
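A variation on the same iterate-instead-of-regex idea, sketched under the assumption that the input keeps one "Label: value" pair per line as in the question: split each line at its first colon and store the pairs in a map. This swaps token positions for line structure, so multi-word labels like Zip Code survive intact. Names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldMap {
    // Parse "Label: value" lines into a map. putIfAbsent keeps the FIRST value
    // for a repeated label, so "Address" maps to the OFFICE address.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // header lines such as "Branch" have no colon
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.putIfAbsent(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "File: TestFile\nBranch\nOFFICE INFORMATION\nAddress: TestFile\nCity: L.A.\n"
                + "District.: 43\nState: California\nZip Code: 90210\n"
                + "DISTRICT INFORMATION\nAddress: TestFile2\n";
        Map<String, String> fields = parse(text);
        System.out.println(fields.get("Address"));  // TestFile
        System.out.println(fields.get("Zip Code")); // 90210
    }
}
```

Because only the first occurrence of each label is kept, fields from the DISTRICT section that repeat an OFFICE label are dropped; a map per section would be needed to keep both.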
See this:
// Read the input file into a single string (needs java.io.* imports;
// the enclosing method must handle or declare IOException)
StringBuilder string = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(new File("D:/tests/sample.txt"))))) {
    String line;
    while ((line = reader.readLine()) != null) {
        string.append(line);
        string.append("\n");
    }
}
//now string will contain the input as
/*File: TestFile
Branch
OFFICE INFORMATION
Address: TestFile
City: L.A.
District.: 43
State: California
Zip Code: 90210
DISTRICT INFORMATION
Address: TestFile2
....*/
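// Needs java.util.regex.* imports. Any leading text on the line after the header is
// allowed before "Address:", and spaces/tabs after the colon are skipped.
Pattern regex = Pattern.compile("OFFICE INFORMATION.*\\r?\\n.*Address:[ \\t]*(?<officeAddress>.*)\\r?\\n");
Matcher regexMatcher = regex.matcher(string.toString());
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group("officeAddress")); // prints TestFile
}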
The named group officeAddress in the pattern captures the value to be extracted.
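To check that the named-group pattern copes with the varying spacing the question worries about, here is a sketch that runs it on an in-memory string with indented, CRLF-terminated lines instead of a file. The class name and sample text are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // ".*" before "Address:" tolerates any indentation; "[ \t]*" skips spaces/tabs
    // after the colon without crossing into the next line.
    private static final Pattern OFFICE = Pattern.compile(
            "OFFICE INFORMATION.*\\r?\\n.*Address:[ \\t]*(?<officeAddress>.*)\\r?\\n");

    static String officeAddress(String text) {
        Matcher m = OFFICE.matcher(text);
        return m.find() ? m.group("officeAddress") : null;
    }

    public static void main(String[] args) {
        String text = "Branch\r\nOFFICE INFORMATION\r\n      Address:    TestFile\r\n"
                + "DISTRICT INFORMATION\r\n  Address: TestFile2\r\n";
        System.out.println(officeAddress(text)); // TestFile
    }
}
```

In Java, "." does not match \r or \n by default, so each .* stays within a single line; the pattern therefore expects Address: on the line immediately after the header.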