
Pattern matching using regular expressions: replace patterns with digits

My program takes a long string from the user, such as aaaabaaaaaba.

The output should replace each aaa with 0 and each aba with 1. One sequence must not run into the next: every sequence of three characters is separate. For example, in aaaabaaabaaaaba the sequences aaa-aba-aab-aaa-aba are separate and should not overlap each other while matching. Please help me get this program working.

Example: for the input aaaabaaaaaba, the output is 0101.

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Pattern1 {
    Scanner sc = new Scanner(System.in);

    public void m1() {
        String s;
        System.out.println("enter a string");
        s = sc.nextLine();
        assertTrue(s != null);
        Pattern p = Pattern.compile(s);
        Matcher m = p.matcher(".(aaa");
        Matcher m1 = p.matcher("aba");
        while (m.find()) {
            s.replaceAll(s, "1");
        }
        while (m1.find()) {
            s.replaceAll(s, "0");
        }
        System.out.println(s);
    }

    private boolean assertTrue(boolean b) {
        // TODO Auto-generated method stub
        return b;
    }

    public static void main(String[] args) {
        Pattern1 p = new Pattern1();
        p.m1();
    }
}

With a regex and Matcher.find() you can search for each successive match and append a 0 or a 1 to the output, depending on which triplet was matched.

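import java.util.regex.Matcher;
import java.util.regex.Pattern;

String test = "aaaabaaaaabaaaa";

// The named group "triplet" holds whichever alternative matched
Pattern compile = Pattern.compile("(?<triplet>(aaa)|(aba))");
Matcher matcher = compile.matcher(test);

StringBuilder out = new StringBuilder();

// Restart each search at the end of the previous match, so matches never overlap
int start = 0;
while (matcher.find(start)) {
    String triplet = matcher.group("triplet");

    switch (triplet) {
        case "aaa":
            out.append("0");
            break;
        case "aba":
            out.append("1");
            break;
    }

    start = matcher.end();
}

System.out.println(out.toString()); // prints 01010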

If you have "aaaaaba" as input (one "a" too many after the first triplet), it will skip the extra "a" and output "01". So any invalid characters between valid triplets will be ignored.
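To make the skipping behaviour easier to check, here is a minimal sketch that wraps the same idea in a method. The class name TripletScanDemo and the method decode are made up for illustration; it also uses plain find() without a start index, which resumes after the previous match in the same way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration, not part of the original answer.
final class TripletScanDemo {
    private static final Pattern TRIPLET = Pattern.compile("aaa|aba");

    // Scans left to right. find() resumes after the previous match,
    // so matches never overlap and unmatched characters are skipped.
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = TRIPLET.matcher(input);
        while (matcher.find()) {
            out.append(matcher.group().equals("aaa") ? '0' : '1');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("aaaabaaaaaba")); // prints 0101
        System.out.println(decode("aaaaaba"));      // prints 01 (the fourth 'a' is skipped)
    }
}
```

Note that skipping means a malformed input still produces some output, which may or may not be what you want.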

If you want to go through the string in blocks of 3 characters, you can use a for-loop and the substring() method like this:

String test = "aaaabaaaaabaaaa";

StringBuilder out = new StringBuilder();

// Step through the string 3 characters at a time; a trailing partial block is not visited
for (int i = 0; i < test.length() - 2; i += 3) {
    String triplet = test.substring(i, i + 3);

    switch (triplet) {
        case "aaa":
            out.append("0");
            break;
        case "aba":
            out.append("1");
            break;
    }
}

System.out.println(out.toString()); // prints 01010

In this case, if a triplet is invalid, it will just be ignored and neither a "0" nor a "1" will be added to the output. If you want to do something in this case, just add a default clause to the switch statement.
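As one possible reading of "do something in this case", the sketch below rejects invalid triplets by throwing from the default clause. The class name FixedBlockDecoder, the choice of IllegalArgumentException, and the extra length check are my assumptions, not part of the answer above.

```java
// Hypothetical helper class for illustration; names and error handling are assumptions.
final class FixedBlockDecoder {
    static String decode(String input) {
        // Assumption: a length that is not a multiple of 3 is treated as an error
        // (the loop above would silently drop the leftover characters instead).
        if (input.length() % 3 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 3: " + input.length());
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i += 3) {
            String triplet = input.substring(i, i + 3);
            switch (triplet) {
                case "aaa":
                    out.append('0');
                    break;
                case "aba":
                    out.append('1');
                    break;
                default:
                    // Reached for any block that is neither "aaa" nor "aba"
                    throw new IllegalArgumentException(
                            "Invalid triplet \"" + triplet + "\" at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("aaaabaaaaaba")); // prints 0101
    }
}
```

If you would rather keep going after a bad block, the default clause could append a placeholder character instead of throwing.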

Here's what I understand from your question:

  • The user string will be some sequence of the tokens "aaa" and "aba"
  • There will be no other combinations of 'a' and 'b'. For example, you will not get "aaabaa" as an input string, as "baa" is invalid.
  • For each consecutive 3 character string, replace "aaa" with 0 and "aba" with 1.

I'm guessing that this is a homework assignment designed to teach you about the dangers of catastrophic backtracking and how to carefully use quantifiers.

My suggestion would be to do this in two parts:

  1. Identify and replace each 3-letter segment with a single character.
  2. Replace those characters with the appropriate value. ('1' or '0')

For example, first construct a pattern like a([ab])a to capture the character ('a' or 'b') between two 'a's. Then, use the Matcher class's replaceAll method to replace each match with the captured character. So, for the input "aaaabaaaaaba" you get "abab" as a result. Finally, replace all 'a' with '0' and all 'b' with '1', which gives "0101".

In Java:

// Create the matcher to identify triplets in the form "aaa" or "aba"
Matcher tripletMatcher = Pattern.compile("a([ab])a").matcher(inputString);

// Replace each triplet with the middle letter, then replace 'a' and 'b' properly.
String result = tripletMatcher.replaceAll("$1").replace('a', '0').replace('b', '1');

There are better ways of doing this, of course, but this should work. I've left the code intentionally dense and hard to read quickly. So, if this is a homework assignment, make sure you understand it fully and then rewrite it yourself.

Also, keep in mind that this will not work if the input string isn't a sequence of "aaa" and "aba". Any other combination, such as "baa" or "abb", is left in place by replaceAll instead of raising an exception, so the output will be wrong. For example, ababaa, aababa, and aaabab will all give unexpected and potentially incorrect results.
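One way to guard against such inputs is to validate the whole string before converting it. This is only a sketch: the class name ValidatedTripletDecoder and the decision to throw IllegalArgumentException are my assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; names and error handling are assumptions.
final class ValidatedTripletDecoder {
    // The whole input must consist of zero or more "aaa" / "aba" blocks.
    private static final Pattern VALID = Pattern.compile("(?:aaa|aba)*");
    private static final Pattern TRIPLET = Pattern.compile("a([ab])a");

    static String decode(String input) {
        // matches() only succeeds if the pattern covers the entire string
        if (!VALID.matcher(input).matches()) {
            throw new IllegalArgumentException("Not a sequence of aaa/aba: " + input);
        }
        // Same two steps as above: keep the middle letter, then map a -> 0 and b -> 1
        return TRIPLET.matcher(input).replaceAll("$1")
                .replace('a', '0')
                .replace('b', '1');
    }

    public static void main(String[] args) {
        System.out.println(decode("aaaabaaaaaba")); // prints 0101
    }
}
```

With this check in place, ababaa, aababa, and aaabab are all rejected instead of producing misleading digits.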

