
How to extract a particular string pattern from a given text using regular expressions

I have the following text.

emailString = "Jhon, N, Edward, <edward@sri.lk>, " +
            "Mickal, Lantz, <mickal@sri.lk>, " +
            "Thomas, F, Kevin, <kevin@sri.lk>, " +
            "Marina, Anderson, <marina@sri.lk>, " +
            "Henry, Ford, <ford@sri.lk>, " +
            "Davin, Cammeron, <Cammeron@sri.lk>";

From the above text, I want to list each entry on its own line, as follows.

Jhon, N, Edward, <edward@sri.lk>
Mickal, Lantz, <mickal@sri.lk>
Thomas, F, Kevin, <kevin@sri.lk>
Marina, Anderson, <marina@sri.lk>
Henry, Ford, <ford@sri.lk>
Davin, Cammeron, <Cammeron@sri.lk>

I tried to do this using Java regular expressions, but could not get it to work.

How can I solve this using regular expressions in Java?

The following is the sample class I used.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MainFrame
{

    private static final String emailString = "Jhon, N, Edward, <edward@sri.lk>, " +
            "Mickal, Lantz, <mickal@sri.lk>, " +
            "Thomas, F, Kevin, <kevin@sri.lk>, " +
            "Marina, Anderson, <marina@sri.lk>, " +
            "Henry, Ford, <ford@sri.lk>, " +
            "Davin, Cammeron, <Cammeron@sri.lk>";

    public MainFrame()
    {

    }

    /**
     * @param args
     */
    public static void main(String[] args)
    {
        String regularExpression = "(([.])*([A-Za-z0-9])*([.*])*)*(<[a-z0-9-]+(\\.[a-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[a-z]{2,})>)([.])*([A-Za-z0-9])*([.*])*";
        Pattern pattern = Pattern.compile(regularExpression);

        Matcher matcher = pattern.matcher(emailString);

        String[] emails = emailString.split(regularExpression);

        for (String email : emails)
        {
            System.out.println("Email Address : " + email);
        }

    }

}

How about just:

emailString.split(">,")

This will yield:

Jhon, N, Edward, <edward@sri.lk
 Mickal, Lantz, <mickal@sri.lk
 Thomas, F, Kevin, <kevin@sri.lk
 Marina, Anderson, <marina@sri.lk
 Henry, Ford, <ford@sri.lk
 Davin, Cammeron, <Cammeron@sri.lk>

The regex is very simple, but the result needs some further processing:

  • every entry but the last has its > stripped from the end
  • every entry but the first has a leading space

Both issues can be fixed with:

String[] split = emailString.split(">,");
for (int i = 0; i < split.length; i++) {
    String string = split[i].trim();
    if(!string.endsWith(">"))
        string = string + '>';
    split[i] = string;
}

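This pattern captures a whole entry such as 'Jhon, N, Edward, <edward@sri.lk>' as a single group. Note that it requires a trailing comma, so the last entry in the sample text is not matched as written: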

(.*?>),
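As a rough sketch of how this pattern might be applied with Matcher.find(), consider the class below. The names EntryExtractor and extractEntries are made up for illustration. It also assumes two small adjustments that are not part of the pattern above: a leading \s* to skip the space after each separator, and an optional trailing comma so that the last entry is picked up too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class EntryExtractor {

    // Assumed variant of (.*?>), that skips leading whitespace
    // and treats the separating comma as optional.
    private static final Pattern ENTRY = Pattern.compile("\\s*(.*?>),?");

    // Returns every "name(s), <email>" entry found in the text.
    static List<String> extractEntries(String text) {
        List<String> entries = new ArrayList<>();
        Matcher matcher = ENTRY.matcher(text);
        while (matcher.find()) {
            // Group 1 holds the entry without the surrounding space and comma.
            entries.add(matcher.group(1));
        }
        return entries;
    }

    public static void main(String[] args) {
        String emailString = "Jhon, N, Edward, <edward@sri.lk>, "
                + "Mickal, Lantz, <mickal@sri.lk>, "
                + "Davin, Cammeron, <Cammeron@sri.lk>";
        for (String entry : extractEntries(emailString)) {
            System.out.println(entry);
        }
    }
}
```

Reading group 1 rather than the whole match keeps the separator out of the result, so no trimming is needed afterwards.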

This pattern captures the name and the email address in separate groups, for example 'Jhon, N, Edward' and 'edward@sri.lk':

(.*?),\s*<([^>]*)>,
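A minimal sketch of reading the two groups could look like the following. ContactParser and parseContacts are hypothetical names, and the pattern is an assumed variant of the one above, with a leading \s* and an optional trailing comma so that the last entry (which has no comma after it) is also matched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class ContactParser {

    // Group 1: the name part; group 2: the address between < and >.
    // Assumed tweaks: leading \s* and an optional trailing comma.
    private static final Pattern CONTACT =
            Pattern.compile("\\s*(.*?),\\s*<([^>]*)>,?");

    // Maps each name to its email address, in the order found.
    static Map<String, String> parseContacts(String text) {
        Map<String, String> contacts = new LinkedHashMap<>();
        Matcher matcher = CONTACT.matcher(text);
        while (matcher.find()) {
            contacts.put(matcher.group(1), matcher.group(2));
        }
        return contacts;
    }

    public static void main(String[] args) {
        String emailString = "Jhon, N, Edward, <edward@sri.lk>, "
                + "Mickal, Lantz, <mickal@sri.lk>, "
                + "Davin, Cammeron, <Cammeron@sri.lk>";
        for (Map.Entry<String, String> e : parseContacts(emailString).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A map keyed by name is only one possible shape for the result. If two entries could share a name, a list of pairs would be the safer choice.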

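Split on each comma that follows a > (checked with a lookbehind, so the > itself is kept) and is followed by whitespace: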

(?<=>),\s+


Example:

String[] parts = emailString.split("(?<=>),\\s+");
System.out.println(parts[2]); // prints "Thomas, F, Kevin, <kevin@sri.lk>"


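import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MainFrame {

    private static final String emailString = "Jhon, N, Edward, <edward@sri.lk>, "
            + "Mickal, Lantz, <mickal@sri.lk>, "
            + "Thomas, F, Kevin, <kevin@sri.lk>, "
            + "Marina, Anderson, <marina@sri.lk>, "
            + "Henry, Ford, <ford@sri.lk>, "
            + "Davin, Cammeron, <Cammeron@sri.lk>";

    public static void main(String[] args) {
        // Pattern for a single email address.
        String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@" + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

        // Group 1 is one whole entry, up to and including "<email>".
        // Leading whitespace and the separating comma stay outside the group;
        // the comma is optional so that the last entry also matches.
        String regularExpression = "\\s*(.*?<" + EMAIL_PATTERN + ">),?";
        Pattern pattern = Pattern.compile(regularExpression);

        Matcher matcher = pattern.matcher(emailString);

        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }

}

This prints each entry on its own line, including the last one, which has no trailing comma.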
