Java regex for < and “!” in first line

I use this code to fetch a page's HTML source and pull out the information I want. As a test, I was checking whether it would return < and "!" for the first line. However, it doesn't work!

    import java.io.*;
    import java.net.URL;
    import java.util.regex.*;

    public class url
    {
        public static BufferedReader read(String url) throws Exception {
            return new BufferedReader(
                new InputStreamReader(
                    new URL(url).openStream()));
        }

        public static void main (String[] args) throws Exception{
            BufferedReader reader = read(args[0]);
            String line = reader.readLine();

            while(line != null) {
                System.out.println(line);
                line = reader.readLine();
                regex("//<//!",line);
            }
        }

        public static void regex(String regex, String check){
            Pattern checkregex = Pattern.compile(regex);
            Matcher regexMatcher = checkregex.matcher(check);
            if(regexMatcher.find()==false)
                return;

            while(regexMatcher.find()){
                if(regexMatcher.group().length() !=0) {
                    System.out.println(regexMatcher.group().trim());
                }
            }
        }
    }

That's because you've confused backslashes (\) with forward slashes (/). Backslashes are what's used for escaping special characters, and inside a Java string literal each backslash has to be written as \\ . So, change this:

                regex("//<//!",line);

to this:

                regex("\\<\\!",line);

That said, < and ! don't actually have any special meaning in this context, so you can just write:

                regex("<!",line);

if you prefer.
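To make the difference concrete, here is a minimal sketch that tries all three patterns against a sample first line. The class name `EscapeDemo`, the helper `found`, and the sample string `<!DOCTYPE html>` are made up for illustration; they are not from the question.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Assumed sample of what a page's first line might look like.
        String line = "<!DOCTYPE html>";

        // Forward slashes are ordinary characters, so this pattern looks
        // for the literal six-character text //<//! and finds nothing.
        System.out.println(found("//<//!", line)); // prints false

        // Escaped form: the regex engine sees \<\! and matches <! literally.
        System.out.println(found("\\<\\!", line)); // prints true

        // Unescaped form: < and ! are not special here, so this is equivalent.
        System.out.println(found("<!", line));     // prints true
    }
}
```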

Also, note that the above regex matches the two-character substring <! . Something about your question makes me think that you might actually want to match the one-character substrings < and ! separately. If so, you can either use the ...|... syntax for specifying multiple alternative patterns:

                regex("<|!",line);   // matches whatever matches < or matches !

or the [...] syntax for specifying a class of characters:

                regex("[<!]",line);  // matches a character that is either < or !

(In this case, the two syntaxes are equivalent.)
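As a rough illustration of how the three patterns differ, the sketch below collects every match instead of printing it. The names `AlternativesDemo` and `allMatches` and the sample line are assumptions made for this example. It also calls find() only in the loop condition: in the question's regex method, the initial find() check consumes the first match, so the while loop starts from the second one, which is worth fixing separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternativesDemo {
    // Hypothetical helper: returns every non-overlapping match, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // A single find() per iteration, so no match is skipped.
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample of what a page's first line might look like.
        String line = "<!DOCTYPE html>";

        System.out.println(allMatches("<!", line));   // prints [<!]
        System.out.println(allMatches("<|!", line));  // prints [<, !]
        System.out.println(allMatches("[<!]", line)); // prints [<, !]
    }
}
```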
