
How to use Pattern and Matcher in Java?

Let's say the string is <title>xyz</title>. I want to extract the xyz out of the string. I used:

Pattern titlePattern = Pattern.compile("&lt;title&gt;\\s*(.+?)\\s*&lt;/title&gt;");
Matcher titleMatcher = titlePattern.matcher(line);
String title = titleMatcher.group(1);

But I am getting an error at titlePattern.matcher(line);

You say your error occurs earlier (what is the actual error? It runs without an error for me), but after solving that you will need to call find() on the matcher once to actually search for the pattern. Calling group(1) before a match attempt has succeeded throws an IllegalStateException:

if(titleMatcher.find()){
  String title = titleMatcher.group(1);
}
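A minimal self-contained sketch of the fix, assuming the input contains literal <title> tags rather than escaped entities. The class TitleExtractor and the method extractTitle are hypothetical names for illustration. find() performs the search, and group(1) is only read after it succeeds; the main method also shows what happens when group(1) is called on a matcher that has not searched yet:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {
    // Literal tags: \s* trims whitespace around the title, (.+?) captures it lazily
    static final Pattern TITLE = Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    // Hypothetical helper: returns the first title in line, or null if none is found
    static String extractTitle(String line) {
        Matcher titleMatcher = TITLE.matcher(line);
        if (titleMatcher.find()) {          // find() performs the actual search
            return titleMatcher.group(1);   // only valid after a successful match
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><title> xyz </title></html>"));

        Matcher unsearched = TITLE.matcher("<title>xyz</title>");
        try {
            unsearched.group(1); // no find() yet, so there is no match state
        } catch (IllegalStateException e) {
            System.out.println("group(1) before find(): " + e.getMessage());
        }
    }
}
```

Because find() looks for the pattern anywhere in the input, the title is found even when it is surrounded by other markup; matches() would instead require the whole string to match.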

Note that if you are really matching against a string with literal (non-escaped) tags, like

<title>xyz</title>

then your regular expression has to use those literal characters, not the escaped entities:

"<title>\\s*(.+?)\\s*</title>"

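To make the difference concrete, here is a small sketch that runs an entity-escaped version of the pattern and the literal version against both raw and escaped input. The class EscapedVsLiteral and its helper firstGroup are hypothetical names for illustration. Each pattern only matches text written the same way it is:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedVsLiteral {
    // Matches text in which the tags are written as HTML entities (&lt; and &gt;)
    static final Pattern ESCAPED = Pattern.compile("&lt;title&gt;\\s*(.+?)\\s*&lt;/title&gt;");
    // Matches raw markup containing literal angle brackets
    static final Pattern LITERAL = Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    // Hypothetical helper: first capture group of the first match, or null
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String raw = "<title>xyz</title>";
        String escaped = "&lt;title&gt;xyz&lt;/title&gt;";
        System.out.println(firstGroup(ESCAPED, raw));     // no entities in the raw string
        System.out.println(firstGroup(LITERAL, raw));
        System.out.println(firstGroup(ESCAPED, escaped));
    }
}
```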
Also, you should be careful about how far you try to take this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
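As a rough illustration of where the simple pattern runs out, here is a sketch (the class RegexLimits is a hypothetical name) that tries it on a few variations of the title tag. Adding the CASE_INSENSITIVE and DOTALL flags handles upper-case tags and line breaks inside the title, but an attribute on the tag still defeats both versions, and each fix makes the pattern harder to reason about:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLimits {
    static final Pattern SIMPLE = Pattern.compile("<title>\\s*(.+?)\\s*</title>");
    // CASE_INSENSITIVE accepts <TITLE>; DOTALL lets '.' match line terminators
    static final Pattern TOLERANT = Pattern.compile(
            "<title>\\s*(.+?)\\s*</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // First capture group of the first match, or null if there is no match
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<TITLE>xyz</TITLE>",             // upper-case tag
            "<title>xy\nz</title>",           // line break inside the title
            "<title lang=\"en\">xyz</title>"  // attribute on the tag: neither pattern matches
        };
        for (String in : inputs) {
            System.out.println(firstGroup(SIMPLE, in) + " | " + firstGroup(TOLERANT, in));
        }
    }
}
```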

Not technically an answer, but you shouldn't be using regular expressions to parse HTML. You can try, and you can get away with it for simple tasks, but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML, it would be worth your time to learn them.

As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML documents with the standard Java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open-source library for parsing XML.

With javax.xml.parsers you can do the following:

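import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

String xml = "<title>abc</title>";

// newDocumentBuilder() and parse() throw checked exceptions; declare or catch them
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();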

This parses your XML string into a Document object, which you can use for further lookups. The API is kind of horrible, though.
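A self-contained version of the DOM approach might look like this sketch. The class DomTitle and the method firstTitle are hypothetical, and wrapping the checked parser exceptions in an IllegalArgumentException is just one possible design choice. It also checks the list length, because item(0) returns null on an empty NodeList. Unlike the regex, it finds a title that carries attributes or is nested inside other elements:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTitle {
    // Returns the text of the first <title> element, or null if the document has none
    static String firstTitle(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagName("title");
            // item(0) returns null for an empty NodeList, so check the length first
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap the checked exceptions so callers are not forced to handle them
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitle("<html><head><title lang=\"en\">abc</title></head></html>"));
        System.out.println(firstTitle("<page/>"));
    }
}
```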

Another way is to use XPath for the lookup:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

// evaluate() throws the checked XPathExpressionException
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
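Here is a hedged, self-contained sketch of the XPath variant; the class XPathTitle and its evaluate helper are hypothetical names. Two details are worth knowing: /title/text() only matches when title is the root element, and evaluate returns an empty string rather than null when the expression selects nothing. //title searches at any depth:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathTitle {
    // Evaluates expression against xml and returns the string result
    // (an empty string when the expression selects nothing)
    static String evaluate(String expression, String xml) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>abc</title></head></html>";
        // "/title/text()" only matches a root element named title
        System.out.println("[" + evaluate("/title/text()", page) + "]");
        // "//title" matches a title element at any depth
        System.out.println("[" + evaluate("//title", page) + "]");
    }
}
```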
