How to extract HTML's <td> tag data using regex in Java?

I am trying to read a username and password from an email using Java. The mail content is returned in HTML format, and I just want to extract the username and password, which are inside <td> tags. Below is my HTML snippet:

<table width="200">
   <tbody>
     <tr>
        <td colspan="2">Your Account Details:</td>
     </tr>
      <tr>
        <td>EmailId:</td>
        <td><a class="moz-txt-link-abbreviated" href="mailto:jainish.m.kapadia@trimantra.net">jainish.m.kapadia@trimantra.net</a></td>
      </tr>
      <tr>
         <td>Password:</td>
         <td>C3mRXh+|n#1J</td>
      </tr>
  </tbody>
</table>

How do I achieve this?

Please don't try to parse HTML with regex. For a detailed explanation of why you shouldn't, see this SO answer.

You can use jsoup to parse your HTML strings, like this:

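import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<html><head><title>First parse</title></head>"
  + "<body><div id=\"content\"><p>Parsed HTML into a doc.</p>"
  + "<a href=\"http://example.com/\">Example</a></div></body></html>";
Document doc = Jsoup.parse(html);

// getElementById returns null when no element has that id,
// so the sample HTML above includes a <div id="content">
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href"); // value of the href attribute
  String linkText = link.text();       // visible text of the link
}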

jsoup also offers methods for hierarchical navigation, such as:

siblingElements();
nextElementSibling();

and so on.
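As a rough sketch of how this could apply to the table in the question: assuming jsoup is on the classpath, each label cell (such as "Password:") can be paired with the cell that follows it via nextElementSibling(). The class name AccountDetailsJsoup and the method extract are hypothetical names chosen for illustration, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; requires the jsoup library.
public class AccountDetailsJsoup {

    // Maps the text of each label cell to the text of the cell right after it.
    static Map<String, String> extract(String html) {
        Map<String, String> details = new LinkedHashMap<>();
        Document doc = Jsoup.parse(html);
        for (Element label : doc.select("td")) {
            // nextElementSibling() returns null for the last cell in a row,
            // so single-cell rows and value cells are skipped.
            Element value = label.nextElementSibling();
            if (value != null) {
                details.put(label.text(), value.text());
            }
        }
        return details;
    }

    public static void main(String[] args) {
        String html = "<table><tbody>"
            + "<tr><td colspan=\"2\">Your Account Details:</td></tr>"
            + "<tr><td>EmailId:</td><td><a href=\"mailto:user@example.com\">user@example.com</a></td></tr>"
            + "<tr><td>Password:</td><td>C3mRXh+|n#1J</td></tr>"
            + "</tbody></table>";
        Map<String, String> details = extract(html);
        System.out.println(details.get("EmailId:"));
        System.out.println(details.get("Password:"));
    }
}
```

Because text() returns the combined text of an element and its children, the email address is read correctly even though it is wrapped in an <a> tag.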

You can use the code snippet below:

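import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "your html";
// <td\b[^>]*> also matches cells that carry attributes, such as <td colspan="2">
Pattern pattern = Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.DOTALL);
Matcher matcher = pattern.matcher(str);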

This will match the <td> tags. You can then loop over the matcher with find() and read each cell's content from its capture group.
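The loop could look roughly like the sketch below, which uses only the standard library. The class name TdRegexExtractor and the method cellTexts are hypothetical, and the pattern assumes simple, well-formed markup with no nested tables. HTML entities such as &amp;lt; are not decoded, so a password containing such characters would come back in its encoded form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; standard library only.
public class TdRegexExtractor {

    // Matches <td> with or without attributes and captures the cell content lazily.
    private static final Pattern TD =
        Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the text of every <td> cell in document order.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        Matcher matcher = TD.matcher(html);
        while (matcher.find()) {
            // Drop nested tags such as <a ...> and trim surrounding whitespace.
            cells.add(matcher.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<tr><td>EmailId:</td>"
            + "<td><a href=\"mailto:user@example.com\">user@example.com</a></td></tr>"
            + "<tr><td>Password:</td><td>C3mRXh+|n#1J</td></tr>";
        List<String> cells = cellTexts(html);
        // In this layout a value cell directly follows its label cell.
        int i = cells.indexOf("Password:");
        if (i >= 0 && i + 1 < cells.size()) {
            System.out.println(cells.get(i + 1));
        }
    }
}
```

This works for the fixed layout in the question, but it breaks easily if the mail template changes, which is the main reason an HTML parser is usually the safer choice.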
