Get the text inside an HTML tag, given a tag name with attributes

Question

I have a string that looks like this:

  <h3 class="media__title"> 
  <a class="media__link" href="/news/world-europe41644527" rev="video|headline">
  The equestrian champion with no legs                                                         
  </a> </h3>

I tried to read the text inside the h3 tag using this pattern:

    String regex = "<h3>(.+?)</h3>";

The code I am using:

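    // Needs java.util.ArrayList, java.util.regex.Matcher and java.util.regex.Pattern.
    private ArrayList<String> getValues(String resource) {
        final ArrayList<String> values = new ArrayList<>();
        // Compile the regex string and match it against the method argument.
        final Matcher matcher = Pattern.compile(regex).matcher(resource);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }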

This code works if I remove the class="media__title" attribute from the h3 tag. I tried changing the regular expression to this:

    String regex = "<h3 class=\"medial__title\">(.+?)</h3>";

Still no progress. Can someone tell me what should be changed in this regex pattern?

Answer 1

Try this:

    String regex = "<h3 (.*)>((.|\\s)+?)<\\/h3>";

The main problem with your approach is that the . character does not match line terminators.
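
To see that behaviour in isolation, here is a minimal sketch. The class name DotAllDemo and the helper found are made up for this illustration, and the input is a shortened version of the string from the question. It runs the original pattern once with default flags and once with Pattern.DOTALL, the standard flag that lets . match line terminators as well.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: true if the regex, compiled with the given flags, is found in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // The h3 content starts on a new line, as in the question.
        String html = "<h3>\nThe equestrian champion with no legs\n</h3>";

        // Default flags: '.' cannot consume the '\n' right after <h3>, so there is no match.
        System.out.println(found("<h3>(.+?)</h3>", 0, html));

        // DOTALL: '.' also matches line terminators, so the same pattern now matches.
        System.out.println(found("<h3>(.+?)</h3>", Pattern.DOTALL, html));
    }
}
```

The first call prints false and the second prints true. The pattern is the same in both calls; only the flag differs.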

Explanation:

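<h3 (.*)> matches an opening h3 tag together with all the attributes it contains (you could use a different pattern here if you are interested in the attributes themselves)

((.|\s)+?) matches everything inside the h3 tag. (.|\s) effectively means any character: . matches everything except line terminators, and \s matches whitespace, including the newline and carriage-return characters

<\/h3> matches the closing h3 tag. The / is escaped because it is a regex delimiter in some languages (JavaScript regex literals, for example); Java has no regex delimiters, so the backslash is optional here and </h3> works equally well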

Keep in mind that the group you are looking for is now the second group, not the first.
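
Putting the pieces together, here is a sketch of the question's getValues method using the pattern above and reading group 2. The class name H3Extractor and the static layout are assumptions made for this example; only the pattern and the sample string come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class H3Extractor {
    // The pattern from this answer, written as a Java string literal (backslashes doubled).
    private static final Pattern H3 = Pattern.compile("<h3 (.*)>((.|\\s)+?)<\\/h3>");

    // Collects the trimmed inner content (group 2) of every matching h3 element.
    static List<String> getValues(String resource) {
        List<String> values = new ArrayList<>();
        Matcher matcher = H3.matcher(resource);
        while (matcher.find()) {
            values.add(matcher.group(2).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // The sample string from the question, including its line breaks.
        String html = "<h3 class=\"media__title\"> \n"
                + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
                + "The equestrian champion with no legs\n"
                + "</a> </h3>";
        for (String value : getValues(html)) {
            System.out.println(value);
        }
    }
}
```

Note that group 2 is everything between the h3 tags, so for this input it still contains the nested a element around the headline; a further step would be needed to strip that markup. Also be aware that (.|\s)+? can overflow the stack on very long inputs in Java, in which case .+? with Pattern.DOTALL is a possible alternative.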

Answer 1: accepted, 2017-10-21 12:44:13