简体   繁体   English

XPath通过HTML标记搜索

[英]XPath search through HTML tags

The following HTML shows the 3rd search (search for "Practice Guidelines Professional") does not work as the text "Practice Guidelines" is placed between the <strong></strong> tag... Is it possible to achieve XPath search to bypass HTML tags between the texts? 以下HTML表示第3次搜索(搜索“ Practice Guidelines Professional”)无效,因为在<strong></strong>标记之间放置了文本“ Practice Guidelines” ...是否可以绕过XPath搜索文本之间的HTML标签?

<html>
<head>
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Xpath Test</title>

<script type="text/javascript">
var search = function (button) {
    var searchTxt = button.value;
    var xpath = '//*[contains(translate(text(),\'ABCDEFGHIJKLMNOPQRSTUVWXYZ\', \'abcdefghijklmnopqrstuvwxyz\'), \'' + searchTxt.toLowerCase() + '\')]';
    var nodes = null;

    if (document.evaluate) { 
        nodes = document.evaluate(
          xpath,
          document,
          null,
          XPathResult.ORDERED_NODE_ITERATOR_TYPE,
          null
        );

        var node = nodes.iterateNext ();
        if (node) {
            while (node) {
                var nodeXpath = createXPathFromElement(node);
                alert("Found!!!\n xpath: " + nodeXpath);
                node = nodes.iterateNext ();
            }
        } else {
            alert("No results found.");
        }
    } else {  
        alert("Your browser does not support the evaluate method!");
    }
  }

  function createXPathFromElement(elm) {
    var allNodes = document.getElementsByTagName('*'); 
    for (segs = []; elm && elm.nodeType == 1; elm = elm.parentNode) 
    {  
        for (i = 1, sib = elm.previousSibling; sib; sib = sib.previousSibling) { 
            if (sib.localName == elm.localName)  i++; 
        };
        segs.unshift(elm.localName.toLowerCase() + '[' + i + ']'); 
    }; 
    return segs.length ? '//' + segs.join('//') : null; 
  }
</script>
</head>

<body>
<table>
    <tr>
        <td>
            <p>&#160;</p><h3><strong><span class="color-1">PRINCIPLES OF PATIENT CARE</span></strong></h3><p class="noindent"><strong><span class="color-1">Evidence-Based Medicine</span></strong> Evidence-based medicine refers to the concept that clinical decisions are formally supported by data, preferably data that are derived from prospectively designed, randomized, controlled clinical trials. This is in sharp contrast to anecdotal experience, which may often be biased. Unless they are attuned to the importance of using larger, more objective studies for making decisions, even the most experienced physicians can be influenced by recent encounters with selected patients. Evidence-based medicine has become an increasingly important part of the routine practice of medicine and has led to the publication of a number of practice guidelines.</p>
            <p>&#160;</p><p class="noindent"><strong><span class="color-1">Practice Guidelines</span></strong> Professional organizations and government agencies are developing formal clinical-practice guidelines to aid physicians and other caregivers in making diagnostic and therapeutic decisions that are evidence-based, cost-effective, and most appropriate to a particular patient and clinical situation. As the evidence base of medicine increases, guidelines can provide a useful framework for managing patients with particular diagnoses or symptoms. They can protect patients—particularly those with inadequate health care benefits—from receiving substandard care. Guidelines can also protect conscientious caregivers from inappropriate charges of malpractice and society from the excessive costs associated with the overuse of medical resources. There are, however, caveats associated with clinical practice guidelines since they tend to oversimplify the complexities of medicine. Furthermore, groups with differing perspectives may develop divergent recommendations regarding issues as basic as the need for periodic sigmoidoscopy in middle-aged persons. Finally, guidelines do not—and cannot be expected to—account for the uniqueness of each individual and his or her illness. The physician’s challenge is to integrate into clinical practice the useful recommendations offered by experts without accepting them blindly or being inappropriately constrained by them.</p>               
        </td>
    </tr>
    <tr>
        <td>
            Search for <input type="button" onclick="search(this)" value="Practice Guidelines" /> Success.
        </td>
    </tr>
    <tr>
        <td>
            Search for <input type="button" onclick="search(this)" value="Professional" /> Success.
        </td>
    </tr>
    <tr>
        <td>
            Search for <input type="button" onclick="search(this)" value="Practice Guidelines Professional" /> No result since text 'Practice Guidelines' is inside &lt;strong&gt;&lt;strong/&gt;, and  'Professional' is out side.     
        </td>
    </tr>
</table>
</body>
</html>

Yes. 是。 You can use normalize-space() for this. 您可以为此使用normalize-space() (line-wrapped for the sake of readability) (为了便于阅读而用行包装)

var xpath = "
  //*[
    contains(
      translate(
        normalize-space(.), 
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 
        'abcdefghijklmnopqrstuvwxyz'
      ), 
      normalize-space('" + searchTxt.toLowerCase() + "')
    )
  ]
";

Note that this will find the entire hierarchy of nodes that contain the respective text, up to and including the <html> element. 请注意,这将查找包含相应文本的节点的整个层次结构 ,直至并包括<html>元素。 You would be interested in the "most deeply nested" node, I guess. 我想您会对“最深层嵌套”的节点感兴趣。

Also note that you might want to remove single quotes from searchTxt as they would break the XPath expression. 还要注意,您可能希望从searchTxt除去单引号,因为它们会破坏XPath表达式。

There is probably also a recursive method that looks at the textContent/innerHTML of a root element to see if a match is found. 可能还有一个递归方法,它查看根元素的textContent / innerHTML以查看是否找到匹配项。 If so, go down firstChild nodes that match and so on, following children until the match fails, at which point the element containing the text has been found. 如果是这样,则向下走匹配的firstChild节点,依此类推,接着跟随孩子,直到匹配失败,此时找到包含文本的元素。

Should also work for multiple matches with the text. 还应该与文本进行多次匹配。

XPath might be faster, but perhaps not as wildely supported (though it seems to be on all modern browsers). XPath可能更快,但可能不被广泛支持(尽管似乎所有现代浏览器都支持)。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM