
Full text search in HTML ignoring tags / &

I've recently seen a lot of libraries for searching and highlighting terms within an HTML page. However, every library I saw has the same problem: they can't find text that is partly enclosed in an HTML tag, and/or they fail to find special characters that are written as HTML entities (&-expressions).


Example a:

<span> This is a test. This is a <b>test</b> too</span>

Searching for "a test" would find the first instance but not the second.


Example b:

<span> Pencils in spanish are called l&aacute;pices</span>

Searching for "lápices" or "lapices" would fail to produce a result.


Is there a way to circumvent these obstacles?

Thanks in Advance!

You can use window.find() in non-IE browsers and TextRange's findText() method in IE. Here's an example:

http://jsfiddle.net/xeSQb/6/

Unfortunately, Opera before version 15 (when it switched to the Blink rendering engine) supports neither window.find nor TextRange. If this is a concern for you, a rather heavyweight alternative is to use a combination of the TextRange and CSS class applier modules of my Rangy library, as in the following demo: http://rangy.googlecode.com/svn/trunk/demos/textrange.html
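The choice between the two APIs can be made with plain feature detection. The following is only a sketch: `pickSearchStrategy` is a hypothetical helper name, and it takes the window object as a parameter so that it can be tried outside a browser.

```javascript
// Hypothetical helper: report which of the two search APIs is available.
// `win` is normally the global `window`; it is passed in so the check
// can also be exercised with a stand-in object.
function pickSearchStrategy(win) {
  // window.find() is non-standard but present in most non-IE browsers.
  if (typeof win.find === "function" && typeof win.getSelection === "function") {
    return "window.find";
  }
  // Old IE exposes createTextRange() on the body element.
  const body = win.document && win.document.body;
  if (body && typeof body.createTextRange === "function") {
    return "TextRange";
  }
  // Neither API exists (e.g. Opera before version 15): fall back to a library.
  return "none";
}
```

In a page you would call `pickSearchStrategy(window)` and switch to a library such as Rangy when the result is `"none"`.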

The following code improves on the fiddle above by removing the highlighting from the previous search results each time a new search is performed:

    function doSearch(text, color = "yellow") {
        if (color != "transparent") {
            // Clear the highlighting left by the previous search, then remember the new term.
            doSearch(document.getElementById('hid_search').value, "transparent");
            document.getElementById('hid_search').value = text;
        }
        if (window.find && window.getSelection) {
            // Non-IE browsers: select each match in turn and colour its background.
            document.designMode = "on";
            var sel = window.getSelection();
            sel.collapse(document.body, 0);
            while (window.find(text)) {
                document.execCommand("HiliteColor", false, color);
                sel.collapseToEnd();
            }
            document.designMode = "off";
        } else if (document.body.createTextRange) {
            // IE: walk the matches with a TextRange instead.
            var textRange = document.body.createTextRange();
            while (textRange.findText(text)) {
                textRange.execCommand("BackColor", false, color);
                textRange.collapse(false);
            }
        }
    }
    <input type="text" id="search">
    <input type="hidden" id="hid_search">
    <input type="button" id="button" onmousedown="doSearch(document.getElementById('search').value)" value="Find">
    <div id="content">
        <p>Here is some searchable text with some lápices in it, and more lápices, and some <b>for<i>mat</i>t</b>ing</p>
    </div>

To highlight search keywords on a web page and remove the highlighting again using JavaScript:

    <script>


    function highlightAll(keyWords) { 
        document.getElementById('hid_search_text').value = keyWords; 
        document.designMode = "on"; 
        var sel = window.getSelection(); 
        sel.collapse(document.body, 0);
        while (window.find(keyWords)) { 
            document.execCommand("HiliteColor", false, "yellow"); 
            sel.collapseToEnd(); 
        }
        document.designMode = "off";
        goTop(keyWords,1); 
    }

    function removeHighLight() { 
        var keyWords = document.getElementById('hid_search_text').value; 
        document.designMode = "on"; 
        var sel = window.getSelection(); 
        sel.collapse(document.body, 0);
        while (window.find(keyWords)) { 
            document.execCommand("HiliteColor", false, "transparent"); 
            sel.collapseToEnd(); 
        }
        document.designMode = "off"; 
        goTop(keyWords,0); 
    }

    function goTop(keyWords, findFirst) {
        // Jump back to the top of the page.
        window.location.href = '#';
        if (findFirst) {
            // Select the first match: case-insensitive, forwards, with wrap-around.
            window.find(keyWords, 0, 0, 1);
        }
    }
    </script>

    <style>
    #search_para {
     color:grey;
    }
    .highlight {
     background-color: #FF6; 
    }
    </style>

    <div id="wrapper">
        <input type="text" id="search_text" name="search_text"> &nbsp; 
        <input type="hidden" id="hid_search_text" name="hid_search_text"> 
        <input type="button" value="search" id="search" onclick="highlightAll(document.getElementById('search_text').value)" >  &nbsp; 
        <input type="button" value="remove" id="remove" onclick="removeHighLight()" >  &nbsp; 
        <div>
            <p id="search_para">The European languages are members of the same family. Their separate existence is a myth. For science, music, sport, etc, Europe uses the same vocabulary. The languages only differ in their grammar, their pronunciation and their most common words. Everyone realizes why a new common language would be desirable: one could refuse to pay expensive translators. To achieve this, it would be necessary to have uniform grammar, pronunciation and more common words. If several languages coalesce, the grammar of the resulting language is more simple and regular than that of the individual languages. The new common language will be more simple and regular than the existing European languages.</p>
        </div>
    </div>

There are 2 problems here. One is the nested content problem, or search matches that span an element boundary. The other is HTML-escaped characters.

One way to handle the HTML-escaped characters is, if you are using jQuery for example, to use the .text() method, and run the search on that. The text that comes back from that already has the escaped characters "translated" into their real character.
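As a rough illustration of searching the extracted text: in a browser the string would come from `$(element).text()` or `element.textContent`, where `&aacute;` already appears as `á`. The accent folding below is an addition that is not part of the suggestion above; it is one way to let "lapices" match "lápices" as well. The names `foldAccents` and `textContains` are made up for this sketch.

```javascript
// Strip combining diacritical marks so that "á" and "a" compare equal.
// NFD normalization splits "á" into "a" plus a combining acute accent,
// and the regex then removes the combining marks.
function foldAccents(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Accent-insensitive containment test on already-decoded text.
// `text` is assumed to be the element's text, not its raw HTML.
function textContains(text, term) {
  return foldAccents(text).includes(foldAccents(term));
}

// Stand-in for element.textContent of the question's example b.
const decoded = " Pencils in spanish are called lápices";
const foundPlain = textContains(decoded, "lapices");
const foundAccented = textContains(decoded, "lápices");
```

This only reports whether the term occurs. Positions in the folded string do not necessarily line up with positions in the original text, so highlighting needs an extra mapping step.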

Another way to handle those special characters would be to replace the actual character (in the search string) with the escaped version. Since there are a wide variety of possibilities there, however, that could be a lengthy search depending on the implementation.
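A minimal sketch of that second approach follows. The `ENTITY_FOR` table and the `escapeTerm` name are hypothetical, and the table is only a tiny illustrative subset of the named entities HTML defines, which is exactly why this route gets lengthy.

```javascript
// Illustrative subset only: HTML defines many more named entities, and
// the same character may also be written numerically (e.g. &#225;),
// which this table would not catch.
const ENTITY_FOR = new Map([
  ["á", "&aacute;"], ["é", "&eacute;"], ["í", "&iacute;"],
  ["ó", "&oacute;"], ["ú", "&uacute;"], ["ñ", "&ntilde;"],
  ["&", "&amp;"], ["<", "&lt;"], [">", "&gt;"],
]);

// Rewrite the search term so that it can be matched against raw markup.
function escapeTerm(term) {
  return Array.from(term, (ch) => ENTITY_FOR.get(ch) || ch).join("");
}

// Raw markup from the question's example b.
const rawHtml = "<span> Pencils in spanish are called l&aacute;pices</span>";
const escaped = escapeTerm("lápices");
const foundInMarkup = rawHtml.includes(escaped);
```

Searching the raw markup this way still does not help with matches that are interrupted by tags.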

The same sort of "text" method can be used to find content matches that span element boundaries. It gets trickier because the extracted text has no notion of where in the markup each part of the content came from, but it gives you a smaller domain to search over if you drill in. Once you are close, you can switch to a more "series of characters" sort of search rather than a word-based search.
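One way to keep track of where the text came from is to record the offset at which each text node starts in the concatenated string, and to map every match back through those offsets. The sketch below is an assumption about how this could be done, not an existing library: `findAcrossNodes` is a pure helper that works on plain strings, and `findRanges` is a browser-only wrapper built on the standard `TreeWalker` and `Range` APIs.

```javascript
// Given the text of each text node in document order, find every
// occurrence of `term` in the concatenated text and report the node
// index and in-node offset where each match starts and ends.
function findAcrossNodes(nodeTexts, term) {
  const matches = [];
  if (!term) return matches;

  // starts[i] is the offset of node i within the concatenated text.
  const starts = [];
  let total = 0;
  for (const text of nodeTexts) {
    starts.push(total);
    total += text.length;
  }
  const haystack = nodeTexts.join("");

  // Map an absolute offset back to the node that contains it.
  function locate(pos) {
    let i = starts.length - 1;
    while (i > 0 && starts[i] > pos) i--;
    return { node: i, offset: pos - starts[i] };
  }

  let from = 0;
  while (true) {
    const at = haystack.indexOf(term, from);
    if (at === -1) break;
    const start = locate(at);
    const last = locate(at + term.length - 1); // last matched character
    // The end offset is exclusive, as DOM Range expects.
    matches.push({ start, end: { node: last.node, offset: last.offset + 1 } });
    from = at + term.length;
  }
  return matches;
}

// Browser-only wrapper: collect the text nodes under `root` and turn
// each match into a DOM Range that can be highlighted or selected.
function findRanges(root, term) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  return findAcrossNodes(nodes.map((n) => n.nodeValue), term).map((m) => {
    const range = document.createRange();
    range.setStart(nodes[m.start.node], m.start.offset);
    range.setEnd(nodes[m.end.node], m.end.offset);
    return range;
  });
}

// The three text nodes of the question's example a.
const exampleNodes = [" This is a test. This is a ", "test", " too"];
const exampleMatches = findAcrossNodes(exampleNodes, "a test");
```

For example a, the second match starts in the first text node and ends inside the `<b>` element's text node. Note that this sketch compares the raw text node contents, so whitespace that the browser collapses when rendering, and text inside `<script>` or `<style>` elements, would need extra handling.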

I don't know of any libraries that do this however.

Just press F3, and use the <p> and </p> tags to tell the other visitors of your site about it. For example, since you know about the F3 search key, to put text on the screen that tells others you would type:

<p>If you're having trouble finding something, press F3 to highlight the text</p>
