
How do I get the original innerHTML source without the JavaScript-generated content?

Is there a way to get the original HTML source, without the changes made by JavaScript that has already run? For example, given this markup:

<div id="test">
    <script type="text/javascript">document.write("hello");</script>
</div>

If I then run:

alert(document.getElementById('test').innerHTML);

it shows:

<script type="text/javascript">document.write("hello");</script>hello

In simple terms, I would like the alert to show only:

<script type="text/javascript">document.write("hello");</script>

without the final hello (the result of the processed script).

I don't think there's a simple way to just "grab the original source", since that would have to be something the browser provides. But if you only need this for a section of the page, I have a workaround for you.

You can wrap the section of interest inside a "frozen" script:

<script id="frozen" type="text/x-frozen-html">

I just made up the type attribute. Because the browser does not recognize it as a script type, it will neither execute nor render anything inside the element. You then add another script tag (proper JavaScript this time) immediately after this one: the "thawing" script. The thawing script gets the frozen script by ID, grabs the text inside it, and does a document.write to add the actual contents to the page. Whenever you need the original source, it is still captured as text inside the frozen script.

And there you have it. The downside is that I wouldn't use this for the whole page... (SEO, syntax highlighting, performance...) but it's quite acceptable if you have a special requirement on part of a page.


Edit: Here is some sample code. Also, as @FlashXSFX correctly pointed out, any script tags within the frozen script will need to be escaped. So in this simple example, I'll make up a <x-script> tag for this purpose.

<script id="frozen" type="text/x-frozen-html">
   <div id="test">
      <x-script type="text/javascript">document.write("hello");</x-script>
   </div>
</script>
<script type="text/javascript">
   // Grab contents of frozen script and replace `x-script` with `script`
   function getSource() {
      return document.getElementById("frozen")
         .innerHTML.replace(/x-script/gi, "script");
   }
   // Write it to the document so it actually executes
   document.write(getSource());
</script>

Now whenever you need the source:

alert(getSource());

See the demo: http://jsbin.com/uyica3/edit

A simple way is to fetch it from the server again; it will most probably be in the cache. Here is my solution using jQuery.get(). It takes the original URI of the page and loads the data with an Ajax call:

$.get(document.location.href, function(data,status,jq) {console.log(data);})

This prints the original source, without any of the changes made by JavaScript. Note that it does no error handling!

If you don't want to use jQuery to fetch the source, consult the answer to this question: How to make an ajax call without jquery?
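For reference, a jQuery-free version with basic error handling might look like the sketch below. It uses the standard fetch API; the function name `fetchOriginalSource` and the injectable `fetchImpl` parameter are made up for this example, and the `cache: "force-cache"` option is only a hint that the browser may reuse its cached copy of the page.

```javascript
// Re-request a page and resolve with its unprocessed source text.
// `fetchOriginalSource` and `fetchImpl` are hypothetical names for this sketch.
async function fetchOriginalSource(url, fetchImpl = globalThis.fetch) {
  // "force-cache" asks the browser to reuse a cached copy when it has one.
  const response = await fetchImpl(url, { cache: "force-cache" });
  if (!response.ok) {
    throw new Error("Request for " + url + " failed with status " + response.status);
  }
  // The body is returned as plain text, so no script in it is executed.
  return response.text();
}
```

In a browser this could be called as `fetchOriginalSource(document.location.href).then((source) => console.log(source)).catch((err) => console.error(err));`. The same caveat applies as with the jQuery version: the text is only "original" if the server returns the same document on every request.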

Could you send an Ajax request to the same page you're currently on and use the result as your original HTML? This is foolproof given the right conditions, since you are literally getting the original HTML document. However, this won't work if the page changes on every request (with dynamic content), or if, for whatever reason, you cannot make a request to that specific page.

Brute force approach

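var orig = document.getElementById("test").innerHTML;
// [\s\S]* matches everything after the first </script>, newlines included
// (inside a character class, "." is a literal dot, not a wildcard).
alert(orig.replace(/<\/script>[\s\S]*/i, "</script>"));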

EDIT:

This could be better

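// Append a sentinel, strip the plain text between the last </script> and it,
// then drop the sentinel in case nothing matched.
var orig = document.getElementById("test").innerHTML + "<<>>";
alert(orig.replace(/<\/script>[^<]*<<>>/i, "</script>").replace("<<>>", ""));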

If you override document.write to add some identifiers at the beginning and end of everything written to the document by the script, you will be able to remove those writes with a regular expression.

Here's what I came up with:

    <script type="text/javascript" language="javascript">
        var docWrite = document.write;
        document.write = myDocWrite;

        function myDocWrite(wrt) {
            docWrite.apply(document, ['<!--docwrite-->' + wrt + '<!--/docwrite-->']);
        }
    </script>

I then added your example somewhere in the page after that initial script:

    <div id="test">
        <script type="text/javascript">     document.write("hello");</script>
    </div>

Then I used this to alert what was inside:

    // [\s\S]*? also matches written content that spans several lines
    var regEx = /<!--docwrite-->[\s\S]*?<!--\/docwrite-->/g;
    alert(document.getElementById('test').innerHTML.replace(regEx, ''));

If you want the pristine document, you'll need to fetch it again. There's no way around that. If it weren't for the document.write() (or similar code that would run during the load process) you could load the original document's innerHTML into memory on load/domready, before you modify it.
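As a rough illustration of that second case, the sketch below captures an element's innerHTML as soon as the DOM is ready, before any later script changes it. The name `snapshotOriginal` and the shape of the returned object are assumptions made for this example, and the document is passed in as a parameter only so the logic can be exercised outside a browser. It does not help with document.write(), which has already run by the time the snapshot is taken.

```javascript
// Remember an element's innerHTML as it was when the DOM became ready.
// `snapshotOriginal` is a hypothetical helper, not a browser API.
function snapshotOriginal(doc, id) {
  const snapshot = { html: null };

  function capture() {
    const el = doc.getElementById(id);
    snapshot.html = el ? el.innerHTML : null;
  }

  if (doc.readyState === "loading") {
    // Still parsing: wait until the DOM is complete.
    doc.addEventListener("DOMContentLoaded", capture);
  } else {
    // DOMContentLoaded has already fired, so capture right away.
    capture();
  }
  return snapshot;
}
```

In a page you would call `var original = snapshotOriginal(document, "test");` from the first script in the document, so its listener runs before any later handler modifies the element, and read `original.html` afterwards.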

I can't think of a solution that would work the way you're asking. The only markup JavaScript has access to is what the DOM exposes, and the DOM only contains the result after the page has been processed.

The closest I can think of is to use Ajax to download a fresh copy of the raw HTML for your page into a JavaScript string. Since it is just a string at that point, you can do whatever you like with it, including displaying it in an alert box.
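Once the raw HTML is in a string, pulling out the original contents of one element is plain string work. The sketch below is one way to do it, under stated assumptions: `extractInnerSource` is a made-up helper, it expects the id to be written as a quoted attribute, and it is not a real HTML parser, so markup that mentions the same tag name inside a script or comment would confuse it.

```javascript
// Return the source between the opening and closing tag of the element with
// the given id, taken from a raw HTML string. Hypothetical helper, not a parser.
function extractInnerSource(html, id) {
  const escapedId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Find the opening tag that carries id="..." (single or double quotes).
  const openRe = new RegExp(
    "<([a-z][a-z0-9-]*)\\b[^>]*\\sid=([\"'])" + escapedId + "\\2[^>]*>", "i");
  const open = openRe.exec(html);
  if (!open) return null;

  const tagName = open[1];
  const start = open.index + open[0].length;

  // Walk forward over tags with the same name, tracking nesting depth.
  const tagRe = new RegExp("<(/?)" + tagName + "\\b[^>]*>", "gi");
  tagRe.lastIndex = start;
  let depth = 1;
  let match;
  while ((match = tagRe.exec(html)) !== null) {
    depth += match[1] ? -1 : 1;
    if (depth === 0) return html.slice(start, match.index);
  }
  return null; // no matching closing tag found
}
```

Fed the page source from the Ajax call and the id "test", this returns the script tag as it was written in the file, without the "hello" that document.write() adds to the live DOM.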

A trickier way is to use a <style> tag as the template container, so that you no longer need to rename the script tags to x-script.

<style id="test" type="text/html+template">
    <script type="text/javascript">document.write("hello");</script>
</style>

console.log(document.getElementById('test').innerHTML);

But I do not like this ugly solution.

I think you want to traverse the DOM nodes:

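// Collect the source of every <script> that is a direct child of #test.
// innerHTML gives the script body only; use outerHTML to include the tags.
function getScriptSource() {
    var childNodes = document.getElementById('test').childNodes, i, output = [];

    for (i = 0; i < childNodes.length; i++)
        if (childNodes[i].nodeName == "SCRIPT")
            output.push(childNodes[i].innerHTML);

    return output.join('');
}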

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 