简体   繁体   中英

How can I read Chrome Cache files?

A forum I frequent was down today, and upon restoration, I discovered that the last two days of forum posting had been rolled back completely.

Needless to say, I'd like to get back what data I can from the forum loss, and I am hoping I have at least some of it stored in the cache files that Chrome created.

I face two problems -- the cache files have no filetype, and I'm unsure how to read them in an intelligent manner (trying to open them in Chrome itself seems to "redownload" them in a.gz format), and there are a ton of cache files.

Any suggestions on how to read and sort these files? (A simple string search should fit my needs)

EDIT : The below answer no longer works see here


In Chrome or Opera, open a new tab and navigate to chrome://view-http-cache/

Click on whichever file you want to view. You should then see a page with a bunch of text and numbers. Copy all the text on that page. Paste it in the text box below.

Press "Go". The cached data will appear in the Results section below.

Try Chrome Cache View from NirSoft (free).

EDIT : The below answer no longer works see here


Chrome stores the cache as a hex dump. OSX comes with xxd installed, which is a command line tool for converting hex dumps. I managed to recover a jpg from my Chrome's HTTP cache on OSX using these steps:

  1. Goto: chrome://cache
  2. Find the file you want to recover and click on it's link.
  3. Copy the 4th section to your clipboard. This is the content of the file.
  4. Follow the steps on this gist to pipe your clipboard into the python script which in turn pipes to xxd to rebuild the file from the hex dump: https://gist.github.com/andychase/6513075

Your final command should look like:

pbpaste | python chrome_xxd.py | xxd -r - image.jpg

If you're unsure what section of Chrome's cache output is the content hex dump take a look at this page for a good guide: http://www.sparxeng.com/blog/wp-content/uploads/2013/03/chrome_cache_html_report.png

Image source: http://www.sparxeng.com/blog/software/recovering-images-from-google-chrome-browser-cache

More info on XXD: http://linuxcommand.org/man_pages/xxd1.html

Thanks to Mathias Bynens above for sending me in the right direction.

EDIT : The below answer no longer works see here


If the file you try to recover has Content-Encoding: gzip in the header section, and you are using linux (or as in my case, you have Cygwin installed) you can do the following:

  1. visit chrome://view-http-cache/ and click the page you want to recover
  2. copy the last (fourth) section of the page verbatim to a text file (say: a.txt)
  3. xxd -r a.txt| gzip -d

Note that other answers suggest passing -p option to xxd - I had troubles with that presumably because the fourth section of the cache is not in the "postscript plain hexdump style" but in a "default style".

It also does not seem necessary to replace double spaces with a single space, as chrome_xxd.py is doing (in case it is necessary you can use sed 's/ / /g' for that).

Note: The flag show-saved-copy has been removed and the below answer will not work


You can read cached files using Chrome alone.

Chrome has a feature called Show Saved Copy Button:

Show Saved Copy Button Mac, Windows, Linux, Chrome OS, Android

When a page fails to load, if a stale copy of the page exists in the browser cache, a button will be presented to allow the user to load that stale copy. The primary enabling choice puts the button in the most salient position on the error page; the secondary enabling choice puts it secondary to the reload button. #show-saved-copy

First disconnect from the Internet to make sure that browser doesn't overwrite cache entry. Then navigate to chrome://flags/#show-saved-copy and set flag value to Enable: Primary . After you restart browser Show Saved Copy Button will be enabled. Now insert cached file URI into browser's address bar and hit enter. Chrome will display There is no Internet connection page alongside with Show saved copy button: 在此处输入图像描述

After you hit the button browser will display cached file.

I've made short stupid script which extracts JPG and PNG files:

#!/usr/bin/php
<?php
 $dir="/home/user/.cache/chromium/Default/Cache/";//Chrome or chromium cache folder. 
 $ppl="/home/user/Desktop/temporary/"; // Place for extracted files 

 $list=scandir($dir);
 foreach ($list as $filename)
 {

 if (is_file($dir.$filename))
    {
        $cont=file_get_contents($dir.$filename);
        if  (strstr($cont,'JFIF'))
        {
            echo ($filename."  JPEG \n");
            $start=(strpos($cont,"JFIF",0)-6);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-6);
            $wholename=$ppl.$filename.".jpg";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        elseif  (strstr($cont,"\211PNG"))
        {
            echo ($filename."  PNG \n");
            $start=(strpos($cont,"PNG",0)-1);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-1);
            $wholename=$ppl.$filename.".png";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        else
        {
            echo ($filename."  UNKNOWN \n");
        }
    }
 }
?>

I had some luck with this open-source Python project, seemingly inactive: https://github.com/JRBANCEL/Chromagnon

I ran:

python2 Chromagnon/chromagnonCache.py path/to/Chrome/Cache -o browsable_cache/

And I got a locally-browsable extract of all my open tabs cache.

The Google Chrome cache directory $HOME/.cache/google-chrome/Default/Cache on Linux contains one file per cache entry named <16 char hex>_0 in "simple entry format" :

  • 20 Byte SimpleFileHeader
  • key (ie the URI)
  • payload (the raw file content ie the PDF in our case)
  • SimpleFileEOF record
  • HTTP headers
  • SHA256 of the key (optional)
  • SimpleFileEOF record

If you know the URI of the file you're looking for it should be easy to find. If not, a substring like the domain name, should help narrow it down. Search for URI in your cache like this:

fgrep -Rl '<URI>' $HOME/.cache/google-chrome/Default/Cache

Note: If you're not using the default Chrome profile, replace Default with the profile name, eg Profile 1 .

It was removed on purpose and it won't be coming back.

Both chrome://cache and chrome://view-http-cache have been removed starting chrome 66. They work in version 65.

Workaround

You can check the chrome://chrome-urls/ for complete list of internal Chrome URLs.

The only workaround that comes into my mind is to use menu/more tools/developer tools and having a Network tab selected.

The reason why it was removed is this bug:

The discussion:

The JPEXS Free Flash Decompiler has Java code to do this at in the source tree for both Chrome and Firefox (no support for Firefox's more recent cache2 though).

EDIT : The below answer no longer works see here


Google Chrome cache file format description .

Cache files list, see URLs (copy and paste to your browser address bar):

  • chrome://cache/
  • chrome://view-http-cache/

Cache folder in Linux: $~/.cache/google-chrome/Default/Cache

Let's determine in file GZIP encoding:

$ head f84358af102b1064_0 | hexdump -C | grep --before-context=100 --after-context=5 "1f 8b 08"

Extract Chrome cache file by one line on PHP (without header, CRC32 and ISIZE block):

$ php -r "echo gzinflate(substr(strchr(file_get_contents('f84358af102b1064_0'), \"\x1f\x8b\x08\"), 10,
-8));"

Note: The below answer is out of date since the Chrome disk cache format has changed.


Joachim Metz provides some documentation of the Chrome cache file format with references to further information.

For my use case, I only needed a list of cached URLs and their respective timestamps. I wrote a Python script to get these by parsing the data_* files under C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache\ :

import datetime
with open('data_1', 'rb') as datafile:
    data = datafile.read()

for ptr in range(len(data)):
    fourBytes = data[ptr : ptr + 4]
    if fourBytes == b'http':

        # Found the string 'http'. Hopefully this is a Cache Entry
        endUrl = data.index(b'\x00', ptr)
        urlBytes = data[ptr : endUrl]
        try:
            url = urlBytes.decode('utf-8')
        except:
            continue

        # Extract the corresponding timestamp
        try:
            timeBytes = data[ptr - 72 : ptr - 64]
            timeInt = int.from_bytes(timeBytes, byteorder='little')
            secondsSince1601 = timeInt / 1000000
            jan1601 = datetime.datetime(1601, 1, 1, 0, 0, 0)
            timeStamp = jan1601 + datetime.timedelta(seconds=secondsSince1601)
        except:
            continue

        print('{} {}'.format(str(timeStamp)[:19], url))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM