C# and Internet Explorer automation, accessing the cache

I have an Internet Explorer automation script in C#. It works fine, but I want to access a CAPTCHA image. The CAPTCHA link returns a fresh image every time it is visited, and since the browser has already visited it once, visiting it again would mess things up by producing a different image. So I tried to find the image in the browser's cache on disk with the following code:

tempDir = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache);
Console.WriteLine(tempDir);
// Read the markup once and locate the CAPTCHA path inside it.
string html = element.innerHTML.ToString();
int start = html.IndexOf("/sorry/image?id=");
supstra = html.Substring(start);
int hlOffset = supstra.IndexOf("&hl=");
Console.WriteLine("http://www.google.com" + supstra.Substring(0, hlOffset));
// Guess the cached file name: skip "/sorry/" (7 characters), turn "&amp;" back into "&", and append "=en".
captchas = client.Decode(tempDir + "\\" + html.Substring(start + 7, hlOffset).Replace("amp;", "") + "=en", 0);

The entry in the cache directory, however, is not an image but a command or something with the name image?id=..., and all it does is revisit the link and get a new image.

It seems that what I have to do is somehow access the image the browser is actually showing, which might exist only in memory. How can I do that?

See this thread on Accessing IE cache in C#.

Specifically, from the question:

Since the Internet Explorer is already displaying the webpage, the images in the webpage must already be stored somewhere in local cache

And the answer (emphasis mine):

You want to use GetUrlCacheEntryInfo().

Use the lpszLocalFileName of the INTERNET_CACHE_ENTRY_INFO structure upon return from the function.

Furthermore, one of your premises is flawed. Sometimes IE only has an in-memory representation of the image and the item on disk has been deleted. This is the case if, for example, the no-cache directive has been set, or the user has cleared their cache but not navigated away from the page, or the scavenger has deleted the item but the user hasn't navigated. There are probably 5 to 7 other scenarios as well.
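As a rough illustration of the quoted advice, here is a minimal P/Invoke sketch that asks WinINet for the cache entry of a URL and reads lpszLocalFileName from the returned INTERNET_CACHE_ENTRY_INFO. The class name IeCache and the method name GetCachedFilePath are made up for this example; the struct layout follows the WinINet header. It assumes the URL passed in matches the cached URL exactly, and it returns null when there is no usable on-disk entry, which covers the in-memory-only cases described above.

```csharp
using System;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Hypothetical helper class; the names are illustrative, not from the original answer.
static class IeCache
{
    const int ERROR_INSUFFICIENT_BUFFER = 122;

    // Managed mirror of the native INTERNET_CACHE_ENTRY_INFO structure.
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    struct INTERNET_CACHE_ENTRY_INFO
    {
        public uint dwStructSize;
        public IntPtr lpszSourceUrlName;
        public IntPtr lpszLocalFileName;
        public uint CacheEntryType;
        public uint dwUseCount;
        public uint dwHitRate;
        public uint dwSizeLow;
        public uint dwSizeHigh;
        public FILETIME LastModifiedTime;
        public FILETIME ExpireTime;
        public FILETIME LastAccessTime;
        public FILETIME LastSyncTime;
        public IntPtr lpHeaderInfo;
        public uint dwHeaderInfoSize;
        public IntPtr lpszFileExtension;
        public uint dwReserved;
    }

    [DllImport("wininet.dll", SetLastError = true, CharSet = CharSet.Auto)]
    static extern bool GetUrlCacheEntryInfo(
        string lpszUrlName, IntPtr lpCacheEntryInfo, ref uint lpcbCacheEntryInfo);

    // Returns the local file path of the cached copy, or null if none is available.
    public static string GetCachedFilePath(string url)
    {
        uint size = 0;

        // First call with no buffer: it is expected to fail with
        // ERROR_INSUFFICIENT_BUFFER and report the required size.
        if (GetUrlCacheEntryInfo(url, IntPtr.Zero, ref size) ||
            Marshal.GetLastWin32Error() != ERROR_INSUFFICIENT_BUFFER)
        {
            return null; // typically: the URL is not in the cache
        }

        IntPtr buffer = Marshal.AllocHGlobal((int)size);
        try
        {
            // Second call fills the buffer with the entry information.
            if (!GetUrlCacheEntryInfo(url, buffer, ref size))
            {
                return null;
            }

            var info = (INTERNET_CACHE_ENTRY_INFO)Marshal.PtrToStructure(
                buffer, typeof(INTERNET_CACHE_ENTRY_INFO));

            // The string pointers refer to memory inside the same buffer,
            // so copy the path out before the buffer is freed.
            return Marshal.PtrToStringAuto(info.lpszLocalFileName);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

A caller would pass the full CAPTCHA URL taken from the page, for example IeCache.GetCachedFilePath(captchaUrl) where captchaUrl is a placeholder, and fall back to another approach when the result is null or the file no longer exists.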

In the past, when I've had to do something similar, I've forced the web browser (IE in this case) to use something like Fiddler2 as a proxy. In Fiddler2, I can then intercept the image requests for a particular URL and use C# to save the responses to disk in a known location. The automation program can then grab them from there.
