I have developed a Web application in VS2008. It works perfectly on my development PC. When I publish and upload to the shared Windows hosting service (which supports ASP.NET 3.5), it fails (even when accessing it from my development PC). The error message is:
Could not load file or assembly 'Microsoft.mshtml, Version=7.0.3300.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.
I have read many forum posts on the subject, and have tried the recommended solutions:
I know this issue has been covered before, but the suggested solutions just don't work. Does anyone have any insight?
TIA
If you're trying to parse HTML, instead of MSHTML, try the HTMLAgilityPack, or one of the other suggestions mentioned in this question
Again, thanks Jason. HTMLAgilityPack did the trick.
In the interest of helping others, I'll post a few code snippets that I found useful (since documentation on the product is sparse).
1) IN YOUR ASP.NET APPLICATION, COPY HtmlAgilityPack.dll AND HtmlAgilityPack.XML INTO YOUR BIN FOLDER.
Check to verify that it is registered by right-clicking the top line in Solution Explorer and viewing 'Property Pages'. If HtmlAgilityPack is not already in your References, click [AddDownArrow], Add Reference, Bin, HtmlAgilityPack, OK.
2) CAPTURE A WEB PAGE AND CONVERT IT TO AN HTML DOC:
Adapted from EggheadCafe's excellent Asynchronous Task example:
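' Requires Imports System.IO and Imports System.Net at the top of the code-behind file.
' Class-level fields shared by OnBegin and OnEnd (Private is not valid inside a Sub).
' The vRequest declaration is assumed; it was not shown in the original snippet.
Private vRequest As WebRequest
Private vPage_Text As String = ""
Private vPage_Doc As New HtmlAgilityPack.HtmlDocument
Public Function OnBegin(...)
' Start the asynchronous request for the page.
vRequest = WebRequest.Create("http://www.stackoverflow.com")
Return vRequest.BeginGetResponse(cb, extraData)
End Function
Public Sub OnEnd(...)
' Read the response body and parse it into an HtmlAgilityPack document.
Using response As WebResponse = vRequest.EndGetResponse(ar)
Using reader As StreamReader = New StreamReader(response.GetResponseStream())
vPage_Text = reader.ReadToEnd()
vPage_Doc.LoadHtml(vPage_Text)
End Using
End Using
End Sub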
3) EXTRACT THE ENTIRE HTML DOCUMENT:
vText = vPage_Doc.DocumentNode.OuterHtml
4) EXAMINE EVERY LINK IN THE DOC AND COLLECT THE URLs:
For Each vLinkNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//a")
vLinkList = vLinkList & vLinkNode.GetAttributeValue("href", "") & vbCrLf
Next
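One thing to watch for in loops like the one above: as far as I know, SelectNodes returns Nothing (not an empty collection) when the XPath matches no nodes, so the For Each would throw a NullReferenceException on a page with no links. Below is a minimal sketch of a guarded version. The module name LinkCollector and the function CollectLinks are hypothetical names for illustration, and the code assumes HtmlAgilityPack.dll is referenced as described in step 1.

```vbnet
Imports System.Text
Imports HtmlAgilityPack

Module LinkCollector
    ' Hypothetical helper: returns every href value in the HTML, one per line.
    ' Returns an empty string when the page contains no <a href="..."> tags.
    Function CollectLinks(ByVal html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes is expected to return Nothing when there are no matches.
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes(".//a[@href]")
        Dim result As New StringBuilder()
        If links IsNot Nothing Then
            For Each linkNode As HtmlNode In links
                result.AppendLine(linkNode.GetAttributeValue("href", ""))
            Next
        End If
        Return result.ToString()
    End Function
End Module
```

The same Nothing check applies to the div loop in step 5, and to every SelectSingleNode call whose target might be missing from the page.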
5) EXAMINE EVERY DIV WITH CSS class="item_class" AND COLLECT THE TEXT:
For Each vDivNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//div[@class='item_class']")
vPageText = vPageText & vDivNode.InnerText & vbCrLf
Next
6) EXTRACT THE DOC'S TITLE AND DESCRIPTION:
Dim vTitleNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//title")
vTitleText = vTitleNode.InnerText
Dim vDescriptionNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//meta[@name='description']")
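' A meta tag has no inner text; the description is stored in its content attribute.
vDescriptionText = vDescriptionNode.GetAttributeValue("content", "")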
Or the Title in the doc's body:
vBodyTitle = vPage_Doc.DocumentNode.SelectSingleNode(".//h1").InnerText
7) EXTRACT AN ELEMENT BY ITS ID:
Dim vBigImageNode As HtmlAgilityPack.HtmlNode = vPage_Doc.GetElementbyId("BigImage")
vImage_URL = vBigImageNode.GetAttributeValue("src", "")
vImage_Height = vBigImageNode.GetAttributeValue("height", "")
vImage_Width = vBigImageNode.GetAttributeValue("width", "")
8) REMOVE A NODE:
vMovieNode.SelectSingleNode(".//div[@class='viewer-reviews']").Remove()
Finally, I needed to extract a subsection of a page when there were no obvious nodes or other 'attachment points'. The trick is to identify anything that you can 'find' (such as a tag or comment) that can be used as a dividing point in an already-selected node of the doc. Then insert ending and beginning tags at that point, thus creating 2 separate subsections within the node. Finally, create a new HTML doc from the edited node and select the newly-defined node. (If you didn't understand all of that, just follow the code.)
So here is the top-secret, never-before released,
9) EXTRACT ANY PORTION OF A DOCUMENT:
Dim vNewDoc As New HtmlAgilityPack.HtmlDocument
vNewDoc.LoadHtml(vOldDivNode.OuterHtml.Substring(0, vOldDivNode.OuterHtml.IndexOf("<!-- comment") - 1) & _
"</div><div class=""my_new_node"">" & _
vOldDivNode.OuterHtml.Substring(vOldDivNode.OuterHtml.IndexOf("<!-- comment") - 1))
Dim vNewDivNode As HtmlAgilityPack.HtmlNode = vNewDoc.DocumentNode.SelectSingleNode(".//div[@class='my_new_node']")
Dim vHaHaICapturedYou As String = vNewDivNode.InnerText
Of course, now that I've told you, I'm gonna have to kill you.
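For anyone who wants to reuse the step 9 trick, here is a hedged sketch of it wrapped in a function. It looks up the marker once, returns an empty string if the marker is not found (the inline version would throw in that case), and splits exactly at the marker rather than one character before it. The module name SectionSplitter, the function TextAfterMarker and its parameters are hypothetical names for illustration; the class name my_new_node is the one from step 9.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SectionSplitter
    ' Hypothetical helper: returns the text that follows the marker inside divNode,
    ' or an empty string if the marker does not occur in the node's HTML.
    Function TextAfterMarker(ByVal divNode As HtmlNode, ByVal marker As String) As String
        Dim html As String = divNode.OuterHtml
        Dim splitAt As Integer = html.IndexOf(marker, StringComparison.Ordinal)
        If splitAt < 0 Then Return ""

        ' Close the original div at the marker and open a new div that is easy to select.
        ' The original closing </div> then ends the new div.
        Dim edited As String = html.Substring(0, splitAt) & _
            "</div><div class=""my_new_node"">" & _
            html.Substring(splitAt)

        Dim newDoc As New HtmlDocument()
        newDoc.LoadHtml(edited)

        Dim newNode As HtmlNode = newDoc.DocumentNode.SelectSingleNode(".//div[@class='my_new_node']")
        If newNode Is Nothing Then Return ""
        Return newNode.InnerText
    End Function
End Module
```

A call would look like TextAfterMarker(vOldDivNode, "<!-- comment"). This assumes vOldDivNode is a div, as in step 9; for another element type, the inserted closing tag would need to match that element.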
Thanks to all of the contributors to Stack Overflow for all of the help you've given me!