
Abot Crawler - How to detect null response

I am using VB.NET and have a handful of URLs that refuse to be crawled. I would like to detect when a crawl returns a null response, but I am having trouble figuring out how.

Code:

Public Sub crawler_ProcessPageCrawlCompleted(sender As Object, e As PageCrawlCompletedArgs)

    pageNumber += 1
    Try

        Dim crawledPage As CrawledPage = e.CrawledPage


        If (Not (crawledPage.HttpWebResponse Is Nothing) And Not (crawledPage.WebException Is Nothing)) Or crawledPage.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
            CrawlFailed(e.CrawledPage.ToString, Failed)
        Else

            If String.IsNullOrEmpty(crawledPage.Content.Text) Then
                CrawlFailed(e.CrawledPage.ToString, NoContent)
            Else
                StoreContent(e)
            End If

        End If


    Catch ex As Exception
        RichTextBox1.AppendText(e.CrawledPage.ToString & " - " & ex.Message & vbCrLf)
    End Try

End Sub

I put in the Try...Catch to capture that exception, but I would much rather handle it in my CrawlFailed subroutine so that I can do something with that URL.

I have tried to work out how to use GetResponseStream and Stream.Null, but I cannot figure out how to detect an empty stream :( I must be missing something, but I have googled all over the place and the best I can find is this thread: "crawledPage.HttpWebResponse is null in Abot".

However, that thread doesn't really explain how to detect the null response and code against it.

I had the same issue (on .NET Core). In a Fiddler session I could see that the response actually did arrive, but I also saw that the site took a long time to return it.

Try setting config.HttpRequestTimeoutInSeconds to a higher value. It resolved my issues.
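For the "how do I detect it" part of the question, here is a minimal sketch, assuming the Abot 1.x API used in the question (CrawledPage.HttpWebResponse and CrawledPage.WebException). The exception in the original handler most likely comes from VB's Or and And, which evaluate both operands, so crawledPage.HttpWebResponse.StatusCode is read even when HttpWebResponse is Nothing. OrElse short-circuits, so the null check can guard the StatusCode access. CrawlFailed, StoreContent, Failed, NoContent and pageNumber are the asker's own members and are assumed to exist in the enclosing class; the value 60 is an arbitrary example, not a recommendation.

```vb
' Fragment: place inside the existing form/class.
' Needs at the top of the file: Imports System.Net, Imports Abot.Crawler, Imports Abot.Poco

' Where the crawler is configured (assumption: "config" is the
' CrawlConfiguration instance that is handed to the crawler):
'     config.HttpRequestTimeoutInSeconds = 60

Public Sub crawler_ProcessPageCrawlCompleted(sender As Object, e As PageCrawlCompletedArgs)
    pageNumber += 1

    Dim crawledPage As CrawledPage = e.CrawledPage

    ' OrElse stops at the first True operand, so StatusCode is only
    ' read when HttpWebResponse is known not to be Nothing.
    If crawledPage.WebException IsNot Nothing OrElse
       crawledPage.HttpWebResponse Is Nothing OrElse
       crawledPage.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
        ' Null response, request error, or non-200 status.
        CrawlFailed(crawledPage.ToString, Failed)
    ElseIf String.IsNullOrEmpty(crawledPage.Content.Text) Then
        ' Response arrived but the body is empty.
        CrawlFailed(crawledPage.ToString, NoContent)
    Else
        StoreContent(e)
    End If
End Sub
```

With the checks ordered this way, a timed-out request (HttpWebResponse is Nothing) goes to CrawlFailed instead of raising a NullReferenceException, so the Try...Catch is no longer needed just for that case.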

