Abot Crawler - How to detect null response

Question

I am using VB.NET and have a handful of URLs that refuse to be crawled. I would really like to detect when a crawl returns a null response, but I can't figure out how.

Code:

Public Sub crawler_ProcessPageCrawlCompleted(sender As Object, e As PageCrawlCompletedArgs)

    pageNumber += 1
    Try

        Dim crawledPage As CrawledPage = e.CrawledPage


        If (Not (crawledPage.HttpWebResponse Is Nothing) And Not (crawledPage.WebException Is Nothing)) Or crawledPage.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
            CrawlFailed(e.CrawledPage.ToString, Failed)
        Else

            If String.IsNullOrEmpty(crawledPage.Content.Text) Then
                CrawlFailed(e.CrawledPage.ToString, NoContent)
            Else
                StoreContent(e)
            End If

        End If


    Catch ex As Exception
        RichTextBox1.AppendText(e.CrawledPage.ToString & " - " & ex.Message & vbCrLf)
    End Try

End Sub

I added the Try/Catch to capture that exception, but I would rather handle it in my CrawlFailed subroutine so I can do something with that URL.

I have tried to use GetResponseStream and Stream.Null, but I can't figure out how to detect an empty stream :( I'm sure I'm missing something, but I've googled all over the place and the best I can find is this thread: "crawledPage.HttpWebResponse is null in Abot".

However, that doesn't really explain how to detect the result and code against it.

Answer 1

I had the same issue (on .NET Core). In a Fiddler session I could see that the response actually did arrive, but I also saw that the site took a long time to return it.

Try setting config.HttpRequestTimeoutInSeconds to a higher value. That resolved my issue.
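
As a rough sketch of both points: the condition in the question reads HttpWebResponse.StatusCode even when HttpWebResponse is Nothing, because Or does not short-circuit and the first clause is only true when both objects exist. Checking for Nothing first with OrElse avoids the NullReferenceException, so the failure can be handled in normal code rather than in the Catch block. The helper names BuildConfig and ClassifyPage, the returned strings, and the 60-second value are assumptions made for illustration; the property names are the ones used in the question and in this answer, and may differ in other Abot versions.

```vbnet
Imports System.Net
Imports Abot.Poco

Module CrawlChecks

    ' Hypothetical helper: a configuration with a longer request timeout.
    Function BuildConfig() As CrawlConfiguration
        Dim config As New CrawlConfiguration()
        ' Illustrative value only; tune it to how slow the target sites are.
        config.HttpRequestTimeoutInSeconds = 60
        Return config
    End Function

    ' Hypothetical helper: returns "Failed", "NoContent" or "OK".
    Function ClassifyPage(page As CrawledPage) As String
        ' Test for Nothing first. OrElse short-circuits, so StatusCode is
        ' only read when a response object actually exists.
        If page.WebException IsNot Nothing OrElse
           page.HttpWebResponse Is Nothing OrElse
           page.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
            Return "Failed"
        End If

        ' A response arrived, but it may carry no usable text.
        If page.Content Is Nothing OrElse
           String.IsNullOrEmpty(page.Content.Text) Then
            Return "NoContent"
        End If

        Return "OK"
    End Function

End Module
```

In the question's handler, the result of ClassifyPage(e.CrawledPage) could then decide between calling CrawlFailed and StoreContent. The configuration from BuildConfig would be passed to the crawler when it is constructed.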

solution1
0 2017-01-18 18:37:44
