Site scraping with F# and Canopy

I am trying to write a simple scraper using F# and Canopy (see http://lefthandedgoat.github.io/canopy/ ). I want to extract the text from all elements that match the selector ".application-tile". However, the code below produces the following build error, and I don't understand it.

This expression was expected to have type
    OpenQA.Selenium.IWebElement -> 'a    
but here has type
    OpenQA.Selenium.IWebElement

Any idea why this is happening? Thanks!

open canopy
open runner
open System

[<EntryPoint>]
let main argv = 
    start firefox

    "taking canopy for a spin" &&& fun _ ->
        url "https://abc.com/"

        // Login Page
        "#i0116" << "abc@abc.com"
        "#i0118" << "abc"
        click "#abcButton"

        // Get the Application Tiles -- BUILD ERROR HAPPENS HERE
        elements ".application-tile" |> List.map (fun tile -> (tile |> (element ".application-name breakWordWrap"))) |> ignore

    run()

open canopy
open runner

start firefox

"taking canopy for a spin" &&& fun _ ->
    url "http://lefthandedgoat.github.io/canopy/testpages/"

    // Get the tds in tr
    let results = elements "#value_list td" |> List.map read

    //or print them using iter
    elements "#value_list td" 
        |> List.iter (fun element -> System.Console.WriteLine(read element))

run()

That should do what you want.

canopy has a function called 'read' that takes either a selector or an element. Since 'elements "selector"' already gives you every matching element, you can map read over the list.

List.map takes a function, runs it on each item, and returns a list of the results. (In C# it is like elements.Select(x => read(x)).) List.iter is the same as .ForEach(x => System.Console.WriteLine(read(x))).
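To make the difference between the two concrete, here is a minimal sketch that runs without a browser. `FakeElement`, `cells`, and this `read` are hypothetical stand-ins for illustration only; they are not canopy's types or functions, which work with Selenium's `IWebElement`.

```fsharp
// Stand-in for a page element (assumption: not a canopy or Selenium type).
type FakeElement = { Text : string }

// Hypothetical stub that mimics the element form of canopy's `read`.
let read (el : FakeElement) = el.Text

// Stand-in for the list that `elements "#value_list td"` would return.
let cells = [ { Text = "one" }; { Text = "two" }; { Text = "three" } ]

// List.map applies `read` to every item and collects the results in a new list.
let texts = cells |> List.map read

// List.iter applies a function to every item for its side effect and returns unit.
cells |> List.iter (fun cell -> printfn "%s" (read cell))
```

Use List.map when you need the extracted values afterwards, and List.iter when you only want to print or log them.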

I believe the error is happening in the projection lambda inside your List.map call. From the canopy documentation: elements returns all elements that match a CSS selector or text; element gets an element with the given CSS selector or text.

So here you are obtaining a list of elements that match the selector ".application-tile". List.map requires a lambda that takes an IWebElement (the type contained in the list that elements returns) and projects it into a new form (the generic 'a).

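I don't know much about this framework, but I'm not sure why you're taking an element and then piping it into another call to element. The expression element "..." is already an IWebElement rather than a function, so it cannot sit on the right-hand side of |>, which is what the compiler is reporting.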

Looking further through the documentation we find the read function: "Read the text (or value or selected option) of an element." Is this what you want?
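The type error can be reproduced without canopy. The sketch below uses hypothetical stubs (`FakeElement`, `tile`, `page`, and stub versions of `elements`, `element`, `elementWithin`, and `read`) that only imitate the shapes involved. The canopy documentation lists an `elementWithin` function with a similar shape (selector first, parent element second), but treat that as an assumption and check it against the version you have installed.

```fsharp
// Stand-in for OpenQA.Selenium.IWebElement, with nested children (assumption).
type FakeElement = { Css : string; Text : string; Children : FakeElement list }

// Hypothetical helper that builds one tile containing a name element.
let tile name =
    { Css = ".application-tile"
      Text = ""
      Children = [ { Css = ".application-name"; Text = name; Children = [] } ] }

// Stand-in for the page being scraped.
let page = [ tile "Mail"; tile "Calendar" ]

// Stub shaped like `elements`: selector in, list of elements out.
let elements css = page |> List.filter (fun e -> e.Css = css)

// Stub shaped like `element`: selector in, a single element out.
let element css = elements css |> List.head

// Stub that looks up the first matching child of a given parent element.
let elementWithin css (parent : FakeElement) =
    parent.Children |> List.find (fun e -> e.Css = css)

// Stub shaped like the element form of `read`.
let read (el : FakeElement) = el.Text

// `tile |> f` means `f tile`, so `f` must be a function that accepts the tile.
// `element ".application-name"` is already an element, so the line below would
// not compile; it fails with the same kind of error as in the question.
// let broken = elements ".application-tile" |> List.map (fun t -> t |> element ".application-name")

// `elementWithin ".application-name"` is partially applied and still expects
// the parent element, so it can appear on the right-hand side of |>.
let names =
    elements ".application-tile"
    |> List.map (fun t -> t |> elementWithin ".application-name" |> read)
```

If only the text of each tile is needed, mapping `read` directly over the tiles is simpler; the nested lookup matters only when the name lives in a child element.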
