Web scraping: Extract text in R using RVEST

I'm working on a college project in R. How can I extract the text " | 20 de Novembro de 2015 " with the rvest package? I tried selecting the class "widget-info", but the result also includes the text of the nested "widget-author" span.

<div class="home-list-content">
    <span class="widget-info">
        <span class="widget-author">
            Rúben Campanacho
        </span>
        | 20 de Novembro de 2015
    </span>
    <h2>
        LG Pay é o sistema de pagamentos móveis da LG
    </h2>
</div>

My code:

library(rvest)

# Select every .widget-info span and collect its text in a data frame
pagina <- read_html("http://www.tecnologia.com.pt")
data <- pagina %>%
  html_nodes(".widget-info") %>%
  html_text() %>%
  as.data.frame()

The result:

Rúben Campanacho | 20 de Novembro de 2015

I want only: | 20 de Novembro de 2015

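txt <- 'Rúben Campanacho | 20 de Novembro de 2015'

# Remove the first two words (the author's name), each followed by one
# whitespace character; assumes a two-word name and no leading whitespace
gsub('^((\\w+)[[:space:]]){2}', '', txt)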

Returns:

"| 20 de Novembro de 2015"
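The regex above depends on the author name being exactly two words. As an alternative sketch, not part of the original answer, the date can be taken from the span's own text nodes with an XPath expression, which skips the nested "widget-author" span entirely. The example parses the HTML fragment from the question as a literal string rather than fetching the live page, and the variable names html, datas and so_data are only illustrative. It assumes the class attribute is exactly "widget-info".

```r
library(rvest)

# HTML fragment copied from the question (assumed representative of the page)
html <- '<div class="home-list-content">
    <span class="widget-info">
        <span class="widget-author">
            Rúben Campanacho
        </span>
        | 20 de Novembro de 2015
    </span>
    <h2>
        LG Pay é o sistema de pagamentos móveis da LG
    </h2>
</div>'

pagina <- read_html(html)

# text() selects only the direct text children of the span, so the nested
# widget-author element is skipped; [normalize-space()] drops text nodes
# that contain nothing but whitespace
datas <- pagina %>%
  html_nodes(xpath = "//span[@class='widget-info']/text()[normalize-space()]") %>%
  html_text() %>%
  trimws()

# Base-R alternative on the combined string: delete everything before the
# first "|", whatever the length of the author name
txt <- 'Rúben Campanacho | 20 de Novembro de 2015'
so_data <- sub('^[^|]*', '', txt)
```

Both datas and so_data should then hold the string starting at the pipe character. On the real page, the same XPath can be applied to the result of read_html("http://www.tecnologia.com.pt"), provided the markup still matches the fragment shown in the question.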
