Creating a web scraping loop in R

Question

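I am trying to set up a web scraping loop in R, but I am really struggling to write one that works.

I currently have an Excel file containing the URLs I want to scrape. I read it into R and am trying to extract the product title for each URL in the column titled "DE". A short example of the table is:

The code I have been using is: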

library(readxl)
URL_creator <- read_excel("URL creator.xlsx")

library(rvest)
content_list <- vector()
for (i in 1:nrow(URL_creator)) {
  url <- URL_creator[i,]$DE
  html <- read_html(url)
  nodes <- html_nodes(html, "productTitle") %>% 
    html_text() %>% 
    gsub("\n", "", .) %>% 
    trimws()
  content_list[i] <- nodes[1]
}

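For some reason, the returned content list is blank. I expected it to return the product title for each corresponding URL in the column titled "DE", but I am not sure where I went wrong.

Any help is greatly appreciated :)

Thanks!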

Answer 1

I made a few changes to your code and it works fine. Check it out:

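library(rvest)

content_list <- vector()
# Loop over the URLs themselves rather than over row numbers
for (i in URL_creator$DE) {
  html <- read_html(i)
  # "title" selects the page's <title> element
  nodes <- html_nodes(html, "title") %>% 
    html_text() %>% 
    gsub("\n", "", .) %>% 
    trimws()
  # i is a URL string, so each result is stored under its URL as the name
  content_list[i] <- nodes[1]
}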

The output of content_list[1] is:

https://amazon.de/dp/B0821PBSPJ "Planet Waves D'Addario 10MB00 Mandolinengurt, geflochten, 2,5 cm, Braun/cremefarben: Amazon.de: Musikinstrumente & DJ-Equipment"
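
A note on why the original selector came back empty, and why the output above carries the site suffix: in CSS, a bare name such as "productTitle" is a type selector and only matches elements whose tag is literally <productTitle>, while "#productTitle" matches an element by its id, and "title" matches the page's <title> tag. The sketch below illustrates the difference on two small stand-in pages built from literal HTML, so it runs offline. The id "productTitle", the helper names make_page and first_text, and the page contents are assumptions made for illustration, not confirmed details of the real pages.

```r
library(rvest)

# Two stand-in pages built from literal HTML so the sketch runs offline.
# The id "productTitle" is an assumption about the real pages' markup.
make_page <- function(name) {
  read_html(paste0(
    "<html><head><title>", name, ": Example Shop</title></head>",
    "<body><span id=\"productTitle\">\n  ", name, "\n</span></body></html>"
  ))
}
pages <- list(a = make_page("Mandolin strap"), b = make_page("Guitar pick"))

# Return the trimmed text of the first node matching `selector`,
# or NA when nothing matches.
first_text <- function(page, selector) {
  nodes <- html_nodes(page, selector)
  if (length(nodes) == 0) return(NA_character_)
  trimws(html_text(nodes[[1]]))
}

# Bare name: type selector, matches no element on these pages
no_hash <- vapply(pages, first_text, character(1), selector = "productTitle")

# Leading "#": id selector, matches the span holding the product name
with_hash <- vapply(pages, first_text, character(1), selector = "#productTitle")

# "title": the <title> tag, which also carries the shop suffix
page_tag <- vapply(pages, first_text, character(1), selector = "title")

print(no_hash)
print(with_hash)
print(page_tag)
```

With real URLs, pages could instead be built with something like lapply(URL_creator$DE, read_html). Because vapply keeps the names of its input, naming that list by URL would give a named character vector similar to the content_list shown above.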

Solution 1
1 2021-12-23 09:53:34