

Appending/Merging to a vector in R in a loop

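I'm scraping data from several websites (linked here) and then trying to merge them all into a single data frame. The sites follow a recurring URL pattern, so I build the links in one place and then iterate over them in a for loop. This is the block of code I'm working with: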

library(rvest)

ingredientsList <- c()
links <- paste0("http://www.bbc.co.uk/food/ingredients/by/letter/", letters)
# links contains:
# http://www.bbc.co.uk/food/ingredients/by/letter/a
# http://www.bbc.co.uk/food/ingredients/by/letter/b
# http://www.bbc.co.uk/food/ingredients/by/letter/c ... and so on up to z
for (i in seq_along(links)) {
    session <- html_session(links[i])
    ingredients <- session %>% html_nodes("ol:nth-child(4) a") %>% html_text()
    ingredientsList <- c(ingredientsList, ingredients)
}

The result is ingredientsList, which should ideally contain the list of all ingredients from 'A' to 'Z'. Thanks.

You are better off using a list rather than a vector, and you can create it directly with lapply, like this:

library(rvest)
library(stringr)

url <- "http://www.bbc.co.uk/food/ingredients/by/letter/"
urls <- paste0(url, letters)

ingredientsList <- lapply(urls, function(u) {
  u %>%
    read_html() %>%   # a plain GET is enough; html_session() is deprecated since rvest 1.0.0
    html_nodes("ol:nth-child(4) a") %>%
    html_text() %>%
    ## clean results: drop newlines, "Related", counts such as "(3)" and runs of whitespace
    str_replace_all(pattern = "\n|Related|\\(\\d\\)|\\s{2,}", replacement = "") %>%
    ## drop entries that still start with a whitespace character
    subset(!str_detect(., "^\\s{1}"))
})

names(ingredientsList) <- LETTERS
str(ingredientsList)
## List of 26
##  $ A: chr [1:33] "Acidulated water" "Ackee" "Acorn squash" "Aduki beans" ...
##  $ B: chr [1:101] "Bacon" "Bagel" "Baguette" "Baked beans" ...
##  $ C: chr [1:174] "Cabbage" "Caerphilly" "Cake" "Calasparra rice" ...
##  $ D: chr [1:31] "Dab" "Daikon" "Damsons" "Dandelion" ...
##  $ E: chr [1:15] "Edam" "Eel" "Egg" "Egg liqueur" ...
##  $ F: chr [1:50] "Farfalle" "Fat" "Fennel" "Fennel seeds" ...
##  $ G: chr [1:53] "Galangal" "Game" "Gammon" "Garam masala" ...
##  $ H: chr [1:30] "Habañero chillies" "Haddock" "Haggis" "Hake" ...
##  $ I: chr [1:5] "Ice cream" "Iceberg lettuce" "Icing" "Icing sugar" ...
##  $ J: chr [1:12] "Jaggery" "Jam" "January King cabbage" "Japanese pumpkin" ...
##  $ K: chr [1:12] "Kabana" "Kale" "Ketchup" "Ketjap manis" ...
##  $ L: chr [1:49] "Lager" "Lamb" "Lamb breast" "Lamb chop" ...
##  $ M: chr [1:76] "Macadamia" "Macaroni" "Macaroon" "Mace" ...
##  $ N: chr [1:14] "Naan bread" "Nachos" "Nashi" "Nasturtium" ...
##  $ O: chr [1:20] "Oatcakes" "Oatmeal" "Oats" "Octopus" ...
##  $ P: chr [1:109] "Paella" "Pak choi" "Palm sugar" "Pancakes" ...
##  $ Q: chr [1:6] "Quail" "Quail's egg" "Quark" "Quatre-épices" ...
##  $ R: chr [1:62] "Rabbit" "Rack of lamb" "Radicchio" "Radish" ...
##  $ S: chr [1:125] "Safflower oil" "Saffron" "Sage" "Salad" ...
##  $ T: chr [1:47] "T-bone steak" "Tabasco" "Taco" "Tagliatelle" ...
##  $ U: chr "Unleavened bread"
##  $ V: chr [1:18] "Vacherin" "Vanilla essence" "Vanilla extract" "Vanilla pod" ...
##  $ W: chr [1:38] "Waffles" "Walnut" "Walnut oil" "Wasabi" ...
##  $ X: chr(0) 
##  $ Y: chr [1:4] "Yam" "Yeast" "Yellow lentil" "Yoghurt"
##  $ Z: chr [1:2] "Zander" "Zest"
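The question's stated goal was a single data frame. A minimal base-R sketch of getting there from the named list, assuming ingredientsList has the shape shown by str() above: repeat each letter by the number of ingredients under it. So that the snippet runs offline, it uses a small hand-typed stand-in with values taken from that output rather than a live scrape.

```r
# Small stand-in for the scraped result, using values from the str() output;
# in practice ingredientsList comes from the lapply() call above.
ingredientsList <- list(
  A = c("Acidulated water", "Ackee"),
  U = "Unleavened bread",
  X = character(0),   # letters with no ingredients simply contribute no rows
  Z = c("Zander", "Zest")
)

# One flat character vector (what the loop in the question was building)
allIngredients <- unlist(ingredientsList, use.names = FALSE)

# Long data frame: one row per ingredient, tagged with its letter
ingredientsDF <- data.frame(
  letter     = rep(names(ingredientsList), lengths(ingredientsList)),
  ingredient = allIngredients,
  stringsAsFactors = FALSE
)

print(ingredientsDF)
```

Base R's stack(ingredientsList) produces a similar two-column result (columns values and ind), with the letter stored as a factor instead of a character column.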

Alternatively, we can use a for loop, similar to your approach:

n <- length(letters)
ingredientsList <- vector(mode = "list", length = n)   # preallocate one slot per letter
names(ingredientsList) <- LETTERS

for (i in seq_len(n)) {
    # no cleaning step here, unlike the lapply() version above
    ingredientsList[[i]] <- read_html(urls[i]) %>%
                            html_nodes("ol:nth-child(4) a") %>%
                            html_text()
}

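The trick, though, is to stick with a list to hold your results: fill one preallocated slot per page instead of growing a vector with c(), which copies everything collected so far on every iteration. If you need a single character vector at the end, flatten the list once with unlist().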
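To make the difference between the two loop patterns concrete, here is an offline sketch. fetch_page() is a hypothetical stand-in for scraping one page (it makes no network request and just returns a few strings per index); both loops end up with the same character vector, but only the second avoids re-copying the accumulated result on every pass.

```r
# Hypothetical stand-in for scraping one page: returns 1 to 3 strings per index.
fetch_page <- function(i) paste0(LETTERS[i], "-item-", seq_len(i %% 3 + 1))

n <- 26

# Pattern from the question: grow a vector with c().
# Each c() call allocates a new vector and copies everything collected so far.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, fetch_page(i))
}

# Pattern from the answer: preallocate a list, fill one slot per iteration,
# then flatten once at the end.
pages <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
  pages[[i]] <- fetch_page(i)
}
flat <- unlist(pages, use.names = FALSE)

identical(grown, flat)
```

With 26 pages the speed difference is negligible next to the network time, but the list version also keeps each page's results separate, which is what makes the per-letter names and the later data frame easy to build.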


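Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.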

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM