
How to loop through year and month in an R function

I have a web scraping function which takes the year (term) and month (term2) and returns the news headlines as a data frame:

news <- function(term, term2) {
  
  html_dat <- read_html(paste0("https://news.google.com/search?q=site%3%2F",term,"%2F",term2,"&hl=en-US&gl=US&ceid=US%3Aen"))

  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes("a.DY5T1d") %>% 
      html_text()
  ) 

return(news_dat)
}

df <- news('2020', '05')

I would like to create a loop that calls the function for every year from 2000 to 2021 and for every month. For example, the loop would first call news('2000', '01'), then move on to news('2000', '02'), and so on.

I would like to return a data frame/list containing all the headlines for that period. The code I have, which does not work:


years <- 2000:2021
months <- 1:12

for (i in length(years)){
  for (j in length(months)){
    temp <- news(i,j)
  }
  newdf <- rbind(newdf, temp)
}

You could use map2_dfr from the purrr package for this.

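library(purrr)
library(rvest)   # provides read_html(), html_nodes(), html_text() and %>%

news <- function(term, term2) {

  # zero-pad the month so that 5 becomes "05", matching news('2020', '05')
  term2 <- sprintf("%02d", as.integer(term2))

  url <- paste0("https://news.google.com/search?q=site%3%2F", term, "%2F", term2, "&hl=en-US&gl=US&ceid=US%3Aen")
  html_dat <- read_html(url)

  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes("a.DY5T1d") %>%
      html_text()
  )

  # return the data frame explicitly
  news_dat
}

years <- 2000:2021
months <- 1:12

# one row per year/month combination
crossArg <- cross_df(list(year = years, month = months))

# call news() once per pair and row-bind the results
df <- map2_dfr(crossArg$year, crossArg$month, news)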
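
For comparison, the loop in the question fails for three reasons: `for (i in length(years))` runs exactly once, with `i` equal to 22 (the length) rather than a year; `rbind` sits outside the inner loop, so at most the last month of each year would be kept; and `newdf` is never initialised before it is used. A minimal sketch of a corrected loop follows. It uses a hypothetical `news_stub()` in place of the real scraper so that it runs without network access; swapping in `news()` is assumed to work the same way.

```r
# Hypothetical stand-in for news(): returns one placeholder headline per call.
news_stub <- function(term, term2) {
  data.frame(Title = paste0("headline for ", term, "/", term2),
             stringsAsFactors = FALSE)
}

years  <- 2000:2021
months <- 1:12

results <- list()                 # collect the pieces, bind once at the end
for (y in years) {                # iterate over the values, not length(years)
  for (m in months) {
    mm <- sprintf("%02d", m)      # zero-pad the month: "01", "02", ..., "12"
    results[[length(results) + 1]] <- news_stub(as.character(y), mm)
  }
}

# Row-bind all 22 * 12 data frames into one.
newdf <- do.call(rbind, results)
```

Collecting the pieces in a list and binding once avoids growing a data frame inside the loop, which gets slow as the number of iterations increases.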

The important thing to remember is that R is designed to work on columns. For example, to add 1 to every element of a vector, it's sufficient to write

x <- 1:10
x <- x + 1

which gives a vector whose first element is 2 and last element is 11. So, when you find yourself writing code to loop through rows of a vector/matrix/data frame, stop. There is almost certainly a better way*.

*: There are rare, very rare, exceptions. This is not one of them.

library(tidyverse)

newdf <- tibble() %>% expand(year=2000:2021, month=1:12)
newdf
# A tibble: 264 x 2
    year month
   <int> <int>
 1  2000     1
 2  2000     2
 3  2000     3
 4  2000     4
 5  2000     5
 6  2000     6
 7  2000     7
 8  2000     8
 9  2000     9
10  2000    10
# … with 254 more rows

which I believe is what you want.

Edit: To forestall OP's request for conversion to character, which they hint at in one of their comments:

newdf <- tibble() %>% 
           expand(year=2000:2021, month=1:12) %>% 
           mutate(year=as.character(year), month=as.character(month))

or

newdf <- tibble() %>% 
           expand(year=as.character(2000:2021), month=as.character(1:12))

Though I believe this is unnecessary.
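
The grid above only lists the year/month pairs; the scraping function still has to be called once per row. One possible way to do that, sketched here in base R, is `Map()` followed by a single `rbind`. The `news_stub()` function is a hypothetical placeholder for `news()` so the example runs offline, and the grid is rebuilt with `expand.grid()` to keep the sketch free of package dependencies; the tibble `newdf` from above could presumably be used in the same way.

```r
# Hypothetical stand-in for news(); replace with the real scraper.
news_stub <- function(term, term2) {
  data.frame(year = term, month = term2,
             Title = paste("headline", term, term2),
             stringsAsFactors = FALSE)
}

# Every year/month pair; the first argument (month) varies fastest.
grid <- expand.grid(month = 1:12, year = 2000:2021)

# Map() calls news_stub() once per row; sprintf() zero-pads the month.
pieces <- Map(news_stub,
              grid$year,
              sprintf("%02d", grid$month))

# Combine the per-month data frames into a single data frame.
headlines <- do.call(rbind, pieces)
rownames(headlines) <- NULL
```

Keeping the year and month as columns in the result makes it possible to tell afterwards which request each headline came from.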
