I'm new in web scraping using R. I'm trying to scrape the table generated by this link: https://gd.eppo.int/search?k=saperda+tridentata . In this specific case, it's just one record in the table but it could be more (I am actually interested in the first column but the whole table is ok).
I tried to follow the suggestion by Allan Cameron given here ( rvest, table with thead and tbody tags ) as the issue seems to be exactly the same but with no success maybe for my little knowledge on how webpages work. I always get a "no data" table. Maybe I am not following correctly the suggested step "# Get the JSON as plain text from the link generated by Javascript on the page". Where can I get this link? In this specific case I used " https://gd.eppo.int/media/js/application/zzsearch.js?7 ", is this one?
Below you have my code. Thank you in advance!
library(httr)
library(rlist)
library(rvest)
library(jsonlite)
library(dplyr)
pest.name <- "saperda+tridentata"
url <- paste("https://gd.eppo.int/search?k=",pest.name, sep="")
resp <- GET(url) %>% content("text")
json_url <- "https://gd.eppo.int/media/js/application/zzsearch.js?7"
JSON <- GET(json_url) %>% content("text", encoding = "utf8")
table_contents <- JSON %>%
{gsub("\\\\n", "\n", .)} %>%
{gsub("\\\\/", "/", .)} %>%
{gsub("\\\\\"", "\"", .)} %>%
strsplit("html\":\"") %>%
unlist %>%
extract(2) %>%
substr(1, nchar(.) -2) %>%
paste0("</tbody>")
new_page <- gsub("</tbody>", table_contents, resp)
read_html(new_page) %>%
html_nodes("table") %>%
html_table()
The data comes from another endpoint you can see in the network tab when refreshing the page. You can send a request with your search phrase in the params and then extract the json you need from the response.
library(httr)
library(jsonlite)
params = list('k' = 'saperda tridentata','s' = 1,'m' = 1,'t' = 0)
r <- httr::GET(url = 'https://gd.eppo.int/ajax/search', query = params)
data <- jsonlite::parse_json(r %>% read_html() %>% html_node('p') %>%html_text())
print(data[[1]]$e)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.