I'm new to R and I'm trying to get data from this website: https://spritacular.org/gallery.
I want to get the location, time and the hour. I am following this guide and used SelectorGadget to click on the elements I wanted (.card-title, .card-subtitle, .mb-0).
However, it always outputs {xml_nodeset (0)} and I'm not sure why it's not getting those elements.
This is the code I have:
library(rvest)

url <- "https://spritacular.org/gallery"
sprite_gallery <- read_html(url)
sprite_location <- html_nodes(sprite_gallery, ".card-title , .card-subtitle , .mb-0")
sprite_location
When I use the same code on a different website it works, so I'm not sure what I'm doing wrong or how to fix it. This is my first time doing something like this and I appreciate any insight you may have!
As noted in the comments, this website renders its content with JavaScript, so the information only appears once the page is loaded in a browser. If you open the developer tools and go to the Network tab, you can see the underlying JSON data. If you send a GET request to this API address, you get back a list with all the results. From there, you can slice and dice your way to the information you need.
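To illustrate why the selectors in the question match nothing, here is a minimal sketch. The HTML string is a hypothetical stand-in for what a server might send before any JavaScript runs; it is not copied from spritacular.org.

library(rvest)

# Hypothetical page source as delivered by the server: an empty container
# that a script would later fill with cards in the browser.
raw_page <- read_html('
  <html><body>
    <div id="root"></div>
    <script>/* cards would be inserted here by JavaScript */</script>
  </body></html>
')

# rvest parses the HTML as delivered and does not execute JavaScript,
# so the selectors find no matching nodes.
cards <- html_nodes(raw_page, ".card-title , .card-subtitle , .mb-0")
length(cards)

An empty node set like this usually means the data has to come from somewhere else, such as the JSON endpoint visible in the Network tab.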
Here is one way to do this. I kept the name of the user who submitted each image, and found that the same user has often submitted multiple images. As a result, names and locations are duplicated in the output, but each row has a different image URL. See this blog for how to drill down into JSON data to build useful data frames in R.
library(httr)
library(tidyverse)

getURL <- 'https://api.spritacular.org/api/observation/gallery/?category=&country=&cursor=cD0xMTI%3D&format=json&page=1&status='

# get the raw json into R as a nested list
UOM_json <- httr::GET(getURL) %>%
  httr::content()

exp_output <- pluck(UOM_json, 'results') %>%
  enframe() %>%
  unnest_longer(value) %>%
  unnest_wider(value) %>%
  select(user_data, images) %>%
  # spread the nested user_data fields into columns and build the full name
  unnest_wider(user_data) %>%
  mutate(full_name = paste(first_name, last_name)) %>%
  select(full_name, location, images) %>%
  # rename the user's location, presumably so it does not clash with the
  # location column that appears once images is unnested
  rename(location_user = location) %>%
  # one row per image, then spread the image fields into columns
  unnest_longer(images) %>%
  unnest_wider(images) %>%
  select(full_name, location, image)
Output of exp_output:
> head(exp_output)
# A tibble: 6 × 3
  full_name     location                          image
  <chr>         <chr>                             <chr>
1 Kevin Palivec Jones County,Texas,United States  https://d1dzduvcvkxs60.cloudfront.net/observation_image/1d4cc82f-f3d2…
2 Kamil Świca   Lublin,Lublin Voivodeship,Poland  https://d1dzduvcvkxs60.cloudfront.net/observation_image/3b6391d1-f839…
3 Kamil Świca   Lublin,Lublin Voivodeship,Poland  https://d1dzduvcvkxs60.cloudfront.net/observation_image/9bcf10d7-bd7c…
4 Kamil Świca   Lublin,Lublin Voivodeship,Poland  https://d1dzduvcvkxs60.cloudfront.net/observation_image/a7dea9cf-8d6e…
5 Evelyn Lapeña Bulacan,Central Luzon,Philippines https://d1dzduvcvkxs60.cloudfront.net/observation_image/539e0870-c931…
6 Evelyn Lapeña Bulacan,Central Luzon,Philippines https://d1dzduvcvkxs60.cloudfront.net/observation_image/c729ea03-e1f8…
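Since the same user and location repeat once per image, it may help to collapse the result. The following is a rough sketch, assuming a data frame shaped like exp_output; the rows in exp_output_demo are made up for illustration and are not real API results.

library(dplyr)

# Hypothetical rows with the same columns as exp_output.
exp_output_demo <- tibble(
  full_name = c("User A", "User B", "User B"),
  location  = c("Place 1", "Place 2", "Place 2"),
  image     = c("https://example.com/a.jpg",
                "https://example.com/b.jpg",
                "https://example.com/c.jpg")
)

# One row per user/location pair, with the number of images submitted.
images_per_user <- exp_output_demo %>%
  count(full_name, location, name = "n_images")

# Alternatively, keep only the first image for each user/location pair.
first_image <- exp_output_demo %>%
  distinct(full_name, location, .keep_all = TRUE)

Which of the two is more useful depends on whether you need every image URL or just one row per observer.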