How to scrape data from a website and write to a csv in a specified format in R?

Question

I am trying to scrape data from https://www.booking.com/country.html.

The idea is to extract the number of listings for every kind of accommodation in a particular country.

The output needs to have the list of all countries in column A of an Excel file, with the number of listings for each property type (e.g. Apartments, Hostels, Resorts) in that country in separate columns next to the country name.

I need to capture the details for all the property types of every country.

The image above shows the output format required in Excel. I am able to get the countries using the code below, but not the property types and their respective data.

How can I get the data iteratively in a function for all the countries and write it to a CSV file?

library(rvest)
library(reshape2)
library(stringr)

url <- "https://www.booking.com/country.html"

bookingdata <- read_html(url)

#extracting the country
country <- html_nodes(bookingdata, "h2 > a") %>% 
  html_text()
write.csv(country, 'D:\\web scraping\\country.csv' ,row.names = FALSE)
print(country)

#extracting the data inside the inner div
html_nodes(bookingdata, "div > div > div > ul > li > a") %>%
  html_text()

#getting all the data (the parsed page is bookingdata; pg was never defined)
accomodation <- html_nodes(bookingdata, "ul > li > a") %>%
  html_text()

for (i in country) {
  print(i)
  print(accomodation)
}

#separating the numbers (commas allowed so that values like 1,234 stay whole)
accomodation.num <- str_extract(accomodation, "[0-9,]+")
#separating the characters ([aA-zZ] also matched punctuation between Z and a)
accomodation.char <- str_extract(accomodation, "[A-Za-z]+")
#separating unique characters
unique(accomodation.char)

Answer 1

import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get('https://www.booking.com/country.html')
soup = BeautifulSoup(r.text, 'html.parser')

# One row per country block: the text of every link inside it.
data = []
for item in soup.find_all('div', attrs={'class': 'block_third block_third--flag-module'}):
    country = [link.text.replace('\n', '') for link in item.find_all('a')]
    data.append(country)

df = pd.DataFrame(data)
df.to_csv('output.csv')
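
Each country block has a different number of links, so the collected rows are ragged, and pd.DataFrame pads the shorter ones with missing values. As a rough illustration of that row layout without a live request, here is a minimal standard-library sketch. The sample rows and the helper name rows_to_csv are made up for the example, and unlike df.to_csv it writes no index column or header row.

```python
import csv
import io

# Hypothetical rows in the shape the loop above collects:
# a country name followed by its link texts. The values are made up.
rows = [
    ["Albania", "1,234 apartments", "56 hostels"],
    ["Andorra", "78 apartments"],
]


def rows_to_csv(rows):
    """Pad ragged rows to a common width and return the CSV text."""
    width = max(len(row) for row in rows)
    out = io.StringIO()
    writer = csv.writer(out)
    for row in rows:
        # Fill the missing trailing cells with empty strings.
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

Entries that contain a comma, such as a count with a thousands separator, are quoted automatically by csv.writer.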


Another version, adjusted to the user's requirements as discussed in chat:

import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get('https://www.booking.com/country.html')
soup = BeautifulSoup(r.text, 'html.parser')

# One row per country block: the country name followed by its property-type links.
data = []
for item in soup.select('div.block_third.block_third--flag-module'):
    country = [link.text.replace('\n', '') for link in item.select('a')]
    data.append(country)

# Index by country, then split each "<count> <category>" entry into two columns.
df = pd.DataFrame(data).set_index(0)
df.index.name = 'location'
# Raw string so that \d and \s reach the regex engine unchanged.
split = df.stack().str.extract(r'^(?P<freq>[\d,]+)\s+(?P<category>.*)').reset_index(level=1, drop=True)
# One column per category (sorted alphabetically), one row per country.
pvt = split.pivot(columns='category', values='freq')
pvt.sort_index(axis=1, inplace=True)
pvt.reset_index().to_csv('output2.csv', index=False)
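
To make the reshaping step easier to follow, here is a minimal sketch of the same idea using only the standard library: split each entry into a count and a category with the same regular expression, then write one row per country and one column per category. The sample rows and the helper names to_wide and write_wide_csv are hypothetical, and the sketch runs on in-memory data rather than the live page.

```python
import csv
import io
import re

# Hypothetical sample rows: the first element is the country,
# the rest are "<count> <category>" strings. The values are made up.
sample_rows = [
    ["Albania", "1,234 apartments", "56 hostels"],
    ["Andorra", "78 apartments", "9 resorts"],
]

# Same pattern as the str.extract call: digits/commas, whitespace, then the label.
ENTRY = re.compile(r"^(?P<freq>[\d,]+)\s+(?P<category>.*)")


def to_wide(rows):
    """Return (sorted category names, {location: {category: freq}})."""
    table = {}
    categories = set()
    for location, *entries in rows:
        counts = table.setdefault(location, {})
        for entry in entries:
            match = ENTRY.match(entry.strip())
            if match is None:
                continue  # skip entries that are not "<count> <category>"
            counts[match["category"]] = match["freq"]
            categories.add(match["category"])
    return sorted(categories), table


def write_wide_csv(rows, handle):
    """Write one row per location and one column per category."""
    categories, table = to_wide(rows)
    writer = csv.DictWriter(handle, fieldnames=["location", *categories], restval="")
    writer.writeheader()
    for location, counts in table.items():
        writer.writerow({"location": location, **counts})


buffer = io.StringIO()
write_wide_csv(sample_rows, buffer)
print(buffer.getvalue())
```

Categories that a country does not have are left as empty cells, which corresponds to the missing values the pivot produces in the pandas version.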

1 answer

solution1
1 ACCEPTED 2019-11-30 17:01:34
