
Search and scrape data from a website using R

I have 1,000 records, each with an email address and full address information. For each record I want to look up information on this website: https://www.melissadata.com/lookups/businesscoder.asp. Is there any way to automate this process?

Here is a working short example showing how to extract every link from a website:

# R library for making HTTP requests
library(httr)
# R library for parsing XML and HTML
library(XML)

# perform a GET request to the website
response <- GET("https://www.melissadata.com/lookups/index.htm")
# extract the response body as text, decoding it as UTF-8
html <- content(response, as = "text", encoding = "UTF-8")
# parse the text as HTML so XPath queries can be run against it
parsedoc <- htmlParse(html, asText = TRUE)
# run an XPath query: select every <a> element and return its href attribute
links <- xpathSApply(parsedoc, "//a", xmlGetAttr, "href")
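
After parsing, links holds the href of every anchor on the page, so you can check the result directly:

# how many links were found, and what the first few look like
length(links)
head(links)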

To web scrape, you should get familiar with XPath queries: https://www.w3schools.com/xml/xpath_intro.asp
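
The same two libraries can, in principle, automate the original 1,000-record lookup. The sketch below rests on assumptions: it assumes the business coder page accepts a plain form POST, and every form field name used (company, address, city, state, zip) is hypothetical; inspect the page's actual HTML form (or the network requests it makes) to find the real field names and endpoint.

library(httr)

# submit one record to the lookup page and return the raw HTML response;
# the URL is from the question, the field names are hypothetical
lookup_record <- function(company, address, city, state, zip) {
  response <- POST(
    "https://www.melissadata.com/lookups/businesscoder.asp",
    body = list(
      company = company,  # hypothetical field name
      address = address,  # hypothetical field name
      city    = city,     # hypothetical field name
      state   = state,    # hypothetical field name
      zip     = zip       # hypothetical field name
    ),
    encode = "form"
  )
  content(response, as = "text", encoding = "UTF-8")
}

# a stand-in for your 1,000-record data set
records <- data.frame(
  company = "Example Co",
  address = "123 Main St",
  city    = "Springfield",
  state   = "IL",
  zip     = "62701",
  stringsAsFactors = FALSE
)

# loop over the records, pausing between requests to avoid hammering the server
results <- lapply(seq_len(nrow(records)), function(i) {
  Sys.sleep(1)
  row <- records[i, ]
  lookup_record(row$company, row$address, row$city, row$state, row$zip)
})

Each element of results is the raw HTML of one lookup, which you would then parse with htmlParse and XPath as in the first example. Before running this at scale, check the site's terms of service; the site may offer a batch API, which would be more robust than scraping.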
