I have made this short code to automate geocoding of IP addresses by using the freegeoip.net (15,000 queries per hour by default; excellent service!):
> library(RCurl)
Loading required package: bitops
> ip.lst =
c("193.198.38.10","91.93.52.105","134.76.194.180","46.183.103.8")
> q = do.call(rbind, lapply(ip.lst, function(x){
try( data.frame(t(strsplit(getURI(paste0("freegeoip.net/csv/", x)), ",")[[1]]), stringsAsFactors = FALSE) )
}))
> names(q) = c("ip","country_code","country_name","region_code","region_name","city","zip_code","time_zone","latitude","longitude","metro_code")
> str(q)
'data.frame': 4 obs. of 11 variables:
$ ip : chr "193.198.38.10" "91.93.52.105" "134.76.194.180" "46.183.103.8"
$ country_code: chr "HR" "TR" "DE" "DE"
$ country_name: chr "Croatia" "Turkey" "Germany" "Germany"
$ region_code : chr "" "06" "NI" ""
$ region_name : chr "" "Ankara" "Lower Saxony" ""
$ city : chr "" "Ankara" "Gottingen" ""
$ zip_code : chr "" "06450" "37079" ""
$ time_zone : chr "Europe/Zagreb" "Europe/Istanbul" "Europe/Berlin" ""
$ latitude : chr "45.1667" "39.9230" "51.5333" "51.2993"
$ longitude : chr "15.5000" "32.8378" "9.9333" "9.4910"
$ metro_code : chr "0\r\n" "0\r\n" "0\r\n" "0\r\n"
In three lines of code you get coordinates for all IPs including city/country codes. I wonder if this could be parallelized so it runs even faster? To geocode >10,000 IPs can take hours otherwise.
library(rgeolocate)
ip_lst = c("193.198.38.10", "91.93.52.105", "134.76.194.180", "46.183.103.8")
maxmind(ip_lst, "~/Data/GeoLite2-City.mmdb",
fields=c("country_code", "country_name", "region_name", "city_name",
"timezone", "latitude", "longitude"))
## country_code country_name region_name city_name timezone latitude longitude
## 1 HR Croatia <NA> <NA> Europe/Zagreb 45.1667 15.5000
## 2 TR Turkey Istanbul Istanbul Europe/Istanbul 41.0186 28.9647
## 3 DE Germany Lower Saxony Bilshausen Europe/Berlin 51.6167 10.1667
## 4 DE Germany North Rhine-Westphalia Aachen Europe/Berlin 50.7787 6.1085
There are instructions in the package for obtaining the necessary data files. Some of the fields you're pulling are woefully inaccurate (more so than any geoip vendor would like to admit). If you do need ones that aren't available, file an issue and we'll add them.
I've found multidplyr is a great package for making parallel server calls. This is the best guide I've found, and I highly recommend reading the whole thing to better understand how the package works: http://www.business-science.io/code-tools/2016/12/18/multidplyr.html
library("devtools")
devtools::install_github("hadley/multidplyr")
library(parallel)
library(multidplyr)
library(RCurl)
library(tidyverse)
# Convert your example into a function
get_ip <- function(ip) {
do.call(rbind, lapply(ip, function(x) {
try(data.frame(t(strsplit(getURI(
paste0("freegeoip.net/csv/", x)
), ",")[[1]]), stringsAsFactors = FALSE))
})) %>% nest(X1:X11)
}
# Made ip.lst into a Tibble to make it work better with dplyr
ip.lst =
tibble(
ip = c(
"193.198.38.10",
"91.93.52.105",
"134.76.194.180",
"46.183.103.8",
"193.198.38.10",
"91.93.52.105",
"134.76.194.180",
"46.183.103.8"
)
)
# Create a cluster based on how many cores your machine has
cl <- detectCores()
cluster <- create_cluster(cores = cl)
# Create a partitioned tibble
by_group <- partition(ip.lst, cluster = cluster)
# Send libraries and the function get_ip() to each cluster
by_group %>%
cluster_library("tidyverse") %>%
cluster_library("RCurl") %>%
cluster_assign_value("get_ip", get_ip)
# Send parallel requests to the website and parse the results
q <- by_group %>%
do(get_ip(.$ip)) %>%
collect() %>%
unnest() %>%
tbl_df() %>%
select(-PARTITION_ID)
# Set names of the results
names(q) = c(
"ip",
"country_code",
"country_name",
"region_code",
"region_name",
"city",
"zip_code",
"time_zone",
"latitude",
"longitude",
"metro_code"
)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.