Replacing nested loop in R
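I am new to R and have searched the forums for this, but could not find an adequate solution. I am trying to map IP addresses to their corresponding geographic locations. I have 2 datasets.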
Set-a (160,000 rows):
ip(int) | ID(int)
Set-b (1,600,000 rows):
Ip1(int) | Ip2(int) | Code(str) | Country(str) | Area1(str) | Area2(str)
I am trying to do the following: if ip lies between Ip1 and Ip2, add Country & Region to Set-a.
This is what I am doing (obviously not a very good approach):
ip1<-as.numeric(b$Ip1)
ip2<-as.numeric(b$Ip2)
country<-b$Country
area1<-b$Area1
area2<-b$Area2
for(i in 1:160000){
  for(j in 1:1674303){
    if(a[i]>ip1[j] & a[i]<ip2[j]) {
      a$country[i]<-country[j]
      a$area1[i]<-area1[j]
      a$area2[i]<-area2[j]}
  }
}
Can someone suggest an efficient way to do this? It takes a very long time (running i = 1 to 100 takes about 10 minutes).
A sample of dataset b:
Ip1, Ip2, Code, Country, Area1, Area2
"0","16777215","-","-","-","-"
"16777216","16777471","AU","AUSTRALIA","QUEENSLAND","SOUTH BRISBANE"
"16777472","16778239","CN","CHINA","FUJIAN","FUZHOU"
"16778240","16778495","AU","AUSTRALIA","VICTORIA","MELBOURNE"
"16778496","16778751","AU","AUSTRALIA","NEW SOUTH WALES","SYDNEY"
It is sorted in increasing order.
dput(head(a)) and dput(head(b)) are, respectively (see the sample data above):
structure(list(IP_Addr = c("38825563", "38921619", "42470287", "42471923","42473368","42473428"),
Desc_value = c("0", "1.2", "4.97", "1", "5.9", "22.06")), .Names = c("IP_Addr", "Desc_value"), row.names = c(NA, 6L), class = "data.frame")
structure(list(Ip1 = c("0", "16777216", "16777472", "16778240",
"16778496", "16778752"), Ip2 = c("16777215", "16777471", "16778239",
"16778495", "16778751", "16779263"), Code = c("-", "AU", "CN",
"AU", "AU", "AU"), Country = c("-", "AUSTRALIA", "CHINA", "AUSTRALIA",
"AUSTRALIA", "AUSTRALIA"), Area1 = c("-", "QUEENSLAND", "FUJIAN",
"VICTORIA", "NEW SOUTH WALES", "-"), Area2 = c("-", "SOUTH BRISBANE",
"FUZHOU", "MELBOURNE", "SYDNEY", "-")), .Names = c("Ip1", "Ip2",
"Code", "Country", "Area1", "Area2"), row.names = c(NA, 6L), class = "data.frame")
Here is a data.table solution:
# Let's take Blue Magister's example set:
set.seed(10)
a <- data.frame(ip=sample(16777216:16778751,10,replace=TRUE))
b <- read.table(sep=",",header=TRUE,text='Ip1, Ip2, Code, Country, Area1, Area2
"0","16777215","-","-","-","-"
"16777216","16777471","AU","AUSTRALIA","QUEENSLAND","SOUTH BRISBANE"
"16777472","16778239","CN","CHINA","FUJIAN","FUZHOU"
"16778240","16778495","AU","AUSTRALIA","VICTORIA","MELBOURNE"
"16778496","16778751","AU","AUSTRALIA","NEW SOUTH WALES","SYDNEY"')
b$Ip1 <-as.numeric(b$Ip1)
# include library, convert to data.table
library(data.table)
a = data.table(a)
b = data.table(b, key = "Ip1")
# and now the actual computation
a = b[a, roll = Inf][, Ip2 := NULL] # yep, amazingly, it's *that* simple in data.table
setnames(a, "Ip1", "ip") # you can also include, exclude whatever columns you want
a
# ip Code Country Area1 Area2
# 1: 16777995 CN CHINA FUJIAN FUZHOU
# 2: 16777687 CN CHINA FUJIAN FUZHOU
# 3: 16777871 CN CHINA FUJIAN FUZHOU
# 4: 16778280 AU AUSTRALIA VICTORIA MELBOURNE
# 5: 16777346 AU AUSTRALIA QUEENSLAND SOUTH BRISBANE
# 6: 16777562 CN CHINA FUJIAN FUZHOU
# 7: 16777637 CN CHINA FUJIAN FUZHOU
# 8: 16777634 CN CHINA FUJIAN FUZHOU
# 9: 16778161 CN CHINA FUJIAN FUZHOU
#10: 16777875 CN CHINA FUJIAN FUZHOU
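If Ip1 were an exhaustive list of values that ip could match exactly, the above would simply be a merge (of Ip1 in b with the first column of a, i.e. ip), but data.table also lets you choose what to do when there is no exact match. You can tell it, for example, to roll the previous observation forward (which is what I did above), to roll the next one backward, or to roll to the nearest observation; see ?data.table for more information.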
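To make the rolling options concrete, here is a minimal sketch on a made-up three-row lookup table. The names lookup, queries, start and label are hypothetical and are not part of the data in the question; the join column is named explicitly with on = rather than relying on the key.

```r
library(data.table)

# Hypothetical lookup table: each row marks the start of a range
lookup  <- data.table(start = c(0, 100, 200), label = c("A", "B", "C"))
queries <- data.table(start = c(70, 100, 260))

# roll = Inf: carry the last row at or below each query forward
fwd <- lookup[queries, on = "start", roll = Inf]$label
# fwd is "A" "B" "C"

# roll = -Inf: carry the next row at or above each query backward
bwd <- lookup[queries, on = "start", roll = -Inf]$label
# bwd is "B" "B" NA (nothing lies above 260)

# roll = "nearest": use whichever row is closest to each query
near <- lookup[queries, on = "start", roll = "nearest"]$label
# near is "B" "B" "C"
```

For the IP problem, roll = Inf is the relevant variant, since each address should pick up the range whose start is the largest one not exceeding it.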
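Couldn't you remove the second loop by using something like this:
# x is assumed to be the vector of IP addresses being looked up
j = intersect(which(ip1 < x[i]), which(ip2 > x[i]))
if (length(j)==1){
  a$country[i]<-country[j]
  a$area1[i]<-area1[j]
  a$area2[i]<-area2[j]
}else{
  # note: this branch is also reached when no range matches at all
  cat("Multiple matches found!\n")
}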
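As a rough sketch of this idea, the inner loop can be folded into one vectorized comparison per address. The helper name match_range and the toy vectors are assumptions made for illustration. Inclusive bounds (<= and >=) are also an assumption here, whereas the code above uses strict inequalities.

```r
# Toy range table (hypothetical values, not the real data)
ip1     <- c(0, 100, 200)
ip2     <- c(99, 199, 299)
country <- c("-", "AU", "CN")

# Index of the single range containing x; NA if there is no match or several
match_range <- function(x, lower, upper) {
  j <- which(lower <= x & upper >= x)
  if (length(j) == 1L) j else NA_integer_
}

ips <- c(150, 42, 250, 1000)
idx <- vapply(ips, match_range, integer(1), lower = ip1, upper = ip2)
matched <- country[idx]
# matched is "AU" "-" "CN" NA
```

This still compares every address against every range, so it mainly removes the overhead of the interpreted inner loop rather than reducing the amount of work.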
I would try findInterval:
#create example
set.seed(10)
a <- data.frame(ip=sample(16777216:16778751,10,replace=TRUE))
b <- read.table(sep=",",header=TRUE,text='Ip1, Ip2, Code, Country, Area1, Area2
"0","16777215","-","-","-","-"
"16777216","16777471","AU","AUSTRALIA","QUEENSLAND","SOUTH BRISBANE"
"16777472","16778239","CN","CHINA","FUJIAN","FUZHOU"
"16778240","16778495","AU","AUSTRALIA","VICTORIA","MELBOURNE"
"16778496","16778751","AU","AUSTRALIA","NEW SOUTH WALES","SYDNEY"')
b$Ip1 <-as.numeric(b$Ip1)
indices <- findInterval(a$ip,b$Ip1,rightmost.closed=FALSE,all.inside=FALSE)
a <- data.frame(a,b[indices,c("Country","Area1","Area2")])
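One caveat: findInterval only looks at the lower bounds, so an address that falls in a gap between ranges, or beyond the last range, is still assigned to the preceding row. The following is a hedged sketch of an additional check against Ip2. The toy table rng and its gap are invented for illustration.

```r
# Hypothetical range table with a gap between 199 and 300
rng <- data.frame(Ip1 = c(0, 100, 300),
                  Ip2 = c(99, 199, 399),
                  Country = c("-", "AU", "CN"),
                  stringsAsFactors = FALSE)
ip <- c(150, 250, 350, 500)

# Index of the last Ip1 that is <= each ip (0 if ip is below every Ip1)
idx <- findInterval(ip, rng$Ip1)
idx[idx == 0] <- NA

# Keep a match only when ip also respects the upper bound of that row
ok <- !is.na(idx) & ip <= rng$Ip2[idx]
matched <- rng$Country[idx]
matched[!ok] <- NA
# matched is "AU" NA "CN" NA
```

Because findInterval uses a binary search on the sorted Ip1 column, this stays fast on large tables, and the upper-bound check is a single vectorized comparison.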