I have a data set:
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6))
I am trying to transform x so that it becomes:
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6),
coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
"3.4, 103", "3.4, 104"),
postcode = c("1", "2", "3,4", "3,4", "5", "6"),
exposure = c(1, 2, 7, 7, 5, 6))
The new column postcode pastes together the Postcode values that share the same Latitude and Longitude. coords pastes the Latitude and Longitude together, while exposure sums the Exposure values that share the same coords, i.e. the same Latitude and Longitude.
I could accomplish this using the dplyr package and a for loop:
x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for(i in unique(x$coords)){
x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}
How could I accomplish this using only the dplyr package, without a for loop? Or is there another approach that is more efficient than a for loop? My actual data sets are quite large.
library(dplyr)
library(tidyr) # unite() joins Latitude and Longitude into one column
x %>%
  unite(coords, Latitude, Longitude, sep = ",", remove = FALSE) %>%
  group_by(coords) %>%
  # sum Exposure (not Postcode) within each coordinate pair
  mutate(exposure = sum(Exposure), postcode = toString(Postcode))
Here is how you can do it with dplyr:
library(dplyr)
x %>%
group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
mutate(postcode = toString(Postcode), exposure = sum(Exposure))
# Source: local data frame [6 x 7]
# Groups: coords [5]
#
# Postcode Latitude Longitude Exposure coords postcode exposure
# <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
# 1 1 3.1 100 1 3.1, 100 1 1
# 2 2 3.2 101 2 3.2, 101 2 2
# 3 3 3.3 102 3 3.3, 102 3, 4 7
# 4 4 3.3 102 4 3.3, 102 3, 4 7
# 5 5 3.4 103 5 3.4, 103 5 5
# 6 6 3.4 104 6 3.4, 104 6 6
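Note that the result above is still grouped by coords (see the "Groups: coords [5]" line). If later steps should operate on the whole table, one option is to finish with ungroup(). The following is a minimal sketch using the sample data from the question; the name res is only illustrative.

```r
library(dplyr)

# Sample data from the question
x <- data.frame(Postcode  = c(1, 2, 3, 4, 5, 6),
                Latitude  = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure  = c(1, 2, 3, 4, 5, 6))

res <- x %>%
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  ungroup()  # drop the grouping so later verbs act on all rows

res
```

Without ungroup(), a following mutate() or summarise() would continue to work per coords group, which may or may not be what you want.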
We can do this with data.table:
library(data.table)
setDT(x)[, coords := paste(Latitude, Longitude, sep = ",")
  ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]
x
# Postcode Latitude Longitude Exposure coords exposure postcode
#1: 1 3.1 100 1 3.1,100 1 1
#2: 2 3.2 101 2 3.2,101 2 2
#3: 3 3.3 102 3 3.3,102 7 3, 4
#4: 4 3.3 102 4 3.3,102 7 3, 4
#5: 5 3.4 103 5 3.4,103 5 5
#6: 6 3.4 104 6 3.4,104 6 6
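If you would rather avoid extra packages, the same grouped paste and sum can probably be done in base R with ave(), which applies a function within each group and returns a vector as long as its input. This is only a sketch on the sample data from the question; it has not been benchmarked against the dplyr or data.table versions on large data.

```r
# Sample data from the question
x <- data.frame(Postcode  = c(1, 2, 3, 4, 5, 6),
                Latitude  = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure  = c(1, 2, 3, 4, 5, 6))

# Key identifying each Latitude/Longitude pair
x$coords <- paste(x$Latitude, x$Longitude, sep = ", ")

# ave() runs FUN per group; a length-one result is recycled across the group
x$postcode <- ave(as.character(x$Postcode), x$coords, FUN = toString)
x$exposure <- ave(x$Exposure, x$coords, FUN = sum)

x
```

Postcode is converted to character first so that ave() returns a character vector for the pasted values.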