
dplyr manipulation: rowwise grouping with mutate

I have a data set:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

I am trying to transform x so that it becomes:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6),
                coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
                           "3.4, 103", "3.4, 104"),
                postcode = c("1", "2", "3,4", "3,4", "5", "6"),
                exposure = c(1, 2, 7, 7, 5, 6))

The new column postcode pastes together the Postcode values of all rows that share the same Latitude and Longitude. coords pastes Latitude and Longitude together, and exposure sums the Exposure values of all rows with the same coords, i.e. the same Latitude and Longitude.
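
To make the rule concrete: rows 3 and 4 share the coordinates (3.3, 102), so both should get postcode "3,4" and exposure 3 + 4 = 7. A minimal base R sketch of the per-coordinate totals, using tapply() purely for illustration (key and totals are hypothetical names, not part of the question's code):

```r
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Hypothetical helper: one "Latitude, Longitude" key per row
key <- paste(x$Latitude, x$Longitude, sep = ", ")

# Total Exposure for each distinct key; rows sharing a key are summed together
totals <- tapply(x$Exposure, key, sum)
totals
```

Only five distinct keys exist, and the "3.3, 102" entry is the one that combines two rows.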

I can accomplish this using the dplyr package and a for loop:

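library(dplyr)

# Build a "Latitude, Longitude" key for each row
x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
# Placeholder columns, filled in by the loop below
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
# For each distinct coordinate pair, combine its postcodes and sum its exposure
for(i in unique(x$coords)){
  x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
  x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}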

How can I do this using only the dplyr package, without a for loop? Any other approach that is more efficient than a for loop would also be welcome, because my actual data sets are quite large.

Using dplyr together with tidyr:

library(dplyr)
library(tidyr)  # unite() is used to join Latitude and Longitude

x %>%
  unite(coords, Latitude, Longitude, sep = ", ", remove = FALSE) %>%
  group_by(coords) %>%
  mutate(exposure = sum(Exposure), postcode = toString(Postcode)) %>%
  ungroup()

Here is how you can do it with dplyr alone:

library(dplyr)
x %>% 
     group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>% 
     mutate(postcode = toString(Postcode), exposure = sum(Exposure))

# Source: local data frame [6 x 7]
# Groups: coords [5]
# 
#   Postcode Latitude Longitude Exposure   coords postcode exposure
#      <dbl>    <dbl>     <dbl>    <dbl>    <chr>    <chr>    <dbl>
# 1        1      3.1       100        1 3.1, 100        1        1
# 2        2      3.2       101        2 3.2, 101        2        2
# 3        3      3.3       102        3 3.3, 102     3, 4        7
# 4        4      3.3       102        4 3.3, 102     3, 4        7
# 5        5      3.4       103        5 3.4, 103        5        5
# 6        6      3.4       104        6 3.4, 104        6        6
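
If the data are large, another option is to compute each group's summary only once with summarise(), giving one row per coordinate pair, and then join it back onto the original rows with left_join(). The sketch below assumes the example x from the question; by_coords and result are illustrative names, and paste(collapse = ",") is used instead of toString() so that postcode matches the "3,4" format in the desired output. It is not guaranteed to be faster than the grouped mutate() above, so benchmark on your own data.

```r
library(dplyr)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Build the grouping key once
x <- x %>% mutate(coords = paste(Latitude, Longitude, sep = ", "))

# One summary row per distinct coordinate pair
by_coords <- x %>%
  group_by(coords) %>%
  summarise(postcode = paste(Postcode, collapse = ","),  # "3,4" as in the desired output
            exposure = sum(Exposure))

# Attach each group's summary back to every original row (row order of x is kept)
result <- x %>% left_join(by_coords, by = "coords")
result
```

The column order of result matches the desired x: Postcode, Latitude, Longitude, Exposure, coords, postcode, exposure.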

We can also do this with data.table:

library(data.table)
# setDT() converts x to a data.table by reference; := then adds columns in place
setDT(x)[, coords := paste(Latitude, Longitude, sep = ",")
  ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]
x
#   Postcode Latitude Longitude Exposure  coords exposure postcode
#1:        1      3.1       100        1 3.1,100        1        1
#2:        2      3.2       101        2 3.2,101        2        2
#3:        3      3.3       102        3 3.3,102        7     3, 4
#4:        4      3.3       102        4 3.3,102        7     3, 4
#5:        5      3.4       103        5 3.4,103        5        5
#6:        6      3.4       104        6 3.4,104        6        6
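
For completeness, the same result can be obtained in base R without a loop using ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a sketch assuming the example x from the question (recreated here as a plain data.frame), not one of the answers above:

```r
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Grouping key shared by rows with the same Latitude and Longitude
x$coords <- paste(x$Latitude, x$Longitude, sep = ", ")

# ave() applies FUN within each coords group and recycles the result to every row of that group
x$postcode <- ave(as.character(x$Postcode), x$coords,
                  FUN = function(p) paste(p, collapse = ","))
x$exposure <- ave(x$Exposure, x$coords, FUN = sum)
x
```

Converting Postcode to character first matters: ave() returns a vector of the same type as its first argument, so the pasted strings need a character vector to land in.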
