繁体   English   中英

dplyr操作行分组突变

[英]dplyr manipulation rowwise grouping mutate

我有数据集

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

我试图操纵x内的数据成为

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6),
                coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
                           "3.4, 103", "3.4, 104"),
                postcode = c("1", "2", "3,4", "3,4", "5", "6"),
                exposure = c(1, 2, 7, 7, 5, 6))

新列的postcode会将具有相同LatitudeLongitudePostcode粘贴在一起。 coords将粘贴LatitudeLongitude ,而exposurecoords具有相同coords (即相同的LatitudeLongitudeExposure

我可以通过使用dplyr包和for循环来完成此操作

x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for(i in unique(x$coords)){
  x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
  x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}

如何仅通过仅使用dplyr软件包而不使用for循环来完成此操作? 也许还有其他方法比使用for循环更有效for因为我的实际数据集非常大

library(dplyr)
library(tidyr)  # unite() was used to join Lat, Lon

x %>% unite(coords, Latitude, Longitude, sep = ",", remove = FALSE) %>% 
  group_by(coords) %>% mutate(exposure = sum(Postcode), postcode = toString(Postcode))

这是使用dplyr

library(dplyr)
x %>% 
     group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>% 
     mutate(postcode = toString(Postcode), exposure = sum(Exposure))

# Source: local data frame [6 x 7]
# Groups: coords [5]
# 
#   Postcode Latitude Longitude Exposure   coords postcode exposure
#      <dbl>    <dbl>     <dbl>    <dbl>    <chr>    <chr>    <dbl>
# 1        1      3.1       100        1 3.1, 100        1        1
# 2        2      3.2       101        2 3.2, 101        2        2
# 3        3      3.3       102        3 3.3, 102     3, 4        7
# 4        4      3.3       102        4 3.3, 102     3, 4        7
# 5        5      3.4       103        5 3.4, 103        5        5
# 6        6      3.4       104        6 3.4, 104        6        6

我们可以使用data.table来做到这data.table

library(data.table)
setDT(x)[, coords := paste(Latitude, Longitude, sep="," )
  ][, c("exposure", "postcode") :=.(sum(Postcode), toString(Postcode)), coords]
x
#   Postcode Latitude Longitude Exposure  coords exposure postcode
#1:        1      3.1       100        1 3.1,100        1        1
#2:        2      3.2       101        2 3.2,101        2        2
#3:        3      3.3       102        3 3.3,102        7     3, 4
#4:        4      3.3       102        4 3.3,102        7     3, 4
#5:        5      3.4       103        5 3.4,103        5        5
#6:        6      3.4       104        6 3.4,104        6        6

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM