
dplyr manipulation rowwise grouping mutate

I have the following data set:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

I am trying to transform the data in x into:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6),
                coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
                           "3.4, 103", "3.4, 104"),
                postcode = c("1", "2", "3,4", "3,4", "5", "6"),
                exposure = c(1, 2, 7, 7, 5, 6))

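The new column postcode pastes together the Postcode values of all rows that share the same Latitude and Longitude. coords pastes Latitude and Longitude together, and exposure is the sum of Exposure over all rows with the same coords (i.e. the same Latitude and Longitude).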

I can do this using the dplyr package and a for loop:

x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for(i in unique(x$coords)){
  x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
  x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}

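How can I do this using only the dplyr package, without the for loop? Or is there some other approach that is more efficient than a for loop? My actual data set is very large.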

library(dplyr)
library(tidyr)  # unite() was used to join Lat, Lon

x %>%
  unite(coords, Latitude, Longitude, sep = ",", remove = FALSE) %>%
  group_by(coords) %>%
  # sum Exposure (not Postcode) within each coordinate pair
  mutate(exposure = sum(Exposure), postcode = toString(Postcode))
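One way to gain confidence in a grouped rewrite is to run it next to the loop on the sample data and compare the two new columns. The sketch below does that. The names `loop_res` and `grouped_res` are made up for illustration, the loop is lightly adapted from the question, and `sep = ", "` is assumed in `unite()` so the keys match the loop's `paste()` call.

```r
library(dplyr)
library(tidyr)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Reference result: the for loop from the question (hypothetical name loop_res)
loop_res <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
loop_res$postcode <- NA_character_
loop_res$exposure <- NA_real_
for (i in unique(loop_res$coords)) {
  same <- loop_res$coords == i
  loop_res$postcode[same] <- paste(loop_res$Postcode[same], collapse = ", ")
  loop_res$exposure[same] <- sum(loop_res$Exposure[same])
}

# Grouped version: no explicit loop (hypothetical name grouped_res)
grouped_res <- x %>%
  unite(coords, Latitude, Longitude, sep = ", ", remove = FALSE) %>%
  group_by(coords) %>%
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  ungroup()

# Both approaches should give the same new columns, row for row
stopifnot(identical(loop_res$postcode, grouped_res$postcode),
          isTRUE(all.equal(loop_res$exposure, grouped_res$exposure)))
```

A check like this only shows agreement on the sample rows; on the real data it may be worth repeating on a subset before dropping the loop.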

Here is an option using dplyr:

library(dplyr)
x %>% 
     group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>% 
     mutate(postcode = toString(Postcode), exposure = sum(Exposure))

# Source: local data frame [6 x 7]
# Groups: coords [5]
# 
#   Postcode Latitude Longitude Exposure   coords postcode exposure
#      <dbl>    <dbl>     <dbl>    <dbl>    <chr>    <chr>    <dbl>
# 1        1      3.1       100        1 3.1, 100        1        1
# 2        2      3.2       101        2 3.2, 101        2        2
# 3        3      3.3       102        3 3.3, 102     3, 4        7
# 4        4      3.3       102        4 3.3, 102     3, 4        7
# 5        5      3.4       103        5 3.4, 103        5        5
# 6        6      3.4       104        6 3.4, 104        6        6
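The `mutate()` call keeps all six rows and repeats the group values on each of them, which is what the question asks for. If one row per coordinate pair would be enough instead, swapping `mutate()` for `summarise()` might be an option. This is a sketch of that variant, not part of the original answer, and `collapsed` is a name chosen here for illustration.

```r
library(dplyr)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# summarise() returns one row per group instead of one row per input row
collapsed <- x %>%
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  summarise(postcode = toString(Postcode),
            exposure = sum(Exposure))

collapsed
```

With the sample data this yields five rows, one per distinct `coords` value, and the row for "3.3, 102" carries postcode "3, 4" and exposure 7. The original Postcode, Latitude, Longitude and Exposure columns are dropped by `summarise()`.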

We can also do this using data.table:

library(data.table)
# setDT() converts x to a data.table by reference; := adds columns in place
setDT(x)[, coords := paste(Latitude, Longitude, sep = ",")
  ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]
x
#   Postcode Latitude Longitude Exposure  coords exposure postcode
#1:        1      3.1       100        1 3.1,100        1        1
#2:        2      3.2       101        2 3.2,101        2        2
#3:        3      3.3       102        3 3.3,102        7     3, 4
#4:        4      3.3       102        4 3.3,102        7     3, 4
#5:        5      3.4       103        5 3.4,103        5        5
#6:        6      3.4       104        6 3.4,104        6        6
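Because `setDT()` works by reference, the original `x` itself becomes a data.table and gains the new columns. If the original data frame should stay untouched, working on a copy made with `as.data.table()` is one way around that. The sketch below assumes this is wanted; `dt` is a name introduced here for illustration.

```r
library(data.table)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# as.data.table() returns a copy, so x remains a plain data.frame
dt <- as.data.table(x)

# Build the grouping key, then add both aggregates per key in place on dt
dt[, coords := paste(Latitude, Longitude, sep = ", ")]
dt[, c("postcode", "exposure") := .(toString(Postcode), sum(Exposure)),
   by = coords]

dt
```

The copy costs extra memory, so on a very large data set the by-reference `setDT()` route may still be preferable when modifying `x` is acceptable.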
