dplyr manipulation rowwise grouping mutate
I have a data set:
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6))
I am trying to transform x so that it becomes:
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6),
coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
"3.4, 103", "3.4, 104"),
postcode = c("1", "2", "3,4", "3,4", "5", "6"),
exposure = c(1, 2, 7, 7, 5, 6))
The new column postcode pastes together the Postcode values that share the same Latitude and Longitude. coords pastes Latitude and Longitude together, while exposure sums the Exposure values that share the same coords, i.e. the same Latitude and Longitude.
I could accomplish this by using the dplyr package and a for loop:
x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for(i in unique(x$coords)){
x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}
How could I accomplish this using only the dplyr package, without a for loop? Or is there another approach that is more efficient than a for loop? My actual data sets are quite large.
library(dplyr)
library(tidyr) # unite() was used to join Lat, Lon
x %>%
  unite(coords, Latitude, Longitude, sep = ", ", remove = FALSE) %>%
  group_by(coords) %>%
  mutate(exposure = sum(Exposure), postcode = toString(Postcode))
Here is how you can do it with dplyr:
library(dplyr)
x %>%
group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
mutate(postcode = toString(Postcode), exposure = sum(Exposure))
# Source: local data frame [6 x 7]
# Groups: coords [5]
#
# Postcode Latitude Longitude Exposure coords postcode exposure
# <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
# 1 1 3.1 100 1 3.1, 100 1 1
# 2 2 3.2 101 2 3.2, 101 2 2
# 3 3 3.3 102 3 3.3, 102 3, 4 7
# 4 4 3.3 102 4 3.3, 102 3, 4 7
# 5 5 3.4 103 5 3.4, 103 5 5
# 6 6 3.4 104 6 3.4, 104 6 6
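The result above is still grouped by coords, which can surprise later steps that expect an ordinary table. As a minimal sketch, assuming the same example data as in the question, the pipeline can end with ungroup(); the name res is only an illustrative choice:

```r
library(dplyr)

# Example data from the question
x <- data.frame(Postcode  = c(1, 2, 3, 4, 5, 6),
                Latitude  = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure  = c(1, 2, 3, 4, 5, 6))

res <- x %>%
  # build the key and group by it in one step
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  # per-group values are recycled to every row of the group
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  # drop the grouping so later verbs operate on the whole table
  ungroup()

res$postcode
res$exposure
```

Rows 3 and 4 share the same coordinates, so both get the combined postcode and the summed exposure.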
We can do this with data.table:
library(data.table)
setDT(x)[, coords := paste(Latitude, Longitude, sep = ",")
         ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]
x
# Postcode Latitude Longitude Exposure coords exposure postcode
#1: 1 3.1 100 1 3.1,100 1 1
#2: 2 3.2 101 2 3.2,101 2 2
#3: 3 3.3 102 3 3.3,102 7 3, 4
#4: 4 3.3 102 4 3.3,102 7 3, 4
#5: 5 3.4 103 5 3.4,103 5 5
#6: 6 3.4 104 6 3.4,104 6 6
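Note that setDT() converts x in place and := adds columns by reference, so the original data frame is modified. If the original should stay untouched, one possible variant (a sketch, assuming the question's example data; dt is just an illustrative name) works on a copy made with as.data.table():

```r
library(data.table)

# Example data from the question
x <- data.frame(Postcode  = c(1, 2, 3, 4, 5, 6),
                Latitude  = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure  = c(1, 2, 3, 4, 5, 6))

# as.data.table() copies a data.frame, so x keeps its four columns
dt <- as.data.table(x)

# build the grouping key, then add both per-group columns by reference
dt[, coords := paste(Latitude, Longitude, sep = ", ")]
dt[, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]

dt
```

The copy costs extra memory, so for very large data the in-place setDT() form may still be preferable.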