How to aggregate rows that contain NA values in R
I would like to go from this:
City State x1 x2 x3
NA CA 10 10 10
SD CA 10 10 10
NA CA 10 10 10
SF CA 10 10 10
FW TX 5 5 5
NA TX 5 5 5
NA TX 5 5 5
To this:
State sum
CA 120
TX 45
col1 <- c(NA,'SD',NA,'SF','FW', NA, NA)
col2 <- c('CA', 'CA', 'CA', 'CA', 'TX', 'TX', 'TX')
col3 <- c(10,10,10,10,5,5,5)
col4 <- c(10,10,10,10,5,5,5)
col5 <- c(10,10,10,10,5,5,5)
df <- data.frame(City=col1, State=col2, x1=col3, x2=col4,x3=col5)
col6 <- c('CA', 'TX')
col7 <- c(120, 45)
solution <- data.frame(State=col6, sum=col7)
Edit: fixed an error in the data frame and changed 'NA' to NA. Thank you to Ronak for replying so quickly.
@Ronak Shah's solution is much better, but here is a longer alternative that still works and introduces a few functions worth knowing:
library(dplyr)
df %>%
  group_by(State) %>%
  # across() applies the same operation to several columns at once
  summarise(across(x1:x3, ~ sum(.x, na.rm = TRUE))) %>%
  # rowwise() + c_across() apply an operation across the columns of each row
  rowwise() %>%
  mutate(sum = sum(c_across(x1:x3), na.rm = TRUE)) %>%
  select(-c(x1:x3))
# A tibble: 2 x 2
# Rowwise:
State sum
<chr> <int>
1 CA 120
2 TX 45
This is also very useful and closer to the one mentioned above:
df %>%
  group_by(State) %>%
  summarise(sum = sum(c_across(x1:x3), na.rm = TRUE))
# A tibble: 2 x 2
State sum
<chr> <int>
1 CA 120
2 TX 45
You can subset the columns to sum from cur_data() in dplyr.
library(dplyr)
df %>%
group_by(State) %>%
summarise(sum = sum(select(cur_data(), x1:x3), na.rm = TRUE))
# State sum
# <chr> <int>
#1 CA 120
#2 TX 45
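For readers on newer dplyr versions: cur_data() was deprecated in dplyr 1.1.0 in favour of pick(), which selects columns from the current group directly. The following is a minimal sketch of the same idea, assuming dplyr 1.1.0 or later is installed; the data frame is rebuilt inline so the block runs on its own, and the name `result` is only illustrative.

```r
library(dplyr)

# Same data as in this thread, rebuilt here so the example is self-contained
df <- data.frame(
  City  = c(NA, "SD", NA, "SF", "FW", NA, NA),
  State = c("CA", "CA", "CA", "CA", "TX", "TX", "TX"),
  x1 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L),
  x2 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L),
  x3 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L)
)

# pick(x1:x3) returns the x1..x3 columns of the current group as a tibble;
# sum() then adds up every cell in that tibble, ignoring NA values
result <- df %>%
  group_by(State) %>%
  summarise(sum = sum(pick(x1:x3), na.rm = TRUE))

print(result)
```

The NA values in City do not affect the result, because City is neither a grouping column nor one of the summed columns.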
Data
df <- structure(list(City = c(NA, "SD", NA, "SF", "FW", NA, NA), State = c("CA",
"CA", "CA", "CA", "TX", "TX", "TX"), x1 = c(10L, 10L, 10L, 10L,
5L, 5L, 5L), x2 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L), x3 = c(10L,
10L, 10L, 10L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, -7L))
We can use data.table methods for efficiency. Convert the data.frame to a data.table (setDT(df)), group by State, specify the columns to sum as a pattern of column names in .SDcols, take the rowSums of the Subset of Data.table (.SD), and sum the result.
library(data.table)
setDT(df)[ , sum(rowSums(.SD), na.rm = TRUE), State,
.SDcols = patterns('^x\\d+$')]
# State V1
#1: CA 120
#2: TX 45
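The result column above gets the default name V1. As a variation, here is a sketch that names the column and passes na.rm = TRUE to rowSums() as well. That second change is an assumption about the desired behaviour: if one of the x columns ever contained NA, rowSums(.SD) would return NA for that whole row and the outer sum() would then drop the entire row, whereas with na.rm inside rowSums() only the missing cell is skipped. The names `dt` and `result` are illustrative, and as.data.table() is used here so the original df is not modified by reference.

```r
library(data.table)

# Same data as in this thread, rebuilt here so the example is self-contained
df <- data.frame(
  City  = c(NA, "SD", NA, "SF", "FW", NA, NA),
  State = c("CA", "CA", "CA", "CA", "TX", "TX", "TX"),
  x1 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L),
  x2 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L),
  x3 = c(10L, 10L, 10L, 10L, 5L, 5L, 5L)
)

# as.data.table() makes a copy, unlike setDT(), which converts in place
dt <- as.data.table(df)

# .(sum = ...) names the output column; na.rm inside rowSums() skips
# individual missing cells instead of discarding the whole row
result <- dt[, .(sum = sum(rowSums(.SD, na.rm = TRUE))),
             by = State,
             .SDcols = patterns("^x\\d+$")]

print(result)
```

With the data in this thread there are no missing values in x1 to x3, so both versions give the same totals.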