
Is it possible to summarize data by columns in a data frame in R?

I would like to know if there is a way to "compact" data frames in R.

I have a change tracker from a document. Currently, my data frame looks like this:

[enter image description here]

Here we can see that there is one change per row, and each change falls in a column according to its section and whether information was added or deleted.

I would like to know if there is any function in R that allows users to "compact" the information in the data frames like so:

[enter image description here]

Here the information is summarised in each column for each section, making the table a little more human-readable.

Is this possible?

Thank you!

Here you can find some data to reproduce this case:

DOCUMENT <- c("DOC1", "DOC1", "DOC1", "DOC1", "DOC1", "DOC1")
DATE <- c("day", "day", "day", "day", "day", "day")
SectionA.added <- c("Change 1", "Change2", "change3", NA, NA, NA)
SectionA.deleted <- c(NA, NA, NA, "Change 4", NA, NA)
SectionB.added <- c(NA, NA, NA, NA, "Change5", NA)
SectionB.deleted <- c(NA, NA, NA, NA, NA, NA)
OTHERS <- c(NA, NA, NA, NA, NA, "Change 6")

changes_df <- data.frame(DOCUMENT, DATE, SectionA.added, SectionA.deleted, SectionB.added, SectionB.deleted, OTHERS)

You can use dplyr's lead() to 'move up' the values in each column by a given number of rows:

library(dplyr)
df %>%
  mutate(a = lead(a,1),
         b = lead(b,3),
         c = lead(c,2))
  ID  a  b  c
1  1  1  1  2
2  2  2 NA NA
3  3  3 NA NA
4  4 NA NA NA

Data:

df <- data.frame(
  ID = 1:4,
  a = c(NA, 1,2,3),
  b = c(NA, NA, NA, 1),
  c = c(NA, NA, 2, NA)
)
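
To make the 'move up' step concrete, here is a rough base R stand-in for what lead() does to a single column. The name shift_up is hypothetical and only for illustration; dplyr's lead() itself takes further arguments (such as default and order_by) that this sketch ignores.

```r
# Hypothetical stand-in for dplyr::lead(x, n): drop the first n values
# and pad the end with n NAs, so the remaining values move up by n rows.
shift_up <- function(x, n = 1) {
  n <- min(n, length(x))
  c(x[seq_len(length(x) - n) + n], rep(NA, n))
}

b <- c(NA, NA, NA, 1)
b_shifted <- shift_up(b, 3)  # the single non-NA value moves to the first row
print(b_shifted)
```

Shifting by the number of leading NAs is what brings the first real value to the top of the column.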

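EDIT:

This is a more general solution. It shifts each column up by its number of NAs, so it only works if all the NAs in a column come before the non-NA values (in particular, the last value in each column must not be NA):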

df %>%
  mutate(across(a:c, ~lead(., sum(is.na(.)))))

Data (adapted):

df <- data.frame(
  ID = 1:4,
  a = c(NA, 1,2,3),
  b = c(NA, NA, NA, 1),
  c = c(NA, NA, 2, 1)
)
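
In the question's changes_df the NAs are not always at the top of a column (SectionA.added has them at the bottom), so shifting by the NA count would not fit there. As a minimal sketch, assuming a single DOCUMENT/DATE combination as in the example data, the non-NA values of each column can instead be moved to the top and the rows left empty dropped. This uses base R only; na_last, change_cols and compact_df are hypothetical names introduced for illustration.

```r
# Example data copied from the question
changes_df <- data.frame(
  DOCUMENT = rep("DOC1", 6),
  DATE = rep("day", 6),
  SectionA.added = c("Change 1", "Change2", "change3", NA, NA, NA),
  SectionA.deleted = c(NA, NA, NA, "Change 4", NA, NA),
  SectionB.added = c(NA, NA, NA, NA, "Change5", NA),
  SectionB.deleted = c(NA, NA, NA, NA, NA, NA),
  OTHERS = c(NA, NA, NA, NA, NA, "Change 6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: non-NA values first (original order kept), NAs last
na_last <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

# Every column except the identifying ones holds changes
change_cols <- setdiff(names(changes_df), c("DOCUMENT", "DATE"))

compact_df <- changes_df
compact_df[change_cols] <- lapply(compact_df[change_cols], na_last)

# Drop rows in which every change column is NA after compacting
keep <- rowSums(!is.na(compact_df[change_cols])) > 0
compact_df <- compact_df[keep, ]

print(compact_df)
```

With several documents or dates, the same helper would need to be applied within each DOCUMENT/DATE group rather than over the whole data frame.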
