
Transpose data frame variables and add null, unique counts in [r]

I am trying to build a summary table of a data frame, like DataProfile below. The idea is to turn each column into a row and add variables for the total count, nulls, non-nulls, and unique values, plus further measures derived from those variables.

It seems like there should be a better, faster way to do this. Is there a function that does this?

# Trying to write the functions within the dplyr & magrittr framework
library(tidyverse)
library(reshape2) # provides melt(), which tidyverse does not attach

mtcars[2,2] <- NA # Add a null to test completeness

# One summary per measure, each melted to long format (one row per column)
total <- mtcars %>% summarise_all(funs(n())) %>% melt
nulls <- mtcars %>% summarise_all(funs(sum(is.na(.)))) %>% melt
filled <- mtcars  %>% summarise_all(funs(sum(!is.na(.)))) %>% melt
uniques <- mtcars %>% summarise_all(funs(length(unique(.)))) %>% melt


mtcars %>% summarise_all(funs(n_distinct(.))) %>% melt


#Build a Data Frame from names of mtcars and add variables with mutate
DataProfile <- as.data.frame(names(mtcars))
DataProfile <- DataProfile %>% mutate(Total = total$value,
                       Nulls = nulls$value,
                       Filled = filled$value,
                       Complete = Filled/Total,
                       Cardinality = uniques$value,
                       Uniqueness = Cardinality/Total,
                       Distinctness = Cardinality/Filled)
DataProfile

#These are other attempts with Base R, but they are harder to read and don't play well with summarise_all
sapply(mtcars, function(x) length(unique(x[!is.na(x)]))) %>% melt
rapply(mtcars,function(x)length(unique(x))) %>% melt

The summarise_all() function can apply more than one function at a time, so you can consolidate the code by computing every measure in one pass and then reshaping the result into the per-variable "profile" you want. With named functions, the result columns are named <column>_<function> (for example cyl_Nulls), which is what separate() splits on.

library(tidyverse)
library(reshape2) # provides melt(), which tidyverse does not attach

mtcars[2,2] <- NA # Add a null to test completeness

DataProfile <- mtcars %>% 
  summarise_all(funs("Total" = n(), 
                     "Nulls" = sum(is.na(.)), 
                     "Filled" = sum(!is.na(.)), 
                     "Cardinality" = length(unique(.)))) %>% 
  melt() %>%
  separate(variable, into = c('variable', 'measure'), sep="_") %>%
  spread(measure, value)  %>%
  mutate(Complete = Filled/Total,
         Uniqueness = Cardinality/Total,
         Distinctness = Cardinality/Filled)

DataProfile
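
funs() and summarise_all() have since been deprecated or superseded in newer dplyr releases, so the same one-pass idea can also be written with across() and the tidyr pivot functions. The following is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the names df and profile and the "__" separator are illustrative choices, not part of the original answer. The double underscore is used so that column names containing a single underscore would still split cleanly.

```r
library(dplyr)
library(tidyr)

df <- mtcars
df[2, 2] <- NA # Add a null to test completeness

profile <- df %>%
  # Compute every measure for every column in one pass;
  # result columns are named like "cyl__Nulls"
  summarise(across(everything(),
                   list(Total = ~ length(.x),
                        Nulls = ~ sum(is.na(.x)),
                        Filled = ~ sum(!is.na(.x)),
                        Cardinality = ~ n_distinct(.x)), # counts NA as a value, like length(unique(.))
                   .names = "{.col}__{.fn}")) %>%
  # Split "cyl__Nulls" into variable = "cyl" and measure = "Nulls"
  pivot_longer(everything(),
               names_to = c("variable", "measure"),
               names_sep = "__") %>%
  # One row per original column, one column per measure
  pivot_wider(names_from = measure, values_from = value) %>%
  mutate(Complete = Filled / Total,
         Uniqueness = Cardinality / Total,
         Distinctness = Cardinality / Filled)

profile
```

This avoids the reshape2 dependency entirely, since pivot_longer() and pivot_wider() take the place of melt(), separate(), and spread().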


 