Properly calling variable name when creating multiple Benford plots

I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law

Running a single variable:

#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))

This looks great, and the legend says "Dataset: iris$Sepal.Length", which is perfect.

Using apply to run four variables:

apply(iris[1:4], 2, function(x) plot(benford(x)))

This creates four plots, but each plot's legend says "Dataset: x".

I then attempted a for loop:

for (i in colnames(iris[1:4])){
  plot(benford(iris[[i]]))
}

This also creates four plots, but now each legend says "Dataset: iris[[i]]". I would like the name of the variable on each chart.

I tried a different loop, hoping to get titles from an evaluated, parsed string such as "iris$Sepal.Length":

for (i in colnames(iris[1:4])){
  plot(benford(eval(parse(text=paste0("iris$", i)))))
}

But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".

And now I have run into the infamous eval(parse(text=paste0( pattern (e.g. How to "eval" results returned by "paste0"? and R: eval(parse(...)) is often suboptimal).

I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningful variable names in the legend?

This happens because of the first line of the benford function:

benford <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3){

  data.name <- as.character(deparse(substitute(data)))

Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R

data.name is then used to label your graph. Unfortunately, whatever variable name or expression you pass to the function is captured by the deparse(substitute()) call and used as that label.
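The capture described above can be reproduced in base R without the package. This is a minimal sketch: label_of is a hypothetical stand-in that only mimics the first line of benford, not the package's own code.

```r
# Hypothetical stand-in for benford(): it only records how its argument
# was written at the call site, exactly as deparse(substitute(data)) does.
label_of <- function(data) deparse(substitute(data))

# Called directly, the label is the expression the user typed.
direct <- label_of(iris$Sepal.Length)

# Inside an anonymous function, the argument is literally written as `x`.
via_sapply <- sapply(iris[1:4], function(x) label_of(x))

# Inside a loop, the unevaluated expression `iris[[i]]` is captured,
# not the value of i.
via_loop <- character(0)
for (i in colnames(iris)[1:4]) via_loop[i] <- label_of(iris[[i]])

print(direct)
print(via_sapply)
print(via_loop)
```

The three results correspond to the three legends reported in the question: the typed expression, "x", and "iris[[i]]".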


One short-term solution is to copy and rewrite the function:

#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed by the rewritten function

# copy the package's functions, including unexported helpers, into the
# global environment; the rewritten function calls them directly
ns <- asNamespace("benford.analysis")
for (name in lsf.str(envir = ns, all.names = TRUE)) assign(name, get(name, envir = ns))

# define benford_rev() here, before the loop (full definition below)

for (i in colnames(iris[1:4])){
   plot(benford_rev(iris[[i]], data.name = i))
}

This has the following negative side effects:

  • It is not maintainable across package revisions
  • It fills your GlobalEnv with functions that are normally hidden inside the package

So hopefully someone can propose a better way!


benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3, data.name = as.character(deparse(substitute(data)))){ # changed: data.name is now an argument

 # removed: the data.name assignment that was the first line of benford()

  benford.digits <- generate.benford.digits(number.of.digits)

  benford.dist <- generate.benford.distribution(benford.digits)

  empirical.distribution <- generate.empirical.distribution(data, number.of.digits,sign, second.order = FALSE, benford.digits)

  n <- length(empirical.distribution$data)

  second.order <- generate.empirical.distribution(data, number.of.digits,sign, second.order = TRUE, benford.digits, discrete = discrete, round = round)

  n.second.order <- length(second.order$data)

  benford.dist.freq <- benford.dist*n

  ## calculating useful summaries and differences
  difference <- empirical.distribution$dist.freq - benford.dist.freq

  squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq

  absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)

  ### chi-squared test
  chisq.bfd <- chisq.test.bfd(squared.diff, data.name)

  ### MAD
  mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))

  if (number.of.digits > 3) {
    MAD.conformity <- NA
  } else {
    digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]  
    MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
  }





  ### Summation
  summation <- generate.summation(benford.digits,empirical.distribution$data, empirical.distribution$data.digits)
  abs.excess.summation <- abs(summation - mean(summation))

  ### Mantissa
  mantissa <- extract.mantissa(empirical.distribution$data)
  mean.mantissa <- mean(mantissa)
  var.mantissa <- var(mantissa)
  ek.mantissa <- excess.kurtosis(mantissa)
  sk.mantissa <- skewness(mantissa)

  ### Mantissa Arc Test
  mat.bfd <- mantissa.arc.test(mantissa, data.name)

  ### Distortion Factor
  distortion.factor <- DF(empirical.distribution$data)  

  ## recovering the lines of the numbers
  if (sign == "positive") lines <- which(data > 0 & !is.na(data))
  if (sign == "negative") lines <- which(data < 0 & !is.na(data))
  if (sign == "both")     lines <- which(data != 0 & !is.na(data))
  #lines <- which(data %in% empirical.distribution$data)

  ## output
  output <- list(info = list(data.name = data.name,
                             n = n,
                             n.second.order = n.second.order,
                             number.of.digits = number.of.digits),

                 data = data.table(lines.used = lines,
                                   data.used = empirical.distribution$data,
                                   data.mantissa = mantissa,
                                   data.digits = empirical.distribution$data.digits),

                 s.o.data = data.table(second.order = second.order$data,
                                       data.second.order.digits = second.order$data.digits),

                 bfd = data.table(digits = benford.digits,
                                  data.dist = empirical.distribution$dist,
                                  data.second.order.dist = second.order$dist,
                                  benford.dist = benford.dist,
                                  data.second.order.dist.freq = second.order$dist.freq,
                                  data.dist.freq = empirical.distribution$dist.freq,
                                  benford.dist.freq = benford.dist.freq,
                                  benford.so.dist.freq = benford.dist*n.second.order,
                                  data.summation = summation,
                                  abs.excess.summation = abs.excess.summation,
                                  difference = difference,
                                  squared.diff = squared.diff,
                                  absolute.diff = absolute.diff),

                 mantissa = data.table(statistic = c("Mean Mantissa", 
                                                     "Var Mantissa", 
                                                     "Ex. Kurtosis Mantissa",
                                                     "Skewness Mantissa"),
                                       values = c(mean.mantissa = mean.mantissa,
                                                  var.mantissa = var.mantissa,
                                                  ek.mantissa = ek.mantissa,
                                                  sk.mantissa = sk.mantissa)),
                 MAD = mean.abs.dev,

                 MAD.conformity = MAD.conformity,

                 distortion.factor = distortion.factor,

                 stats = list(chisq = chisq.bfd,
                              mantissa.arc.test = mat.bfd)
  )

  class(output) <- "Benford"

  return(output)

}
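A possible alternative that avoids copying the function is to build each call with the column name already spliced in, so that deparse(substitute(data)) sees a meaningful expression. The sketch below uses a hypothetical stand-in, label_of, in place of benford so that it runs in base R. It relies only on the capture behaviour described earlier, not on anything specific to the package.

```r
# Hypothetical stand-in for benford(): it labels its input the same way,
# via deparse(substitute(data)).
label_of <- function(data) deparse(substitute(data))

labels <- character(0)
for (i in colnames(iris)[1:4]) {
  # Build the call label_of(iris$<column>) with the actual column name
  # substituted for the placeholder symbol v, then evaluate it.
  call_i <- substitute(label_of(iris$v), list(v = as.name(i)))
  labels[i] <- eval(call_i)
}

print(labels)
```

With the package loaded, the same pattern would presumably read eval(substitute(plot(benford(iris$v)), list(v = as.name(i)))) inside the loop. Unlike eval(parse(text = ...)), this works on language objects rather than pasted strings.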

I have just updated the package (GitHub version) to allow for a user-supplied name.

The function now has a new parameter called data.name, in which you can provide a character vector with the name of the data and override the default. Thus, for your example you can simply run the following code.

First install the GitHub version (I will submit this version to CRAN soon).

devtools::install_github("carloscinelli/benford.analysis") # install new version

Now you can provide the name of the data inside the for loop:

library(benford.analysis)

for (i in colnames(iris[1:4])){
  plot(benford(iris[[i]], data.name = i))
}

All the plots will now be labeled with the variable names, as you wanted.
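If installing the updated version is not an option, one possible workaround is to overwrite the stored name on the returned object before plotting. This is a sketch under an assumption: that plot() takes the legend label from info$data.name, as the output list built inside benford suggests. It has not been checked against every package version.

```r
library(benford.analysis)

for (i in colnames(iris)[1:4]) {
  # At this point the stored label is the captured expression "iris[[i]]".
  bfd <- benford(iris[[i]])

  # Assumption: the legend is drawn from info$data.name, so replacing it
  # here changes what the plot shows.
  bfd$info$data.name <- i

  plot(bfd)
}
```

The test objects created inside benford receive data.name at construction time, so they would most likely keep the original label. Only the info entry is changed here.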

Created on 2019-08-10 by the reprex package (v0.2.1)
