
How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:

x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)

I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?

The easiest (but not general) way is to restrict the limits of the x-axis. For the standard normal, the points 1, 2 and 3 sigma either side of the mean will then be labeled as such, and the mean will be labeled 0, indicating 0 deviations from the mean.

plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))


Another option is to use more specific labels:

plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
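
For a normal distribution that is not standard, the same axis() call can be driven by the mean and standard deviation instead of hard-coded positions. The following is a minimal sketch, assuming hypothetical parameters mu = 10 and sigma = 2 that do not come from the question:

```r
# Hypothetical parameters, chosen only for illustration
mu <- 10
sigma <- 2

# Grid covering four standard deviations on each side of the mean
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

# Tick positions at the mean and at 1, 2 and 3 standard deviations either side
ticks <- mu + (-3:3) * sigma

# xaxt = "n" suppresses only the default x axis; axis() then draws our own
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "density")
axis(1, at = ticks, labels = ticks)
```

Unlike axes = FALSE, xaxt = "n" keeps the y axis and the box around the plot.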

Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:

curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))

But this doesn't use the given code anymore.

An extremely inefficient and unusual, but beautiful solution, which works based on the ideas of Monte Carlo simulation, is this:

  1. Simulate many draws (or samples) from a given distribution (say, the normal) using rnorm.
  2. Plot the density of these draws.

The rnorm function takes the arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.

Thus, to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:

x = rnorm(50000,0,1)

plot(density(x))

As the number of draws goes to infinity, the estimated density will converge to the normal density. To illustrate this, see the image below, which shows, from left to right and top to bottom, 5,000, 50,000, 500,000, and 5 million samples.

[Image: 5,000, 50,000, 500,000, and 5 million samples from the normal PDF]
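
The convergence can also be checked directly by overlaying the exact density on each estimate. This is only a sketch: the seed, the three sample sizes, and the names sizes, grid and max_err are assumptions made for illustration, and the largest sample size from the image is left out to keep the run short.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

sizes <- c(5000, 50000, 500000)        # assumed sample sizes
grid <- seq(-4, 4, length.out = 200)   # points at which to draw the exact density
max_err <- numeric(length(sizes))      # largest gap between estimate and dnorm

op <- par(mfrow = c(1, 3))             # three panels side by side
for (i in seq_along(sizes)) {
  draws <- rnorm(sizes[i], 0, 1)
  d <- density(draws)                  # kernel density estimate of the draws
  max_err[i] <- max(abs(d$y - dnorm(d$x)))

  plot(d, main = paste(format(sizes[i], big.mark = ",", scientific = FALSE), "draws"),
       xlim = c(-4, 4))
  lines(grid, dnorm(grid), col = "red", lty = 2)  # exact density for comparison
}
par(op)

max_err  # the gap should generally shrink as the sample size grows
```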

If you like doing things the hard way, without R's built-in functions, or you want to do this outside R, you can use the formula of the normal density, with mean mu and standard deviation s:

$$f(x) = \frac{1}{s\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2s^2}\right)$$

x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
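
As a quick sanity check, the hand-written formula can be compared against dnorm. In this sketch, normal_pdf is a hypothetical helper name, and the second set of parameters (mean 2, standard deviation 3) is chosen arbitrarily:

```r
# Hand-written normal density; normal_pdf is a hypothetical helper name
normal_pdf <- function(x, mu = 0, s = 1) {
  (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
}

x <- seq(-4, 4, length.out = 200)

# Both comparisons should return TRUE (agreement within numerical tolerance)
all.equal(normal_pdf(x), dnorm(x))
all.equal(normal_pdf(x, mu = 2, s = 3), dnorm(x, mean = 2, sd = 3))
```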

In the general case, for example Normal(2, 1):

f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)

This approach is very general: f can be defined freely, with any given parameters, for example:

f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)

I particularly like lattice for this purpose. It makes it easy to add graphical information such as specific areas under a curve, which is what you usually need when dealing with probability problems such as finding P(a < X < b). Please have a look:

library(lattice)

e4a <- seq(-4, 4, length = 10000)            # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)

xyplot(e4b ~ e4a,                            # Lattice xyplot
       type = "l",
       main = "Plot 2",
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           panel.abline(v = c(0, 1, 1.5), lty = 2)   # dashed lines at the mean and the z limits

           xx <- c(1, x[x >= 1 & x <= 1.5], 1.5)     # x coordinates of the shaded area
           yy <- c(0, y[x >= 1 & x <= 1.5], 0)       # close the polygon on the x axis
           panel.polygon(xx, yy, ..., col = 'red')
       })


In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.

Axis labels are automatic.
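
The probability that corresponds to a shaded area between z = 1 and z = 1.5 can be computed with pnorm, the standard normal cumulative distribution function. A short sketch (the variable names are chosen for illustration only):

```r
# P(1 < Z < 1.5) for a standard normal Z
p <- pnorm(1.5) - pnorm(1)
round(p, 4)  # 0.0918

# The same area obtained by numerically integrating the density
p_int <- integrate(dnorm, lower = 1, upper = 1.5)$value
all.equal(p, p_int)
```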

Here is how to wrap it in a function:

normalCriticalTest <- function(mu, s) {
  x <- seq(mu - 4 * s, mu + 4 * s, length.out = 200) # x spans four standard deviations either side of the mean
  y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2)) # y follows the formula of the normal density
  plot(x, y, type="l", lwd=2, xlim = c(mu - 3.5 * s, mu + 3.5 * s))
  abline(v = mu + c(-1.96, 1.96) * s, col="red") # critical values, leaving 2.5% of the area in each tail
}
normalCriticalTest(0, 1) # draw a standard normal density with vertical lines at the critical values

Final result: a standard normal curve with red vertical lines at -1.96 and 1.96.
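
The value 1.96 used for the vertical lines is the rounded 97.5% quantile of the standard normal, which qnorm returns directly. A sketch, assuming a two-sided 5% significance level; critical_values is a hypothetical helper name:

```r
alpha <- 0.05              # assumed two-sided significance level
z <- qnorm(1 - alpha / 2)  # upper critical value of the standard normal
round(z, 2)                # 1.96

# Hypothetical helper: critical values for a normal with mean mu and sd s
critical_values <- function(mu, s, alpha = 0.05) {
  mu + c(-1, 1) * qnorm(1 - alpha / 2) * s
}

critical_values(0, 1)
```

Computing the value this way makes it straightforward to switch to another level, for example alpha = 0.01.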

Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site or the original source. © 2020-2024 STACKOOM.COM