
How to make a relative frequency normal distribution?

Ok so basically I have to plot a relative frequency histogram (which I've done), but I also have to plot a normal distribution curve over it. No matter how I do it, the curve always comes out scaled for absolute frequency and not relative frequency.

This is what I have so far:

set.seed(1099)

N <- 1520
n_1 <- 4
n_2 <- 30
n_3 <- 76
Valor_esperado = (8 + 12)/2
Variancia = (12-8)^2/12

Amostra_1 <- matrix(runif(N * n_1, min = 8, max = 12), nrow = n_1)

Amostra_2 <- matrix(runif(N * n_2, min = 8, max = 12), nrow = n_2)

Amostra_3 <- matrix(runif(N * n_3, min = 8, max = 12), nrow = n_3)


media_1 <- colMeans(Amostra_1)
media_2 <- colMeans(Amostra_2)
media_3 <- colMeans(Amostra_3)


Amostra_1 <- as.numeric(unlist(media_1))
Amostra_2 <- as.numeric(unlist(media_2))
Amostra_3 <- as.numeric(unlist(media_3))

#par(mfrow=c(2,2))

h <- hist(Amostra_1, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 4",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="blue",
     freq=FALSE)


h <- hist(Amostra_2, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 30",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="red",
     freq=FALSE)

h <- hist(Amostra_3, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 76",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="yellow",
     freq=FALSE)

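Given the histogram you've defined (bar heights are percentages that sum to 100), you need a Gaussian curve that integrates to 100*binwidth rather than 1. This should do it, for example for the first histogram (run it right after plotting Amostra_1, while h still refers to that histogram):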

binwidth <- diff(h$breaks)[1]
curve(dnorm(x, mean = mean(Amostra_1), 
            sd = sd(Amostra_1)) * binwidth*100, 
      add = TRUE)
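
To see where the binwidth*100 factor comes from, here is a minimal sketch that compares the total area of the bars with the area under the scaled curve. It rebuilds only the n = 4 sample from the question; the names amostra, rel_freq, area_barras and area_curva are illustrative and not part of the original code, and it assumes equal-width bins (which is what hist() produces by default).

```r
set.seed(1099)

# Means of 1520 samples of size 4 from Uniform(8, 12), as in the question
amostra <- colMeans(matrix(runif(1520 * 4, min = 8, max = 12), nrow = 4))

h <- hist(amostra, plot = FALSE)
binwidth <- diff(h$breaks)[1]                 # assumes all bins have the same width
rel_freq <- h$counts / sum(h$counts) * 100    # bar heights in percent

# Area covered by the bars: each bar is (height in %) x (bin width)
area_barras <- sum(rel_freq * diff(h$breaks))

# Area under the scaled normal curve, integrated over mean +/- 8 sd
m <- mean(amostra)
s <- sd(amostra)
area_curva <- integrate(
  function(x) dnorm(x, mean = m, sd = s) * binwidth * 100,
  lower = m - 8 * s, upper = m + 8 * s
)$value

print(c(barras = area_barras, curva = area_curva, alvo = 100 * binwidth))
```

All three printed values should agree (up to numerical integration error), which is why the curve and the bars end up on the same vertical scale.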

In this particular case the top of the curve gets clipped, because the y-axis of the histogram is based only on the bar heights (h$density) and does not take the peak of the theoretical curve into account. The simple/crude way to fix this is to add ylim = c(0, max(h$density)*1.1) when plotting your histogram, to extend the maximum a bit. One "correct", slightly more annoying way is to compute max(h$density), compute the peak of the scaled curve (the normal density evaluated at its mean, i.e. dnorm(mean(Amostra_1), mean = mean(Amostra_1), sd = sd(Amostra_1))*binwidth*100), and use the larger of these two values as the upper limit of ylim.
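
The second approach could look roughly like the sketch below for the n = 4 case. The variable names amostra, m, s and pico are illustrative (not from the question), and the sketch again assumes the equal-width bins that hist() chooses by default.

```r
set.seed(1099)

# Means of 1520 samples of size 4 from Uniform(8, 12), as in the question
amostra <- colMeans(matrix(runif(1520 * 4, min = 8, max = 12), nrow = 4))

h <- hist(amostra, plot = FALSE)
binwidth <- diff(h$breaks)[1]                  # assumes equal-width bins
h$density <- h$counts / sum(h$counts) * 100    # relative frequency in percent

m <- mean(amostra)
s <- sd(amostra)

# Height of the scaled normal curve at its mode (x = mean)
pico <- dnorm(m, mean = m, sd = s) * binwidth * 100

# Upper y limit covers both the tallest bar and the peak of the curve
plot(h, freq = FALSE, main = "n = 4", xlab = "",
     ylab = "Frequência Relativa", col = "blue",
     ylim = c(0, max(h$density, pico)))
curve(dnorm(x, mean = m, sd = s) * binwidth * 100, add = TRUE)
```

The same steps can be repeated for Amostra_2 and Amostra_3, recomputing h, binwidth, m and s for each sample before plotting.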

