R - Sampling frequency histogram: more efficient

Question

I'm a university student, beginning to explore R for an exam. Sorry for the vague title; I have several related questions in this post.

I've run into the problem of sampling a population of people who are either Male (M) or Female (F). I wanted to define a function that takes the number of Males and Females in this population, draws sample.number samples of size sample.size, and returns a data frame containing the proportion of females in each sample, with the frequency of each proportion.

I'm positive there is a simple and well-optimized way to do this, but I've written a small function that (barely) works:

senators <- function(Fem = 13, 
                 Mal = 87, 
                 sample.size = 10, 
                 sample.number = 100){

pop <- c(rep("F", Fem), rep("M", Mal)) # I create the population base

popsa <- list(NA)           # I make some empty variables used later
popsa.factor <- list(NA)    # Not sure if this passage is even needed...
popsa.proportion <- list(NA)

Here comes a for loop. I've read that for loops are a really inefficient way to do this. Is there a better way?

for(i in 1:sample.number){
  popsa[[i]] <- sample(pop, sample.size, replace = TRUE)
  popsa.factor[[i]] <- table(factor(popsa[[i]], levels = c("M", "F")))
  popsa.proportion[[i]] <- popsa.factor[[i]][2]/sample.size
  }

I start by assigning a sample to each element of the list popsa, then I use popsa to create a table from each sample and store it in popsa.factor. Then I calculate the proportion of females over the total and store it in popsa.proportion. This for loop seems super messy to me, and it is really slow when processing lots of samples. Is there a better, more efficient way to do what I've done here?

popsa.unlisted <- unlist(popsa.proportion)
popsa.frequency <- table(popsa.unlisted)

popsa.frame <- data.frame(Level = as.numeric(names(popsa.frequency)), 
                          Freq =  as.numeric(popsa.frequency))
return(popsa.frame)
} # This closes the function call

I then unlist popsa.proportion to get every proportion in a vector, and table those values to get the frequencies, storing them in popsa.frequency. Now I try to turn the factor popsa.frequency into a data frame, by cheating: I convert the names of popsa.frequency to numeric and store them as the first column of the data frame. The function then returns popsa.frame, as I wanted.

popsa.frame, though, still carries over the factor properties of popsa.frequency in its first column (Level). How can I change this? Should I?

Since these are frequencies of a sample distribution, I'd like to create a histogram from this data frame. However, hist() only accepts numeric vectors, so popsa.frame isn't a valid input. plot(popsa.frame) returns more or less what I want, though. How can I create such a histogram?

Edit: Following the marked answer below, I've also worked out how to convert the data frame the function creates into an object that hist() can actually use to create a frequency histogram (although a barplot yields more or less the same graph, and is possibly a more statistically correct way to show such a result):

result <- senators(Fem=13,Mal=87,sample.size=50,sample.number=10000)

# Repeat each proportion as many times as it was observed;
# rep() is already vectorized, so no sapply() wrapper is needed
raw <- rep(result$Level, result$Freq)

hist(raw)

Answer 1

Your function has default values, so calling senators() with no arguments already creates a data.frame.

Following your data I would do:

df <- senators() # using default values
plot(df, type = "h", lwd = 5, lend = 1) # type sets the plot type, lwd the line width; lend = 1 gives the bars square ends

Take a look at ?plot to see the types of plots you can do. You can also see how to change graphical parameters with ?par.

PS: look at this post for line width details.
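
If bars are preferred over type = "h" spikes, base R's barplot() can draw the same table. What follows is only a minimal sketch, assuming a data frame with the Level and Freq columns that senators() returns. The data is regenerated inline with a fixed seed so the snippet runs by itself, and the names pop, props and mids are illustrative, not part of the original code.

```r
set.seed(42)  # fixed seed, only so the sketch is reproducible

# Same population as the question's defaults: 13 "F" and 87 "M"
pop <- c(rep("F", 13), rep("M", 87))

# 100 samples of size 10; each value is the share of "F" in one sample
props <- replicate(100, mean(sample(pop, 10, replace = TRUE) == "F"))

# Tabulate into the Level/Freq layout that senators() returns
freq <- table(props)
df <- data.frame(Level = as.numeric(names(freq)),
                 Freq  = as.numeric(freq))

# One bar per observed proportion; barplot() returns the bar midpoints
mids <- barplot(df$Freq, names.arg = df$Level,
                xlab = "Proportion of females in sample",
                ylab = "Number of samples")
```

Because the proportions are discrete, a barplot labels each bar with its exact value instead of binning the values as hist() would.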

Answer 2

Creating the lists and running the for loop causes some performance bottlenecks. I was able to use sapply to remove the for loop and some of the temporary variables.

I am still returning the data frame; another option would be to return the vector answer and pass it straight to the histogram plotting function for your final plot.

senators <- function(Fem = 13, 
                     Mal = 87, 
                     sample.size = 10, 
                     sample.number = 100){

  pop <- c(rep("F", Fem), rep("M", Mal)) # I create the population base

  # One proportion of "F" per sample, collected into a numeric vector
  answer <- sapply(1:sample.number, function(x) {
    popsa <- sample(pop, sample.size, replace = TRUE)
    length(popsa[popsa == "F"]) / sample.size
  })

  popsa.frequency <- table(answer)

  popsa.frame <- data.frame(Level = as.numeric(names(popsa.frequency)),
                            Freq  = as.numeric(popsa.frequency))
  return(popsa.frame)
}

senators()
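
The other option mentioned above, returning the vector instead of the data frame, could look roughly like this. It is a sketch rather than part of the original answer, and the function names female_props and female_props_binom are made up for illustration. The second variant relies on the fact that, when sampling with replacement, the number of females in a sample follows a binomial distribution with probability Fem / (Fem + Mal), so rbinom() can draw all the counts in one call without building the population vector.

```r
# Hypothetical helper: returns one proportion per sample, not a table
female_props <- function(Fem = 13, Mal = 87,
                         sample.size = 10, sample.number = 100) {
  pop <- c(rep("F", Fem), rep("M", Mal))
  replicate(sample.number,
            mean(sample(pop, sample.size, replace = TRUE) == "F"))
}

# Hypothetical shortcut: draw the female counts directly from the
# binomial distribution, then divide by the sample size
female_props_binom <- function(Fem = 13, Mal = 87,
                               sample.size = 10, sample.number = 100) {
  counts <- rbinom(sample.number, size = sample.size,
                   prob = Fem / (Fem + Mal))
  counts / sample.size
}

set.seed(1)  # fixed seed, only so the sketch is reproducible
props <- female_props_binom(sample.size = 50, sample.number = 10000)

# Breaks centred on every possible proportion k / 50, so that each
# histogram bar corresponds to exactly one value
hist(props, breaks = seq(-0.5, 50.5) / 50,
     xlab = "Proportion of females in sample", main = "")
```

If the Level/Freq data frame is still needed, table(props) can be applied to either vector as in the function above.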

Answer 1: 0, 2017-12-27 15:49:53

Answer 2: 0, accepted, 2017-12-27 20:15:28
