R (studio) factor with levels
I'm a student currently learning to work with R (studio), and I have been given a task for this. I am supposed to compare some mostly randomly generated data and draw a conclusion from it.
The problem I'm having, however, is that this data has a factor with 5 levels and I want to compare the data one level at a time...
> str(data)
'data.frame': 275 obs. of 5 variables:
$ leverancier : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
$ las_trekproef : num 211 375 503 195 221 ...
$ las_score : num 2.6 3.1 4.3 2.6 2.7 3.3 3.9 2.2 2.7 2.7 ...
$ afwijking_draaiwerk: num 0.081 0.061 0.015 0.125 0.256 0.004 0.124 0.016 0.042 0.062 ...
$ afwijking_freeswerk: num 0.336 0.026 0.032 0.161 0.36 0.447 0.062 0.176 0.317 0.212 ...
What I want to do is put the data for each level in a different variable so I can make graphs and box plots one level at a time.
A <- (all data for level A here)
Instead of:
summary(data)
leverancier las_trekproef las_score afwijking_draaiwerk afwijking_freeswerk
A:55 Min. :128.6 Min. :2.000 Min. :0.0000 Min. :0.0010
B:55 1st Qu.:270.6 1st Qu.:2.700 1st Qu.:0.0355 1st Qu.:0.1210
C:55 Median :361.0 Median :3.500 Median :0.0760 Median :0.2670
D:55 Mean :356.1 Mean :3.513 Mean :0.1268 Mean :0.3055
E:55 3rd Qu.:443.6 3rd Qu.:4.250 3rd Qu.:0.1340 3rd Qu.:0.4255
Max. :571.4 Max. :5.000 Max. :1.1390 Max. :1.3890
Thanks in advance,
NH
You mean something like:
a <- subset(data, leverancier=='A')
For help, simply check out ?subset.
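A minimal sketch of this approach follows. The data frame is simulated here (the column names mirror the question, but the values are invented, so treat them as assumptions and not the asker's data). One thing worth knowing: the subset still carries all five factor levels, and droplevels removes the unused ones.

```r
# Simulated stand-in for the asker's data (values are invented for illustration)
set.seed(1)
data <- data.frame(
  leverancier   = factor(rep(c("A", "B", "C", "D", "E"), each = 55)),
  las_trekproef = runif(275, 100, 600),
  las_score     = round(runif(275, 2, 5), 1)
)

# Keep only the rows for supplier A
a <- subset(data, leverancier == "A")

# Equivalent selection with plain logical indexing
a2 <- data[data$leverancier == "A", ]

# The factor still lists all five levels; drop the unused ones if needed
a <- droplevels(a)

summary(a)
boxplot(a$las_score, main = "las_score, leverancier A")
```

The same pattern with "B", "C", and so on gives one data frame per level.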
You could use tapply to get your summary by factor. boxplot will plot a variable by factor if you use the formula interface and x is a factor. You could also index by a factor level to subset y and generate a single boxplot. However, to compare y across factor levels, you will want to plot the levels on the same plot. Here are some examples.
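# Create example data (factor() is explicit because R >= 4.0 no longer
# converts strings to factors by default in data.frame)
dat <- data.frame(leverancier=factor(rep(c("A","A","B","B","B","A","C","D","D","C"),100)),
                  las_trekproef=runif(1000,100,500),
                  las_score=runif(1000,1,4))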
# Use tapply to summarize y by factor
tapply(dat$las_score, dat$leverancier, FUN=summary)
# Using formula interface plot y by factor
boxplot(las_score ~ leverancier, data=dat, notch=TRUE)
# You can also index y based on a factor level to create a single boxplot of y
boxplot(dat[dat$leverancier == "A" ,]$las_score, notch=TRUE)
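If you want a per-level summary of several columns at once, base R's aggregate is one option. This is only a sketch: it rebuilds the example data from above with a fixed seed (the seed is an arbitrary choice) and computes the mean of both measurements for each level.

```r
# Rebuild the example data with a fixed seed so the run is repeatable
set.seed(42)
dat <- data.frame(
  leverancier   = factor(rep(c("A", "A", "B", "B", "B", "A", "C", "D", "D", "C"), 100)),
  las_trekproef = runif(1000, 100, 500),
  las_score     = runif(1000, 1, 4)
)

# Mean of both measurements per supplier, returned as one data frame
means <- aggregate(cbind(las_trekproef, las_score) ~ leverancier,
                   data = dat, FUN = mean)
print(means)
```

Unlike tapply, which handles one variable at a time, this returns a single data frame with one row per level.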
Consider working with split.
split(data, data$leverancier)
will give you a list of data.frames, each of which corresponds to one level of leverancier. You can then operate on each element at a time, or loop over the list to operate on each part in turn.
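As a rough sketch of what that looks like in practice (the data frame below is simulated with the question's column names, so the values are assumptions; the names parts and lev are just illustrative):

```r
# Simulated stand-in for the asker's data (values are invented for illustration)
set.seed(1)
data <- data.frame(
  leverancier   = factor(rep(c("A", "B", "C", "D", "E"), each = 55)),
  las_trekproef = runif(275, 100, 600),
  las_score     = round(runif(275, 2, 5), 1)
)

# One data frame per level, collected in a named list
parts <- split(data, data$leverancier)
names(parts)

# Pull out a single level if you really want a separate variable
A <- parts[["A"]]

# Apply the same operation to every part
lapply(parts, function(d) summary(d$las_score))

# Or loop over the list and draw one boxplot per level
for (lev in names(parts)) {
  boxplot(parts[[lev]]$las_score, main = paste("las_score, leverancier", lev))
}
```

Keeping the pieces in one list avoids creating five loose variables and makes it easy to repeat the same analysis for every level.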
I realize this does not directly answer your question (Seb's answer does that), but it should point you in a more idiomatic direction for working with data in R.