Percentages of "categorical values" in one column based on a subset of a data frame in R

I have data that consists of three columns (as an example) named data$engine, data$unit, and data$Turn. data$Turn is a categorical variable with values 0, 1, and 2 (the sample below labels this column data$AvailableLeft). For each unique value of data$engine, there can be several values of data$unit.

I would like to calculate the percentage of 0s, 1s, and 2s in data$Turn for each unique combination of data$unit and data$engine. I have a hundred thousand rows, but I am pasting the data structure for only two unique values of data$engine. Please note that each data$unit (for a specific data$engine) can have thousands of rows, so I would like to calculate the percentages as follows:

Percentage of 0s for data$unit 207 and data$engine 1111 =
count of all 0s within data$unit 207 and data$engine 1111 (DIVIDED BY)
total count of 0s, 1s, and 2s for this data$unit and data$engine.

Similarly for the percentages of 1s and 2s for data$unit 207 and data$engine 1111,
and so on for all other values of unit and engine.
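
As a quick check of the calculation described above, here is a minimal base R sketch for the single group engine 1111 / unit 207, using the eight values that the sample data lists for that group. The vector name `turn_207` is made up for illustration and is not part of the original data.

```r
# Values of the categorical column for engine 1111, unit 207 (from the sample data)
turn_207 <- c(1, 0, 2, 0, 0, 2, 0, 1)

# Count each category; fixing the levels keeps a category even if its count is 0
counts <- table(factor(turn_207, levels = c(0, 1, 2)))

# Divide each count by the total number of rows in the group
pct <- 100 * counts / sum(counts)
print(pct)
```

Fixing the factor levels is a small design choice: without it, a group that happens to contain no 2s would silently return only two percentages.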

data$engine  data$unit     data$AvailableLeft
    1111       207                1
    1111       207                0
    1111       207                2
    1111       207                0
    1111       207                0
    1111       207                2
    1111       207                0
    1111       207                1
    1111       208                0
    1111       208                1
    1111       208                2
    1111       208                1
    1122       209                2
    1122       209                2
    1122       209                0
    1122       209                0
    1122       209                1

I would like to get my output in this form, i.e. the percentage of 0s, 1s, and 2s for each data$unit within each data$engine:

data$engine  data$unit     %age of 0s     %age of 1s    %age of 2s
 1111          207              ?              ?            ?
 1111          208              ?              ?            ?    
 1122          209              ?              ?            ?    
   .             .                    .
   .             .                    .
   .             .                    .

You can use data.table:

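library(data.table)
# Share of each category within every engine/unit group (.N is the group's row count)
setDT(data)[, .(p0 = sum(AvailableLeft == 0) / .N,
                p1 = sum(AvailableLeft == 1) / .N,
                p2 = sum(AvailableLeft == 2) / .N),
            keyby = .(engine, unit)]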

   engine unit   p0   p1   p2
1:   1111  207 0.50 0.25 0.25
2:   1111  208 0.25 0.50 0.25
3:   1122  209 0.40 0.20 0.40

library(data.table)
dt <- as.data.table(your_data)
# Percentage of 1s per engine/unit group, formatted as a string such as "25 %"
dt[, .(p1 = paste(round(sum(data.AvailableLeft == 1) * 100 / .N, 2), "%")),
   by = .(data.engine, data.unit)]

I will leave the percentages for data.AvailableLeft == 0 and data.AvailableLeft == 2 to you, as working them out from here is trivial.
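
For comparison, the same per-group percentages can be sketched in base R with `table()` and `prop.table()` instead of data.table. This is only an illustration under stated assumptions: the data frame `df` is rebuilt by hand from the sample rows in the question, and its column names (`engine`, `unit`, `AvailableLeft`) are assumed to carry no `data.` prefix.

```r
# Sample rows from the question; plain column names are an assumption
df <- data.frame(
  engine = c(rep(1111, 12), rep(1122, 5)),
  unit = c(rep(207, 8), rep(208, 4), rep(209, 5)),
  AvailableLeft = c(1, 0, 2, 0, 0, 2, 0, 1,
                    0, 1, 2, 1,
                    2, 2, 0, 0, 1)
)

# One label per engine/unit pair that actually occurs, e.g. "1111.207"
grp <- interaction(df$engine, df$unit, drop = TRUE)

# Rows are groups, columns are the categories 0, 1, 2
counts <- table(grp, df$AvailableLeft)

# margin = 1 divides every count by its row total, giving per-group percentages
pct <- round(100 * prop.table(counts, margin = 1), 2)
print(pct)
```

Unlike writing one `sum(... == k) / .N` term per category, this form picks up whatever categories appear in the column, which may be convenient if more than three values exist.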
