Group pairwise categories together

I am currently working with a dataset in which each loan is listed with its purpose and an associated loan grade.

The dataset is called loancase; one of its columns is purpose and another is grade.

Below is the matrix I want to fill, pairwise, with proportions. Each row should total 100 percent, meaning each entry is the proportion of loans for that specific purpose that received that grade. For instance, the row for [Car, ] may look like 20, 20, 0, 0, 20, 0, 40.

Note that the current data placeholder is NA; I am trying to replace it with a vector listing each desired entry.

matrix(data = NA, nrow = 14, ncol = 7, dimnames = list(levels(loancase$purpose), levels(loancase$grade)))

How do I fill in each entry with the desired value? I think I should use tapply(), but I don't know how. Here is my current code, which would go in place of NA, but it is not correct as of now.

grades.per.purpose = tapply(loancase$grade, levels(loancase$purpose), sum)

Since you didn't supply usable data, I'll make up a toy example:

df = read.table(text = "grade   purpose   amount
            A  Car   100
            B  Car   200
            C  Car   100
            A  Moving  200
            B  Moving  50
            B  Moving  50", header = TRUE)

We want to show that, by amount, Car loans are 50% B-grade and 25% each A- and C-grade, and that Moving loans are 67% A-grade and 33% B-grade.

I like to use the dplyr library for this kind of grouping and summarising:

library(dplyr)
x = df %>% 
    group_by(purpose) %>% 
    mutate(purpose.total = sum(amount)) %>% 
    group_by(purpose, grade) %>% 
    summarise(percent = sum(amount / purpose.total))

The result:

  purpose  grade   percent
1     Car      A 0.2500000
2     Car      B 0.5000000
3     Car      C 0.2500000
4  Moving      A 0.6666667
5  Moving      B 0.3333333   

To reshape it into the wide table you asked for, try the tidyr library:

tidyr::spread(x, key = grade, value = percent, fill = 0)

Result:

  purpose         A         B     C
1     Car 0.2500000 0.5000000  0.25
2  Moving 0.6666667 0.3333333  0.00    

Though I think it's nonsense to be forbidden from using packages, there is a base R solution, with the final result presented in a way that might please the OP.

xt <- xtabs(amount ~ grade + purpose, df)
t(xt)/colSums(xt)
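
For the tapply() route mentioned in the question, a minimal base R sketch could look like the block below. It rebuilds the toy df from above so that it runs on its own. Whether "proportion" should mean share of loan amount or share of loan count is an assumption either way, so both are shown, scaled to percent. The names totals, by_amount and by_count are made up for illustration. For the real data you would substitute loancase$purpose and loancase$grade.

```r
# Toy data from the answer above, redefined so this block is self-contained.
df <- read.table(text = "grade purpose amount
A Car 100
B Car 200
C Car 100
A Moving 200
B Moving 50
B Moving 50", header = TRUE)

# Amount-weighted shares: sum the amount in every (purpose, grade) cell.
totals <- tapply(df$amount, list(df$purpose, df$grade), sum)
totals[is.na(totals)] <- 0   # combinations with no loans come back as NA

# Dividing by rowSums() recycles down the columns, so each row is divided
# by its own total and therefore sums to 100.
by_amount <- 100 * totals / rowSums(totals)

# Count-based shares (assumption: each row of df is one loan).
# margin = 1 normalises within each row, i.e. within each purpose.
by_count <- 100 * prop.table(table(df$purpose, df$grade), margin = 1)

print(round(by_amount, 1))
print(round(by_count, 1))
```

Both results have purposes as rows and grades as columns, which is the layout of the NA matrix in the question. If purpose and grade are factors, as levels(loancase$purpose) in the question suggests, table() also keeps levels that never occur, so the dimensions should match the 14 x 7 placeholder. A purpose with no loans at all would give NaN in its row.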
