R: Histogram and Density on multiple user response data frame
Data reflect how users rated a book on an online book recommendation site while answering a question that has four answers. Users were allowed to choose more than one answer.
The goal is to obtain distribution plots by gender, with the answers (X1, X2, ...) on the X axis and the count of books on the Y axis, along with a density overlay. Ideally, the male and female distributions would be overlaid on one another.
book_id user_id rate X1 X2 X3 X4 Gender genre
40 001 4.5 0 1 0 0 male fiction
48 001 3.5 1 0 0 1 male fiction
54 001 4.0 1 0 0 0 male fiction
79 001 2.5 1 0 1 0 male non-fiction
80 001 4.5 0 0 1 0 male non-fiction
95 001 5.0 1 0 1 0 male non-fiction
95 002 3.0 0 0 0 1 Female non-fiction
99 002 4.5 0 0 1 0 Female non-fiction
02 002 0.5 0 0 0 0 Female non-fiction
05 002 4.5 1 0 1 0 Female non-fiction
54 002 4.0 0 1 0 0 Female fiction
79 002 2.5 1 0 1 0 Female non-fiction
80 002 4.5 0 0 1 0 Female non-fiction
07 002 4.5 1 0 1 0 Female fiction
07 003 5.0 1 0 1 0 Female fiction
09 003 4.0 0 0 1 0 Female auto-bio
54 003 4.0 1 0 0 0 Female fiction
79 003 2.5 1 0 1 0 Female non-fiction
80 003 4.5 0 0 1 0 Female non-fiction
17 004 3.5 1 0 0 0 male auto-bio
21 004 5.0 1 0 1 0 male auto-bio
21 005 5.0 0 1 1 0 male auto-bio
17 005 0.5 0 0 0 1 male auto-bio
20 005 5.0 0 0 1 0 male fiction
20 006 1.5 0 0 0 1 male fiction
21 006 5.0 0 0 1 0 male auto-bio
21 007 2.0 1 0 0 0 male auto-bio
21 008 4.5 1 0 1 0 Female auto-bio
20 008 4.5 1 0 1 0 Female fiction
07 008 4.5 1 0 1 0 Female fiction
22 009 5.0 0 0 1 0 male fiction
54 009 4.0 1 0 0 0 male fiction
79 009 2.5 1 0 1 0 male non-fiction
80 010 4.5 1 0 1 0 male non-fiction
22 010 4.5 0 1 1 0 male fiction
22 011 0.5 0 0 1 0 Female fiction
28 011 3.5 1 0 0 0 Female auto-bio
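To make the examples reproducible, the table can be read into R with read.table(). This is a sketch; the colClasses choice is an assumption made so that zero-padded IDs such as "02" and "001" stay as character and Gender is a factor, matching the `<fct>` column in the summary output.

```r
# Rebuild the example data frame from the table above.
# colClasses (assumption): IDs as character to keep leading zeros, Gender as factor.
df <- read.table(text = "
book_id user_id rate X1 X2 X3 X4 Gender genre
40 001 4.5 0 1 0 0 male fiction
48 001 3.5 1 0 0 1 male fiction
54 001 4.0 1 0 0 0 male fiction
79 001 2.5 1 0 1 0 male non-fiction
80 001 4.5 0 0 1 0 male non-fiction
95 001 5.0 1 0 1 0 male non-fiction
95 002 3.0 0 0 0 1 Female non-fiction
99 002 4.5 0 0 1 0 Female non-fiction
02 002 0.5 0 0 0 0 Female non-fiction
05 002 4.5 1 0 1 0 Female non-fiction
54 002 4.0 0 1 0 0 Female fiction
79 002 2.5 1 0 1 0 Female non-fiction
80 002 4.5 0 0 1 0 Female non-fiction
07 002 4.5 1 0 1 0 Female fiction
07 003 5.0 1 0 1 0 Female fiction
09 003 4.0 0 0 1 0 Female auto-bio
54 003 4.0 1 0 0 0 Female fiction
79 003 2.5 1 0 1 0 Female non-fiction
80 003 4.5 0 0 1 0 Female non-fiction
17 004 3.5 1 0 0 0 male auto-bio
21 004 5.0 1 0 1 0 male auto-bio
21 005 5.0 0 1 1 0 male auto-bio
17 005 0.5 0 0 0 1 male auto-bio
20 005 5.0 0 0 1 0 male fiction
20 006 1.5 0 0 0 1 male fiction
21 006 5.0 0 0 1 0 male auto-bio
21 007 2.0 1 0 0 0 male auto-bio
21 008 4.5 1 0 1 0 Female auto-bio
20 008 4.5 1 0 1 0 Female fiction
07 008 4.5 1 0 1 0 Female fiction
22 009 5.0 0 0 1 0 male fiction
54 009 4.0 1 0 0 0 male fiction
79 009 2.5 1 0 1 0 male non-fiction
80 010 4.5 1 0 1 0 male non-fiction
22 010 4.5 0 1 1 0 male fiction
22 011 0.5 0 0 1 0 Female fiction
28 011 3.5 1 0 0 0 Female auto-bio
", header = TRUE,
colClasses = c("character", "character", "numeric",
               "integer", "integer", "integer", "integer",
               "factor", "character"))

# quick check of the structure
str(df)
```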
Two users can rate the same book and answer the question in the same way or in different ways, which creates two records for that book. With that in mind, grouping by Gender and summing each column gives a gender-level distribution to start with:
library(dplyr)
df %>% group_by(Gender) %>% summarize(x1 = sum(X1), x2 = sum(X2), x3 = sum(X3), x4 = sum(X4))
Gender x1 x2 x3 x4
<fct> <int> <int> <int> <int>
1 Female 10 1 13 1
2 male 10 3 11 3
In addition to getting the plot, I have a follow-up question. Just to confirm: this is not the number of unique books for which females chose X1, since the same book can be answered by multiple users? Instead, it is the number of times females chose a specific answer?
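To make the distinction concrete, here is a minimal sketch, assuming dplyr and a seven-row subset of the table (only book_id, user_id, X1 and Gender are kept). The column names `records_with_x1` and `books_with_x1` are hypothetical. `sum(X1)` counts user-book records in which X1 was chosen, while `n_distinct()` on the matching book IDs counts distinct books.

```r
library(dplyr)

# Seven rows taken from the table above (subset chosen for illustration).
sub <- data.frame(
  book_id = c("54", "54", "54", "79", "79", "07", "07"),
  user_id = c("001", "002", "003", "002", "003", "002", "003"),
  X1      = c(1, 0, 1, 1, 1, 1, 1),
  Gender  = c("male", "Female", "Female", "Female", "Female", "Female", "Female")
)

x1_counts <- sub %>%
  group_by(Gender) %>%
  summarize(
    records_with_x1 = sum(X1),                     # times X1 was chosen (one per user-book record)
    books_with_x1   = n_distinct(book_id[X1 == 1]) # distinct books where at least one user chose X1
  )
x1_counts
```

For Female, the three books 54, 79 and 07 account for five X1 records, because two female users both chose X1 for books 79 and 07.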
A similar but different approach:
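library(data.table)
library(ggplot2)
# copy the data frame into a data.table (setDT(df) would instead convert df in place)
dt <- as.data.table(df)
# reshape X1..X4 into long format: one row per record and answer column
plottest <- melt(dt, measure.vars = patterns("^X"),
                 variable.name = "question", value.name = "answer")
# stacked bars of the 0/1 answers per book, one panel per gender
ggplot(data = plottest, aes(factor(book_id), answer)) +
  geom_col(aes(fill = as.factor(question), color = as.factor(question))) +
  facet_wrap(~Gender) +
  labs(title = "",
       y = "N",
       x = "books",
       color = "Question",
       fill = "Question")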
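The melted data.table can also be summed directly to get per-gender totals without going back to dplyr. A minimal sketch on four rows of the table (a hypothetical mini data set); the names `long` and `totals` are illustrative only.

```r
library(data.table)

# Four rows from the table above (subset for illustration).
dt <- data.table(
  book_id = c("40", "48", "95", "99"),
  X1 = c(0L, 1L, 0L, 0L),
  X2 = c(1L, 0L, 0L, 0L),
  X3 = c(0L, 0L, 0L, 1L),
  X4 = c(0L, 1L, 1L, 0L),
  Gender = c("male", "male", "Female", "Female")
)

# Same reshape: X1..X4 become a 'question' column and a 0/1 'answer' column.
long <- melt(dt, measure.vars = patterns("^X"),
             variable.name = "question", value.name = "answer")

# Summing the 0/1 answers gives how many times each answer was chosen per gender.
totals <- long[, .(N = sum(answer)), by = .(Gender, question)]
totals
```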
I am not sure I understand correctly, but is the following code what you want?
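library(dplyr)
library(ggplot2)
library(reshape2)  # provides melt() for data frames
# total number of times each answer was chosen, per gender, in long format
df2 <- df %>%
  group_by(Gender) %>%
  summarize(x1 = sum(X1), x2 = sum(X2), x3 = sum(X3), x4 = sum(X4)) %>%
  melt(id.vars = "Gender")
# male and female bars side by side for each answer
ggplot(df2, aes(variable, value, color = Gender, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge")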
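The question asks for male and female to be overlaid rather than dodged, with a density. Because the answers are four categories rather than a continuous variable, one possible stand-in (an assumption, not taken from the answers here) is each gender's share of answers per option, drawn as semi-transparent bars on top of each other plus one line per gender. The sketch uses the per-gender totals from the summarize() output; the `share` column is a hypothetical name.

```r
library(ggplot2)

# Per-gender totals copied from the summarize() output in the question.
totals <- data.frame(
  Gender   = rep(c("Female", "male"), each = 4),
  variable = rep(c("x1", "x2", "x3", "x4"), times = 2),
  value    = c(10, 1, 13, 1, 10, 3, 11, 3)
)

# Share of each gender's choices that went to each answer, so both
# distributions sum to 1 and can be compared on the same scale.
totals$share <- totals$value / ave(totals$value, totals$Gender, FUN = sum)

p <- ggplot(totals, aes(variable, share, fill = Gender, colour = Gender)) +
  geom_col(position = "identity", alpha = 0.4) +  # bars drawn over each other, not dodged
  geom_line(aes(group = Gender)) +                 # one line per gender across the answers
  geom_point() +
  labs(x = "Answer", y = "Share of answers")
p
```

Using shares instead of raw counts keeps the comparison fair if one gender simply gave more answers overall.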
After seeing the answer by @denis, I adapted his code to do more or less the same thing, but with position = "dodge".
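library(dplyr)
library(ggplot2)
library(reshape2)  # provides melt() for data frames
# totals of each answer per gender and book, in long format
df3 <- df %>%
  group_by(Gender, book_id) %>%
  summarize(x1 = sum(X1), x2 = sum(X2), x3 = sum(X3), x4 = sum(X4)) %>%
  ungroup() %>%
  melt(id.vars = c("Gender", "book_id"))
# one panel per gender, answers dodged within each book
ggplot(df3, aes(as.factor(book_id), value, color = variable, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Gender)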
As for the second question, you can use aggregate to sum the answers to each question by Gender. Each row is one user's answer for one book, so these sums count how many times each answer was chosen, not the number of unique books:
aggregate(. ~ Gender, df[4:8], sum)
# Gender X1 X2 X3 X4
#1 Female 10 1 13 1
#2 male 10 3 11 3
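Base R also has rowsum(), which adds up rows within each group and gives the same kind of table as aggregate. A minimal sketch on three rows of the table (a hypothetical excerpt keeping only the answer columns and Gender):

```r
# Three rows from the table above (books 48/001, 79/002, 54/002).
df <- data.frame(
  X1 = c(1L, 1L, 0L),
  X2 = c(0L, 0L, 1L),
  X3 = c(0L, 1L, 0L),
  X4 = c(1L, 0L, 0L),
  Gender = c("male", "Female", "Female")
)

# rowsum() sums the answer columns within each Gender; rows are named by group.
totals <- rowsum(df[, c("X1", "X2", "X3", "X4")], group = df$Gender)
totals
```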