How to plot relative proportions over time in R

For my research I am trying to create a graph similar to this one, which I found in the literature:

[Figure: from Golder & Huberman (2005)]

My experiment involved genre-tagging 10 different songs. I saved the tags (the words people used to describe the songs) separately. The x-axis should represent all the participants, in chronological order. The y-axis should represent how often a word is used as a tag. Consider this sample data:

df <- data.frame(tagid= numeric(0), participantid = numeric(0), tag = character(0))
newRow <-data.frame(tagid=1, participantid=1, tag = "triphop")
df <-rbind(df,newRow)
newRow <-data.frame(tagid=2, participantid=1, tag = "electronic")
df <-rbind(df,newRow)
newRow <-data.frame(tagid=3, participantid=2, tag = "mellow")
df <-rbind(df,newRow)
newRow <-data.frame(tagid=4, participantid=2, tag = "electronic")
df <-rbind(df,newRow)
newRow <-data.frame(tagid=5, participantid=3, tag = "electronic")
df <-rbind(df,newRow)

Tagids 1 and 2 belong to the same participant and should have the same x coordinate. Tagids 3 and 4 belong to participant 2, and tagid 5 belongs to participant 3.

For this dataset I'd like to plot a graph like this (excuse the drawing):

[Figure: the graph I would like to be able to plot in R]

The y-axis represents the percentage of participants so far who have used a specific word to describe this piece of music. As 'electronic' is used by all three participants, it stays at 100%. 'Triphop' was used by participant 1 but not by participants 2 and 3, so it drops from 100% to 50% at participant 2 and to 33% at participant 3.
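The arithmetic described above can be checked by hand before any plotting. The following is only a minimal base R sketch for the sample data; the names `tags`, `used` and `running_share` are made up for illustration, and it assumes participant IDs sort in chronological order.

```r
# Sample data from the question: one row per tag, participants in chronological order.
tags <- data.frame(
  participantid = c(1, 1, 2, 2, 3),
  tag = c("triphop", "electronic", "mellow", "electronic", "electronic")
)

# Rows = participants, columns = tags; TRUE where the participant used the tag.
used <- table(tags$participantid, tags$tag) > 0

# Running number of participants who used each tag,
# divided by the number of participants seen so far (1, 2, 3, ...).
running_share <- apply(used, 2, cumsum) / seq_len(nrow(used))

# Percentages per participant (rows) and tag (columns).
round(100 * running_share)
```

In this sketch a tag that has not been used yet shows up as 0 rather than being left out of the graph, so 'mellow' starts at 0% for participant 1.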

The code is a bit messy, but you probably want something like this. You need to complete the data frame so that each participantid has a row for every one of the three tag levels. Then, dividing the cumulative count of each tag by the cumulative number of participants gives the proportion.

library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  count(participantid, tag) %>%                         # n = times each participant used each tag
  complete(participantid, tag, fill = list(n = 0)) %>%  # zero row for every unused participant/tag pair
  group_by(tag) %>%
  mutate(absolute = cumsum(n),                          # running count of the tag
         id = row_number(),                             # participants so far (rows sorted by participantid)
         proportion = ifelse(absolute > 0, absolute / id, NA)) %>%  # NA until the tag first appears
  ungroup() %>%
  ggplot(aes(x = participantid, y = proportion, color = tag)) +
  geom_line(lwd = 1)
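Since the question asks for percentages on the y-axis, the axes could be labelled along these lines. This is only a sketch: `plot_data` is a hypothetical, hand-entered table of the proportions for the sample data (NA before a tag is first used), and it assumes the scales package is installed alongside ggplot2.

```r
library(ggplot2)

# Hypothetical, hand-entered proportions for the three sample participants.
plot_data <- data.frame(
  participantid = rep(1:3, times = 3),
  tag = rep(c("electronic", "mellow", "triphop"), each = 3),
  proportion = c(1, 1, 1,  NA, 1/2, 1/3,  1, 1/2, 1/3)
)

p <- ggplot(plot_data, aes(x = participantid, y = proportion, color = tag)) +
  geom_line(na.rm = TRUE) +                                    # drop the NA point silently
  scale_x_continuous(breaks = 1:3) +                           # one tick per participant
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "Participant (chronological order)",
       y = "Share of participants using the tag")

print(p)
```

Lines that share the same values (for example 'mellow' and 'triphop' from participant 2 onwards) are drawn on top of each other, so only one of them is visible on that stretch.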
