Label Range between two points on scatterplot with the percent difference

I have a simple scatterplot showing the difference in sales between years across different ranges.

For example, when the range is ">$400M", there is one sales value for 2013 and another for 2014.

I am trying to add an annotation at certain points showing the percent difference from 2013 to 2014. Is that possible?

Here is the dput:

myDF <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L, 
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M", 
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", 
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M", 
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", 
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5, 
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5, 
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417, 
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143, 
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333, 
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379, 
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723, 
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286, 
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year", 
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA, 
-44L))

And here is the chart I am currently generating:

library(ggplot2)
library(ggthemes)  # theme_tufte()
library(scales)    # dollar

orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M", 
              "$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
              "$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")

myDF = transform(myDF, Range = factor(Range, levels = orderlist))

myChart <- ggplot(myDF, aes(x = Range, y = Avg_TOTALS)) +
           geom_point(aes(color = factor(Year))) + 
           theme_tufte() +
           theme(axis.text.x= element_text(angle = 90, hjust = 0)) +
           labs(x = "Range", y = "Sales by Range", title = "MyChart")+
           scale_y_continuous(breaks = c(50000000, 100000000, 200000000,
                                         300000000,400000000, 500000000),
                              labels = dollar)

Which gives me:

[Image: MyChart]

And leads me to this question:

How would I add the percent difference between each pair of points, with 2013 as the base year? Also, there are a few ranges with sales in only one of the two years. Would it be possible to skip the percent labels on those, that is, require data in both years for a range to be labeled?

Thanks for any help!

Here is one way. I think there are better ways, but this is the best I can do while sleepy; I hope you do not mind. Let me briefly explain the code. I followed your code to build the plot. Then I extracted the data that ggplot is using, which I called foo. I created a master data frame to deal with the missing data points and joined it to foo. The dplyr part does the calculation to get the proportion (each value divided by the 2013 value for its range). I then passed the result to annotate to add the labels you wanted. Hope this helps.
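
As a side note, the arithmetic itself can be sketched without touching the plot. The snippet below is only an illustration on made-up numbers (the `toy` data frame and the column names ending in `_2013` and `_2014` are assumptions, not part of the question's data). It relies on `merge()` doing an inner join by default, so a range that exists in only one year drops out and never gets a label.

```r
# Toy data with hypothetical values; range "C" has no 2014 entry.
toy <- data.frame(
  Year       = c(2013L, 2013L, 2013L, 2014L, 2014L),
  Range      = c("A", "B", "C", "A", "B"),
  Avg_TOTALS = c(100, 200, 50, 80, 250)
)

base_year <- toy[toy$Year == 2013L, c("Range", "Avg_TOTALS")]
next_year <- toy[toy$Year == 2014L, c("Range", "Avg_TOTALS")]

# Inner join on Range: only ranges present in both years survive.
both <- merge(base_year, next_year, by = "Range",
              suffixes = c("_2013", "_2014"))

# Percent difference with 2013 as the base year.
both$pct_diff <- 100 * (both$Avg_TOTALS_2014 - both$Avg_TOTALS_2013) /
  both$Avg_TOTALS_2013

# Signed, one-decimal label text for the plot.
both$label <- sprintf("%+.1f%%", both$pct_diff)

print(both)
```

Here "A" falls from 100 to 80 (a 20% drop), "B" rises from 200 to 250 (a 25% rise), and "C" is skipped because it has no 2014 value.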

DATA

mydf <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L, 
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M", 
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", 
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M", 
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", 
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5, 
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5, 
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417, 
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143, 
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333, 
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379, 
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723, 
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286, 
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year", 
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA, 
-44L))


library(ggplot2)
library(scales)  # dollar

orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M", 
          "$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
          "$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")

mydf = transform(mydf, Range = factor(Range, levels = orderlist))

g <- ggplot(mydf, aes(x = Range, y = Avg_TOTALS)) +
     geom_point(aes(color = factor(Year))) + 
     #theme_tufte() +
     theme(axis.text.x= element_text(angle = 90, hjust = 0))+
     labs(x="Range", y = "Sales by Range", title = "MyChart")+
     scale_y_continuous(breaks = c(50000000, 100000000, 200000000,
                                   300000000, 400000000, 500000000),
                        labels = dollar)

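library(dplyr)

# Pull the data ggplot computed for the point layer. The year column is
# rebuilt by hand, which assumes that after arrange(group) the 23 rows
# for 2013 come before the 21 rows for 2014.
foo <- ggplot_build(g)$data[[1]] %>%
       arrange(group) %>%
       mutate(year = c(rep("2013", times = 23), rep("2014", times = 21)))

# One row per year and x position (24 ranges appear in the data), so a
# range that lacks a year shows up as NA after the join.
master <- expand.grid(year = c("2013", "2014"), group = 1:24,
                      stringsAsFactors = FALSE)

# prop is y divided by the first (2013) y within each range. With
# na.rm = FALSE, min() returns NA for a range missing either year.
txt <- full_join(master, foo, by = c("year", c("group" = "x"))) %>%
       group_by(group) %>%
       mutate(prop = round(order_by(year, y / first(y)), 2)) %>%
       summarise(y = first(y), prop = min(prop, na.rm = FALSE))

g + annotate("text", x = txt$group, y = txt$y + 15000000, label = txt$prop)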
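
For comparison, here is a rough sketch of an alternative that builds the labels from the data frame itself rather than from ggplot_build(), so it does not depend on row order inside the built plot. It is not the method above, just one possible variation. The helper name `pct_labels`, the `toy` data, and the label format are all assumptions made for illustration.

```r
# Hypothetical helper: one label row per range that has data in both years.
pct_labels <- function(df, base = 2013L, other = 2014L) {
  a <- df[df$Year == base,  c("Range", "Avg_TOTALS")]
  b <- df[df$Year == other, c("Range", "Avg_TOTALS")]
  # merge() is an inner join by default, so single-year ranges are dropped.
  m <- merge(a, b, by = "Range", suffixes = c("_base", "_other"))
  data.frame(
    # Keep the factor order of the original column so the x axis is unchanged.
    Range = factor(as.character(m$Range), levels = levels(factor(df$Range))),
    # Anchor the label at the higher of the two points.
    y     = pmax(m$Avg_TOTALS_base, m$Avg_TOTALS_other),
    # Percent difference relative to the base year, as signed text.
    label = sprintf("%+.0f%%", 100 * (m$Avg_TOTALS_other / m$Avg_TOTALS_base - 1))
  )
}

# Toy data with hypothetical values; "mid" exists only in 2014.
toy <- data.frame(
  Year       = c(2013L, 2013L, 2014L, 2014L, 2014L),
  Range      = factor(c("low", "high", "low", "high", "mid"),
                      levels = c("low", "mid", "high")),
  Avg_TOTALS = c(100, 400, 90, 300, 50)
)

labs_df <- pct_labels(toy)
print(labs_df)

# Plotting is optional and only runs if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Range, y = Avg_TOTALS)) +
    geom_point(aes(color = factor(Year))) +
    # vjust = -1 nudges the text above its anchor point.
    geom_text(data = labs_df, aes(y = y, label = label), vjust = -1, size = 3)
}
```

With the objects from this answer, the same idea would read `g + geom_text(data = pct_labels(mydf), aes(y = y, label = label), vjust = -1, size = 3)`. Unlike the proportion shown by annotate above, this labels each range with a signed percent change from 2013.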

