Scatter plot in R with a huge number of unique observations

Question

Currently, the plot is not useful. How can I plot this distribution when the range of values is so large?

I have 50 years of data and need to find out which activity is the most harmful.

The data contain about 1000 unique activities in, say, column1. I am using group_by(column1) and summarise(total = sum(column2, column3)), but a few of the totals have 6 to 7 digits. These two facts make my plot look bad: the x axis is overcrowded, and because a few y values are very high, most of the other values end up squashed near the x axis.
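For reference, here is a small base-R sketch of the aggregation described above, run on made-up data. The data frame events and its values are hypothetical; only the column names come from the question. Sorting by total is an added assumption, since the goal is to find the most harmful activity.

```r
# Hypothetical data: the column names follow the question, the values are made up
events <- data.frame(
  column1 = c("a", "a", "b", "c", "c", "c"),
  column2 = c(1, 2, 10, 100, 200, 300),
  column3 = c(0, 1, 5, 1000, 2000, 3000)
)

# Base-R equivalent of group_by(column1) + summarise(total = sum(column2, column3)):
# sum both columns within each activity, then add the two sums together
totals <- aggregate(cbind(column2, column3) ~ column1, data = events, FUN = sum)
totals$total <- totals$column2 + totals$column3

# Sort so the activities with the largest totals come first (assumption, see above)
totals <- totals[order(totals$total, decreasing = TRUE), c("column1", "total")]
print(totals)
```

With 1000 activities, the first few rows of such a sorted table already answer "which activity is most harmful" before any plotting.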

[Screenshot of the current plot]

I believe the problem is on the x axis: there are so many activity names that they are crowded together for lack of space.

Answer 1

I think a log transformation might help you get better insight into your data:

Set up some fake data that resembles your situation:

set.seed(1776)        # reproducible random numbers
num_obs <- 10000      # set number of observations
options(scipen = 999) # don't use scientific notation

# don't worry about this code, just creating a reproducible example
y <- abs(rnorm(num_obs) + 2) * abs(rnorm(num_obs) * 50)
make_these_outliers <- runif(num_obs, min=0, max=1) > 0.99
# about 1% of the points become outliers on a 1000x larger scale
y[make_these_outliers] <- abs(rnorm(sum(make_these_outliers)) + 2) *
    abs(rnorm(sum(make_these_outliers)) * 50000)

Recreate the plot you have now to show the issue you're facing:

# recreating your current situation
plot(y, main='Ugly Plot')
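One way to see why this plot is hard to read is to look at the quantiles of the simulated y: the bulk of the values is tiny compared to the maximum, so a linear axis squashes them against zero. The sketch below regenerates the same fake data (same seed and recipe) so that it runs on its own.

```r
# Regenerate the fake data from above
set.seed(1776)
num_obs <- 10000
y <- abs(rnorm(num_obs) + 2) * abs(rnorm(num_obs) * 50)
make_these_outliers <- runif(num_obs, min = 0, max = 1) > 0.99
n_out <- sum(make_these_outliers)
y[make_these_outliers] <- abs(rnorm(n_out) + 2) * abs(rnorm(n_out) * 50000)

# Share of points that are outliers (close to 0.01 by construction)
print(mean(make_these_outliers))

# The median and 90th percentile are orders of magnitude below the maximum
print(quantile(y, c(0.5, 0.9, 0.99, 1)))
```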

Log10 transformation

Now we'll apply the log10 transformation to your data and visualize the result. A value of 10 becomes 1, a value of 100 becomes 2, a value of 1000 becomes 3, and so on.
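A tiny worked example of that mapping. It also shows a caveat worth keeping in mind for totals: log10 of zero is -Inf (and of a negative number is NaN with a warning), so such points are not drawn on a log-scale plot.

```r
# Each step of 1 on the log10 scale is a factor of 10 on the original scale
print(log10(c(10, 100, 1000)))   # [1] 1 2 3

# Zero has no finite logarithm, so a total of 0 would drop out of the plot
print(log10(0))                  # [1] -Inf
```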

# log10
plot(log10(y), col= rgb(0, 0, 0, alpha=0.3), pch=16, main='Log Scale and Transparency - Slightly Better')

The pch = 16 argument draws filled points, and alpha = 0.3 sets the opacity of each point. An alpha of 0.3 means an opacity of 30% (you can also think of it as 70% transparent).
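To see what the alpha argument actually does, the sketch below builds the same semi-transparent black used above and decodes it again with col2rgb(). rgb() returns a hex colour string whose last two hex digits store the opacity on a 0 to 255 scale.

```r
# Semi-transparent black, as used in the log10 plot above
semi_black <- rgb(0, 0, 0, alpha = 0.3)
print(semi_black)

# Decode the colour: the "alpha" row holds the opacity on a 0-255 scale,
# which is roughly 0.3 * 255
channels <- col2rgb(semi_black, alpha = TRUE)
print(channels["alpha", 1])
```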

ggplot2

I'll also show this in ggplot2 because, with a scale transformation, ggplot2 labels the y-axis with the original values, so you don't have to do the log10 conversions in your head.

# now with ggplot2 
# install.packages("ggplot2")    # <-- run this if you haven't installed ggplot2 yet
library(ggplot2)

# ggplot2 prefers your data to be in a data.frame (makes it easier to work with)
data_df <- data.frame(
    index = 1:num_obs,
    y = y)


ggplot(data = data_df, aes(x = index, y = y)) +
    geom_point(alpha=0.2) +
    scale_y_continuous(trans="log10") +
    ggtitle("Y-axis reflects values of the datapoints", "even better?") +
    theme_bw(base_size = 12)
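If you prefer to stay in base graphics, plot() has a similar option: log = "y" draws the y axis on a log scale while labelling the ticks with the original values. The sketch below reuses the answer's recipe on a smaller sample of 1000 points (the sample size is an arbitrary choice for illustration).

```r
# Base-graphics counterpart of scale_y_continuous(trans = "log10")
set.seed(1776)
y <- abs(rnorm(1000) + 2) * abs(rnorm(1000) * 50)

# log = "y": log-scaled axis, tick labels in original units
plot(y, log = "y",
     col = rgb(0, 0, 0, alpha = 0.3), pch = 16,
     main = "Base R: log-scaled y axis with original labels")
```

As with ggplot2, any zero values cannot be placed on a log axis and are left out with a warning.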

At this point you can start to see how I constructed the fake data, which is why there is such a high concentration of points in the 10 to 1000 range.
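That concentration can be checked numerically. The sketch below regenerates the fake data with the same seed and recipe and computes the share of points between 10 and 1000; the exact figure depends on the random draws, so no expected value is given here.

```r
# Regenerate the fake data used in this answer
set.seed(1776)
num_obs <- 10000
y <- abs(rnorm(num_obs) + 2) * abs(rnorm(num_obs) * 50)
make_these_outliers <- runif(num_obs, min = 0, max = 1) > 0.99
n_out <- sum(make_these_outliers)
y[make_these_outliers] <- abs(rnorm(n_out) + 2) * abs(rnorm(n_out) * 50000)

# Fraction of points that fall in the 10-1000 band
in_band <- mean(y >= 10 & y <= 1000)
print(in_band)
```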

Hopefully this helps! I definitely recommend taking PauloH's advice and also asking on stats.stackexchange.com to make sure you aren't misrepresenting your data.

Answer 2

Using ggplot2 instead and setting alpha may solve your problem, but if that is not enough you may want to add facet_zoom() from the ggforce package.

set.seed(1776)      
num_obs <- 10000     
options(scipen = 999) 

y <- abs(rnorm(num_obs) + 2) * abs(rnorm(num_obs) * 50)
make_these_outliers <- runif(num_obs, min=0, max=1) > 0.99
y[make_these_outliers] <- abs(rnorm(sum(make_these_outliers)) + 2) *
                            abs(rnorm(sum(make_these_outliers)) * 50000)

# install.packages('ggplot2')
library(ggplot2)
# install.packages('ggforce')
library(ggforce)

data_df <- data.frame(
  index = 1:num_obs,
  y = y)


ggplot(data = data_df, aes(x = index, y = y)) +
  geom_point(alpha=0.05) +
  facet_zoom(y = (y <= 500), zoom.size = .8) +
  theme_bw()
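Without ggforce, a rough base-R analogue is to draw the full data and a zoomed copy side by side, using ylim to restrict the second panel to y <= 500. Unlike facet_zoom(), this does not highlight the zoomed region in the main panel. The ten inflated values below are hypothetical outliers added only for illustration.

```r
# Rough base-R analogue of facet_zoom(y = (y <= 500)): full view plus zoomed view
set.seed(1776)
y <- abs(rnorm(1000) + 2) * abs(rnorm(1000) * 50)
y[1:10] <- y[1:10] * 1000   # hypothetical outliers, for illustration only

old_par <- par(mfrow = c(1, 2))   # two panels side by side
plot(y, pch = 16, col = rgb(0, 0, 0, alpha = 0.2), main = "All points")
plot(y, pch = 16, col = rgb(0, 0, 0, alpha = 0.2), ylim = c(0, 500),
     main = "Zoom: y <= 500")     # points above 500 are clipped
zoom_usr <- par("usr")            # axis limits of the zoomed panel
par(old_par)                      # restore the previous layout
```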

The result would look more or less like the following:

Hope it helps. Check out ggforce's GitHub:

https://github.com/thomasp85/ggforce

Answer 1: score 2, posted 2018-09-01 18:39:30
Answer 2: score 2, posted 2018-09-01 20:02:41