Error when generating a histogram in R

Question

I have a text file containing:

Tue Feb 11 12:19:39 +0000 2014
Tue Feb 11 12:19:56 +0000 2014
Tue Feb 11 12:20:04 +0000 2014

and I read it into R:

dataset <- read.csv("Time.txt")

For R to recognise the timestamps in the file, I write:

time <- strptime(dataset[,1], format = "%a %b %d %H:%M:%S %z %Y")

Whenever I try to plot a histogram with:

hist(time, breaks = 100)

it produces an error alongside the generated histogram:

In breaks[-1L] + breaks[-nB] : NAs produced by integer overflow

What could be causing this error?

Answer 1

Since you asked what could be causing the error, here it is:

The error arises when the hist.default function calculates the midpoints of the histogram bins. The line mids <- 0.5 * (breaks[-1L] + breaks[-nB]) computes the halfway point between each pair of adjacent breaks. The issue arises because the breaks are generated as integers:

If the argument breaks is numeric and has length 1, then the hist.default function (which is called by hist.POSIXt) creates a vector of breaks based on the range of x and the requested number of breaks. This is done using the pretty function. For reasons I have not looked into too closely, if breaks is small enough that pretty(range(x), n = breaks, min.n = 1) returns each value only once, e.g.:

pretty(range(x), n = 35, min.n = 1)
#[1] 1392121179 1392121180 1392121181 1392121182 1392121183 1392121184
#[7] 1392121185 1392121186 1392121187 1392121188 1392121189 1392121190
#[13] 1392121191 1392121192 1392121193 1392121194 1392121195 1392121196
#[19] 1392121197 1392121198 1392121199 1392121200 1392121201 1392121202
#[25] 1392121203 1392121204

then the output is of type integer. If, however, the number of breaks is larger and some of the printed values are duplicated:

pretty(range(x), n = 36, min.n = 1)
# [1] 1392121179 1392121180 1392121180 1392121181 1392121181 1392121182
# [7] 1392121182 1392121183 1392121183 1392121184 1392121184 1392121185
#[13] 1392121185 1392121186 1392121186 1392121187 1392121187 1392121188
#[19] 1392121188 1392121189 1392121189 1392121190 1392121190 1392121191
#[25] 1392121191 1392121192 1392121192 1392121193 1392121193 1392121194
#[31] 1392121194 1392121195 1392121195 1392121196 1392121196 1392121197
#[37] 1392121197 1392121198 1392121198 1392121199 1392121199 1392121200
#[43] 1392121200 1392121201 1392121201 1392121202 1392121202 1392121203
#[49] 1392121203 1392121204 1392121204

then the output is numeric.

R stores integers as 32-bit signed values, so the largest integer it can represent is 2147483647. The POSIXt values here are around 1.39 billion seconds since the epoch, so adding two of them as integers exceeds that limit; R cannot represent the result and returns NA. When pretty returns numeric (double) values, this is not a problem.
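To make the overflow concrete, here is a minimal sketch that applies the same midpoint expression to three of the break values shown above, once stored as integers and once as doubles. The names breaks_int, breaks_num, mids_int and mids_num are made up for illustration and do not appear in hist.default.

```r
# Largest value an R integer can hold (32-bit signed).
.Machine$integer.max            # 2147483647

# Three neighbouring breaks from the output above, stored as integers.
breaks_int <- c(1392121179L, 1392121180L, 1392121181L)
nB <- length(breaks_int)

# Same expression as the mids line quoted earlier. Each integer sum is
# about 2.78e9, which does not fit, so R returns NA and emits the
# "NAs produced by integer overflow" warning.
mids_int <- 0.5 * (breaks_int[-1L] + breaks_int[-nB])

# Stored as doubles, the sums are exact and the midpoints are well defined.
breaks_num <- as.numeric(breaks_int)
mids_num <- 0.5 * (breaks_num[-1L] + breaks_num[-nB])
```

Printing mids_int and mids_num side by side shows the difference: the first is all NA, the second holds the expected half-second midpoints.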

See also: What is integer overflow in R and how can it happen?

In practice, all this means is that if you print the structure returned by hist, all of your mids values will be NA, but I don't think it actually affects the plotting of the histogram. So it is only a warning, not an error.

EDIT: pretty internally uses seq.int.
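As a rough illustration of how seq.int can yield either storage type, the sketch below calls it directly on the same range. This is not a reproduction of what pretty does internally; it only assumes, as I understand the documented behaviour, that seq.int(from, to) behaves like from:to, and the variable names s_int and s_dbl are invented here.

```r
# Whole-number endpoints with no 'by': built like from:to, stored as integer.
s_int <- seq.int(1392121179, 1392121204)
typeof(s_int)

# A fractional step cannot be held in integers, so the result is double.
s_dbl <- seq.int(1392121179, 1392121204, by = 0.5)
typeof(s_dbl)
```

The half-second steps in s_dbl look duplicated when printed at the default number of significant digits, which would be consistent with the repeated values in the second pretty output above.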

Answer 2

In my environment, it does not generate any errors.

dataset <- read.csv("Time.txt", header = FALSE)
time <- strptime(dataset[,1], format = "%a %b %d %H:%M:%S %z %Y")
hist(as.numeric(time), breaks = 100)

Perhaps if you just convert time to numeric as above, the error will disappear. After that, it is straightforward to relabel the x-axis of the histogram.
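One possible way to relabel the axis is sketched below, using base graphics only. The three timestamps are a hard-coded stand-in for the contents of Time.txt, and secs, h and ticks are names chosen for this example; the number of ticks and the label format are arbitrary choices.

```r
# Hypothetical stand-in for the timestamps parsed from Time.txt.
time <- as.POSIXct(c("2014-02-11 12:19:39", "2014-02-11 12:19:56",
                     "2014-02-11 12:20:04"), tz = "UTC")

# Seconds since 1970-01-01 UTC, stored as double, so no integer overflow.
secs <- as.numeric(time)

# Draw the histogram but suppress the default numeric x-axis.
h <- hist(secs, breaks = 100, xaxt = "n", xlab = "Time", main = "Timestamps")

# Add a few ticks back, labelled as clock times instead of raw seconds.
ticks <- pretty(range(secs), n = 5)
axis(1, at = ticks,
     labels = format(as.POSIXct(ticks, origin = "1970-01-01", tz = "UTC"),
                     "%H:%M:%S"))
```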

EDIT: ggplot2 should not run into this issue, and it is simpler and more modern:

library(ggplot2)
ggplot(dataset) + geom_histogram(aes(x = V1), stat = "count", bins = 100)

where V1 is the default name given to the single column of dataset when read.csv() is called with header = FALSE.

2 answers

Answer 1
4 2017-10-16 10:19:12

Answer 2
0 2017-10-16 09:46:33
