
summarise returning -Inf when using na.rm = TRUE

I recently built a simple R script to summarize three different data frames. Since updating to the newest version of R and RStudio, I am seeing output I haven't seen before when using the summarise function in dplyr on just one of the data frames (the other two are fine). I also receive a series of warnings that are unfamiliar to me. Note that before updating, I ran the script exactly as written with no issues for any of the data frames.

The data frame with the problem is called VO2 and it is set up as follows:

Name        Sex       VO2
AthleteA    M         50
AthleteA    M         52
AthleteA    M         NA
AthleteB    M         49
AthleteB    M         56
AthleteB    M         47 
AthleteC    M         42
AthleteC    M         NA
AthleteC    M         41 
AthleteD    M         NA
AthleteD    M         NA
AthleteD    M         NA 

The code I run is:

Test.Summary.VO2 = VO2 %>% group_by(Name, Sex) %>% 
summarise(Best.Score = max(VO2, na.rm=TRUE))

This code generates the following summary:

Name       Sex     Best.Score
AthleteA    M        52
AthleteB    M        56
AthleteC    M        42
AthleteD    M        -Inf

The -Inf value is completely new in the output. I cannot figure out why it now appears for groups that contain only NAs.

As mentioned above, I have the exact same layout for a second data frame and run the same type of summary. There everything works fine: when I summarize with na.rm=TRUE, the NA cases are removed and no -Inf value appears in their place.

Where this gets a bit more unusual is that when I view the data frame using:

View(Test.Summary.VO2)

I receive the following series of warning messages:

There were 38 warnings (use warnings() to see them)
warnings()
Warning messages:
1: Unknown or uninitialised column: 'Quad'.
2: Unknown or uninitialised column: 'Quad'.
3: Unknown or uninitialised column: 'Quad'.
4: Unknown or uninitialised column: 'Quad'.

Later on in the script I generate a new variable called "Quad". But the warning above appears even after I clear the environment and restart RStudio. I have even tried renaming the .csv file and importing it under a different data frame name. It's almost as if the column 'Quad' that is generated later in the script is hanging around somewhere in the environment.

I am really at a loss as to what might be happening here.

I hope one of the R experts on Stack can give me an idea of how to remedy this issue.

Thanks for your consideration.

See ?max:

The minimum and maximum of a numeric empty set are +Inf and -Inf (in this order!) which ensures transitivity, e.g., min(x1, min(x2)) == min(x1, x2). For numeric x max(x) == -Inf and min(x) == +Inf whenever length(x) == 0 (after removing missing values if requested). However, pmax and pmin return NA if all the parallel elements are NA even for na.rm = TRUE.

You don't have any non-NA values for AthleteD, so once na.rm = TRUE drops the NAs, max is applied to an empty set and returns -Inf.
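The empty-set behaviour quoted from ?max can be checked directly in base R. This is a minimal sketch; the variable names are only for illustration, and suppressWarnings() is used because max() warns when it has no non-missing arguments.

```r
# max() over an empty numeric vector returns -Inf (and normally warns).
empty_max <- suppressWarnings(max(numeric(0)))

# With na.rm = TRUE, an all-NA vector is emptied before max() is applied,
# so the result is the same as for an empty vector.
all_na <- c(NA_real_, NA_real_, NA_real_)
all_na_max <- suppressWarnings(max(all_na, na.rm = TRUE))

# Without na.rm, the NA propagates instead.
na_kept_max <- max(all_na)

print(empty_max)    # -Inf
print(all_na_max)   # -Inf
print(na_kept_max)  # NA
```

This matches the AthleteD row in the question: all three of that athlete's values are NA, so nothing is left to maximize.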

Late to the party, but one solution is to return NA instead of -Inf when there is no value to maximize. This can be done with the s function from the hablar package.

library(dplyr)
library(hablar)

VO2 %>% 
  group_by(Name, Sex) %>% 
  summarise(Best.Score = max(s(VO2)))

which gives you:

  Name     Sex   Best.Score
  <chr>    <chr>      <int>
1 AthleteA M             52
2 AthleteB M             56
3 AthleteC M             42
4 AthleteD M             NA
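If adding a package dependency is not desirable, a similar result can probably be had with a small helper. The sketch below assumes a hypothetical function named safe_max (it is not part of dplyr, hablar, or base R) and rebuilds the VO2 data frame from the question so that it runs on its own.

```r
library(dplyr)

# Hypothetical helper: like max(x, na.rm = TRUE), but returns NA
# instead of -Inf when x is empty or contains only NAs.
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# The example data from the question.
VO2 <- data.frame(
  Name = rep(c("AthleteA", "AthleteB", "AthleteC", "AthleteD"), each = 3),
  Sex  = "M",
  VO2  = c(50, 52, NA, 49, 56, 47, 42, NA, 41, NA, NA, NA)
)

# Inside summarise(), VO2 refers to the column, not the data frame.
Test.Summary.VO2 <- VO2 %>%
  group_by(Name, Sex) %>%
  summarise(Best.Score = safe_max(VO2)) %>%
  ungroup()

print(Test.Summary.VO2)
```

Because the check happens before max() is called, this version should also avoid the "no non-missing arguments to max" warning for the all-NA group.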

