How to round only the numeric values in a dataframe?

I'm trying to round all numerical values in my data frame.

The issue is that my data frame also includes strings, and they are not confined to any particular column or row. I want to avoid having to code a loop that goes through each individual row-column cell pair and checks whether the value is numerical before rounding.

Is there a function (or a combination of functions) that will let me achieve this?

So far I've tried round_df() and various lapply() and apply() combinations with lambdas. However, I've only gotten it to round based on the first value in the column (i.e. if the first value is numerical, it treats the entire column as numerical and rounds it).

I then run into problems where the first value is a string and so the entire column goes unrounded, or vice versa, where my code errors because it tries to round a string.

My function is:

library(readxl)
library(knitr)
library(gplots)
library(doBy)
library(dplyr)
library(plyr)
library(printr)
library(xtable)
library(gmodels)
library(survival)
library(pander)
library(psych)
library(questionr)
library(DT)
library(data.table)
library(expss)
library(xtable)
options(xtable.floating = FALSE)
options(xtable.timestamp = "")
library(kableExtra)
library(magrittr)
library(Hmisc)
library(forestmangr)
library(summarytools)
library(gmodels)
library(stats)

summaryTable <- function(y, bygroup, digit, title="", caption_heading="",
                         caption="", freq.tab, y.label="", y.names="", boxplot) {
  if (freq.tab) {
    m = multi.fun(y)
  } else if (!missing(bygroup)) {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(describeBy(y, bygroup, mat = T)))
    m = select(m, y.label, n, mean, sd, min, median, max)
  } else {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(sumconti(y)))
  }
  if (!freq.tab) {
    m$y.label = y.names
  }
  m = round_df(m, digit, "signif")
  if (freq.tab) {
    colnames(m) = c(y.label, "Frequency", "%")
  } else if (missing(freq.tab) | !freq.tab) {
    colnames(m) = c(y.label, "n", "Mean", "Std", "Min", "Median", "Max")
  }
  if (!missing(boxplot)) {
    if (boxplot) {
      attach(m)
      layout(matrix(c(1, 1, 2, 1)), 2, 1)
      kable(m, align = "c", "latex", booktabs = T,
            caption = figTitle(x, title, y.label)) %>%
        kable_styling(position = 'center',
                      latex_options = c("striped", "repeat_header", "hold_position")) %>%
        footnote(general = caption, general_title = caption_heading,
                 footnote_as_chunk = T, title_format = c("italic", "underline"),
                 threeparttable = T)
      boxplot(y ~ bygroup, main = figTitle(y, title, y.label), names = y.names,
              ylab = title, xlab = y.label,
              col = c("red", "blue", "orange", "pink", "green", "purple", "grey", "yellow"),
              border = "black", horizontal = F, varwidth = T)
    }
  }
  kable(m, align = "c", "latex", booktabs = T,
        caption = figTitle(x, title, y.label)) %>%
    kable_styling(position = 'center',
                  latex_options = c("striped", "repeat_header", "hold_position")) %>%
    footnote(general = caption, general_title = caption_heading,
             footnote_as_chunk = T, title_format = c("italic", "underline"),
             threeparttable = T)
}

figTitle = function(x, title, y.label) {
  if (y.label != "") {
    paste("Summary of", title, "by", y.label)
  } else if (title != "") {
    paste("Summary of", title)
  } else {
    paste("")
  }
}

The question did not include the data, so we don't know precisely what the problem is (please always provide a complete minimal reproducible example). The answer is therefore divided into two sections, one for each of two possibilities for what the problem might be, with test data provided for each. No packages are used.

Round numeric only

If the problem is that you have a mix of numeric and character columns and you only want to round the numeric ones, then here are a few ways.

1) Compute which columns are numeric, giving the logical vector ok, and then round those. We use the built-in Puromycin dataset as an example. No packages are used.

ok <- sapply(Puromycin, is.numeric)
replace(Puromycin, ok, round(Puromycin[ok], 1))

giving:

   conc rate     state
1   0.0   76   treated
2   0.0   47   treated
3   0.1   97   treated
4   0.1  107   treated
5   0.1  123   treated
6   0.1  139   treated
...etc...

1a) The last line can also be written like this if you don't mind overwriting the input.

Puromycin[ok] <- round(Puromycin[ok], 1)
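To make the role of the logical vector ok easier to see, here is a minimal sketch on a small made-up data frame. The name df_demo and its contents are assumptions for illustration only and do not come from the question.

```r
# Hypothetical data frame (not from the question): one character column
# and two numeric columns.
df_demo <- data.frame(
  id    = c("a", "b", "c"),
  value = c(1.234, 5.678, 9.1011),
  score = c(10.06, 20.94, 30.5),
  stringsAsFactors = FALSE
)

# TRUE for the columns that are numeric, FALSE otherwise
ok <- sapply(df_demo, is.numeric)
print(ok)

# Round only the numeric columns; the character column is left untouched
rounded <- replace(df_demo, ok, round(df_demo[ok], 1))
print(rounded)
```

Because the test is made per column, this only helps when each column is entirely numeric or entirely non-numeric.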

2) Another approach is to perform the check inside the lapply:

Round <- function(x, k) if (is.numeric(x)) round(x, k) else x
replace(Puromycin, TRUE, lapply(Puromycin, Round, 1))

2a) or with overwriting:

Puromycin[] <- lapply(Puromycin, Round, 1)
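The question's code calls round_df() with "signif", which suggests significant digits may be wanted rather than decimal places. As a rough sketch under that assumption, approach 2 can be wrapped in a helper that accepts the rounding function as an argument. The name round_numeric_cols is hypothetical and not part of any package.

```r
# Hypothetical helper: apply a rounding function (round or signif)
# to the numeric columns of a data frame and leave the others alone.
round_numeric_cols <- function(df, digits, fun = round) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) fun(col, digits) else col
  })
  df
}

# Decimal places (same result as approach 2 above)
res_round <- round_numeric_cols(Puromycin, 1)

# Significant digits instead
res_signif <- round_numeric_cols(Puromycin, 2, signif)
head(res_signif)
```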

Round everything

If the problem is that all the columns are supposed to be numeric but some are actually character or factor, although they represent numbers, then, using the data frame below as an example, apply type.convert.

# create test data having numeric, character and factor columns but
# all intended to represent numbers
DF <- structure(list(Time = c("0.1", "0.12", "0.3", "0.14", "0.5", 
"0.7"), demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98), Time2 = structure(c(1L, 
2L, 4L, 3L, 5L, 6L), .Label = c("0.1", "0.12", "0.14", "0.3", 
"0.5", "0.7"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

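# as.is = TRUE avoids the warning that recent versions of R give
# when the argument is not specified
round(replace(DF, TRUE, lapply(DF, type.convert, as.is = TRUE)), 1)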
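Calling round() on the whole data frame only works when every converted column is numeric. If some column holds genuinely non-numeric strings, the two ideas in this answer can be combined: convert first, then round only what turned out to be numeric. The following is a sketch of that combination, using a made-up data frame DF2 that is an assumption for illustration.

```r
# Hypothetical test data: Time holds numbers stored as strings,
# label holds real text.
DF2 <- data.frame(
  Time  = c("0.12", "0.3", "0.14"),
  label = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Convert each column to its natural type; as.is = TRUE keeps text as character
converted <- lapply(DF2, type.convert, as.is = TRUE)

# Round only the columns that became numeric
DF2[] <- lapply(converted, function(col) {
  if (is.numeric(col)) round(col, 1) else col
})
print(DF2)
```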

To add one last possibility to the options above:

Suppose you have character columns that contain numbers stored as strings alongside values that are not numbers at all. Then the following approach might help.

library(dplyr)
library(purrr)

# I use the data from the answer above with an additional mixed column
DF <- structure(
  list(
    Time = c("0.1", "0.12", "0.3", "0.14", "0.5",
             "0.7"),
    demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98),
    Mix = c("3.38", "4.403", "a", "5.34", "c", "9.32"),
    Time2 = structure(
      c(1L,
        2L, 4L, 3L, 5L, 6L),
      .Label = c("0.1", "0.12", "0.14", "0.3",
                 "0.5", "0.7"),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-6L)
)

TBL <- as_tibble(DF)

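# These are the functions we use.
# Note: as.double() warns ("NAs introduced by coercion") for strings
# that are not numbers; those strings are returned unchanged.
round_string_number <- function(x) {
  ifelse(!is.na(as.double(x)),
         as.character(round(as.double(x), digits = 1)),
         x)
}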

round_string_factor <- compose(round_string_number, as.character)

# Here the recoding happens
TBL %>%
  mutate_if(is.numeric, ~ round(., digits = 1)) %>% 
  mutate_if(is.factor, round_string_factor) %>% 
  mutate_if(~!is.numeric(.), round_string_number)

This turns the following data

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <fct>
1 0.1     0.83 3.38  0.1  
2 0.12    1.03 4.403 0.12 
3 0.3     1.9  a     0.3  
4 0.14    1.6  5.34  0.14 
5 0.5     1.56 c     0.5  
6 0.7     1.98 9.32  0.7  

into this:

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <chr>
1 0.1      0.8 3.4   0.1  
2 0.1      1   4.4   0.1  
3 0.3      1.9 a     0.3  
4 0.1      1.6 5.3   0.1  
5 0.5      1.6 c     0.5  
6 0.7      2   9.3   0.7 
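For readers who prefer to avoid packages, the same cell-by-cell idea can be sketched in base R. This is an illustrative alternative rather than the code used above, and the helper name round_cells and the small data frame mixed are assumptions.

```r
# Hypothetical base R helper: round numeric vectors directly, and for
# character or factor vectors round only the cells that parse as numbers.
round_cells <- function(x, digits = 1) {
  if (is.numeric(x)) return(round(x, digits))
  x <- as.character(x)
  # Non-numeric strings become NA here; the coercion warning is expected
  num <- suppressWarnings(as.numeric(x))
  ifelse(is.na(num), x, as.character(round(num, digits)))
}

# Made-up data with a numeric column and a mixed character column
mixed <- data.frame(
  demand = c(0.83, 1.03, 1.9),
  Mix    = c("3.38", "a", "9.32"),
  stringsAsFactors = FALSE
)
mixed[] <- lapply(mixed, round_cells, digits = 1)
print(mixed)
```

As in the dplyr version, mixed columns stay character after rounding, and factor columns are returned as character.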
