
R - Export Numsummary to csv

I'm having some trouble exporting some information from R. The information is a numSummary object, and I can get most of the summary into a CSV file using a suggestion I found, which is basically just

write.csv(numsummary$table)

but every time I use this, the last column is cut off from the CSV output.

I haven't been able to find a way to include the last column in the CSV output. Does anyone know how to do this, or can you point me to a resource where I could find out?

Please let me know if there's any more information I could provide that would be helpful, and thanks in advance for your help!

Edit: here is a complete R script for an example where the last column (in this case the column headed 'n') is cut off. Using write.csv(input$table) seems to leave the last column out for any type of output I use, not just numerical summaries.

#start toothGrowth csv generation
#dataset available at https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv

library(RcmdrMisc)  # numSummary() comes from the RcmdrMisc package

toothGrowth <- read.table("ToothGrowth.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- numSummary(toothGrowth[,c("dose", "len", "X")], statistics=c("mean", "sd", "IQR", "quantiles"), quantiles=c(0,.25,.5,.75,1))
str(toothGrowth)
numSumTooth
write.csv(numSumTooth$table, file="numSumTooth.csv")

#end toothGrowth csv generation

The output I get from the script above was posted on Pastebin (numSumTooth).

The "n" column is missing because numSummary stores the counts separately, in numSummaryObj$n, while the other descriptive statistics are stored in numSummaryObj$table. write.csv(numSummaryObj$table) therefore writes only the statistics.

Putting it back requires a simple cbind or data.frame command:

file <- "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv"
toothGrowth <- read.table(file, header=TRUE, sep=",", row.names=1, na.strings="NA", dec=".", strip.white=TRUE)

numSumTooth <- RcmdrMisc::numSummary(toothGrowth[, c("len", "dose")])

# Bind the per-variable counts ($n) back onto the statistics ($table)
nST <- data.frame(numSumTooth$table, numSumTooth$n)
# data.frame() mangles headers such as "0%" into "X0.", so restore them and label the count column
names(nST) <- c(colnames(numSumTooth$table), "n")

write.csv(nST, "numSumTooth.csv")
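If you need this for several objects, the two steps can be wrapped in a small function. The sketch below uses a hypothetical helper, numSummaryToDF (it is not part of RcmdrMisc), and assumes the object has a $table matrix whose rows line up with the entries of $n. It also flattens $n when it is a one-row matrix, which is how it is stored for grouped summaries, and passes check.names = FALSE so headers such as "0%" are kept. The mock object only imitates numSummary's layout, so the example runs without RcmdrMisc.

```r
# Hypothetical helper: combine the $table and $n parts of a numSummary-style
# object into one data frame that can be passed straight to write.csv().
numSummaryToDF <- function(ns) {
  tab <- ns$table
  n <- ns$n
  # For grouped summaries $n can be a 1 x k matrix; flatten it to a vector
  if (is.matrix(n)) n <- as.vector(n)
  stopifnot(is.matrix(tab), nrow(tab) == length(n))
  # check.names = FALSE keeps headers such as "0%" instead of "X0."
  data.frame(tab, n = n, check.names = FALSE)
}

# Mock object with the same layout as numSumTooth (two statistics only,
# values taken from the dplyr output further down)
mock <- list(
  table = matrix(c(1.166667, 18.813333, 0.5, 4.2), nrow = 2,
                 dimnames = list(c("dose", "len"), c("mean", "0%"))),
  n = c(dose = 60, len = 60)
)

nST <- numSummaryToDF(mock)
print(nST)
# write.csv(nST, "numSumTooth.csv") would now include the n column
```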

==

EDIT:

I would personally invest some time in data handling with packages like dplyr and tidyr, as they give you a lot of mileage and flexibility in the future. For instance, to generate the same numSummary as a data frame, you can run the following:

library(dplyr)
library(tidyr)

toothGrowth %>% 
  select(-supp) %>% 
  gather(var, val) %>% # convert the wide data frame into long form, with var = dose or len
  # (in newer tidyr, pivot_longer(everything(), names_to = "var", values_to = "val") does the same)
  group_by(var) %>% 
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%`= min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, .75),
            `100%` = max(val),
            n = n())


# A tibble: 2 × 10
    var      mean        sd   IQR  `0%`  `25%` `50%`  `75%` `100%`     n
  <chr>     <dbl>     <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl> <int>
1  dose  1.166667 0.6288722   1.5   0.5  0.500  1.00  2.000    2.0    60
2   len 18.813333 7.6493152  12.2   4.2 13.075 19.25 25.275   33.9    60  
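If you would rather not add dplyr and tidyr as dependencies, roughly the same table can be built in base R and written straight to CSV. This sketch swaps in sapply() for the gather/summarise pipeline; it uses the built-in ToothGrowth data set (same columns as the downloaded CSV) and a hypothetical helper, summStats. Because n is just another element of the vector each call returns, it cannot end up separated from the other columns the way $n is in a numSummary object.

```r
# Base-R alternative to the dplyr pipeline above (a sketch, not numSummary itself).
# summStats is a hypothetical helper returning one named vector per variable.
summStats <- function(v) {
  c(mean = mean(v), sd = sd(v), IQR = IQR(v),
    quantile(v, c(0, 0.25, 0.5, 0.75, 1)),  # named "0%", "25%", ..., "100%"
    n = length(v))
}

# sapply() gives one column per variable; t() turns it into one row per variable
res <- t(sapply(ToothGrowth[c("dose", "len")], summStats))
print(res)

# n is an ordinary column of res, so write.csv() writes it like any other column
outFile <- file.path(tempdir(), "numSumTooth.csv")
write.csv(res, outFile)
```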

The added flexibility of this approach is that you can compute the statistics for each group (here, each level of supp):

toothGrowth %>% 
  # supp is no longer dropped with select(); it is kept as a grouping column
  gather(var, val, -supp) %>% # reshape every column except supp into var/val pairs
  group_by(supp, var) %>% 
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%`= min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, .75),
            `100%` = max(val),
            n = n())


Source: local data frame [4 x 11]
Groups: supp [?]

    supp   var      mean        sd   IQR  `0%`  `25%` `50%`  `75%` `100%`     n
   <fctr> <chr>     <dbl>     <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl> <int>
 1     OJ  dose  1.166667 0.6342703   1.5   0.5  0.500   1.0  2.000    2.0    30
 2     OJ   len 20.663333 6.6055610  10.2   8.2 15.525  22.7 25.725   30.9    30
 3     VC  dose  1.166667 0.6342703   1.5   0.5  0.500   1.0  2.000    2.0    30
 4     VC   len 16.963333 8.2660287  11.9   4.2 11.200  16.5 23.100   33.9    30
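The per-group version can also be sketched in base R, with split() doing the job of group_by(). As before, this swaps in base functions for dplyr/tidyr and uses the built-in ToothGrowth data and a hypothetical helper, summStats; each group's rows are tagged with supp and var so the result has the same layout as the table above.

```r
# Hypothetical helper: one named vector of statistics per variable
summStats <- function(v) {
  c(mean = mean(v), sd = sd(v), IQR = IQR(v),
    quantile(v, c(0, 0.25, 0.5, 0.75, 1)), n = length(v))
}

# split() plays the role of group_by(supp); each piece is summarised separately
byGroup <- lapply(split(ToothGrowth, ToothGrowth$supp), function(d) {
  res <- t(sapply(d[c("dose", "len")], summStats))
  # One row per variable, tagged with the group it came from
  data.frame(supp = d$supp[1], var = rownames(res), res,
             check.names = FALSE, row.names = NULL)
})

# Stack the groups back into one data frame with plain row numbers
grouped <- do.call(rbind, byGroup)
rownames(grouped) <- NULL
print(grouped)
```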

==

Another alternative (if you find writing the long summarise call over and over a chore) is to create a function, e.g.:

checkVar <- function(varname, data){
  val <- data[[varname]]  # [[ ]] returns a plain vector for data frames and tibbles alike
  tmp <- data.frame(mean = mean(val), 
                    sd = sd(val),
                    IQR = IQR(val),
                    `0%`= min(val),
                    `25%` = quantile(val, 0.25),
                    `50%` = median(val),
                    `75%` = quantile(val, .75),
                    `100%` = max(val),
                    n = length(val),
                    check.names = FALSE)  # keep "0%", "25%", ... as column names
  rownames(tmp) <- varname
  return(tmp)
} 

Executing the custom function gives the summary statistics for one variable:

checkVar("dose", ToothGrowth)


         mean        sd IQR  0% 25% 50% 75% 100%  n
dose 1.166667 0.6288722 1.5 0.5 0.5   1   2    2 60

Putting several variables into a single data.frame then takes an apply function, e.g. lapply:

do.call(rbind, lapply(c("dose", "len"), checkVar, data=ToothGrowth))


          mean        sd  IQR  0%    25%   50%    75% 100%  n
dose  1.166667 0.6288722  1.5 0.5  0.500  1.00  2.000  2.0 60
len  18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
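Whichever approach you use, you can confirm that n actually reaches the file by writing the table out and reading it back. Below is a minimal sketch with a small hand-made data frame (values copied from the output above) and a temporary file; note that read.csv() would by default turn a header like "0%" into "X0.", so check.names = FALSE is passed when reading it back.

```r
# Small stand-in for the combined summary (values from the output above)
summaryDF <- data.frame(mean = c(1.166667, 18.813333),
                        `0%` = c(0.5, 4.2),
                        n = c(60, 60),
                        row.names = c("dose", "len"),
                        check.names = FALSE)

csvFile <- tempfile(fileext = ".csv")
write.csv(summaryDF, csvFile)  # row names are written as the first column

# row.names = 1 restores dose/len as row names; check.names = FALSE keeps "0%"
backIn <- read.csv(csvFile, row.names = 1, check.names = FALSE)
print(backIn)
```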

I had the same problem; here is an elaboration on the previous answer.

I had a numSummary object computed by group:

str(resumenDatos)

List of 4
 $ type      : num 4
 $ table     : num [1:514, 1:8] 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ Group    : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
  .. ..$ Statistic: chr [1:8] "mean" "0%" "25%" "50%" ...
 $ statistics: chr [1:2] "mean" "quantiles"
 $ n         : num [1, 1:514] 2948 1784 1756 1306 1064 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "whatIWantToMeasure"
  .. ..$ : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
 - attr(*, "class")= chr "numSummary"

And I created the data frame as follows:

> resumenDatosDF <- data.frame(resumenDatos$table,t(resumenDatos$n))

> names(resumenDatosDF) <- c(colnames(resumenDatos$table), "n")

> str(resumenDatosDF)
'data.frame':   514 obs. of  9 variables:
 $ mean: num  3.7544 4.5779 4.135 -1.0582 -0.0789 ...
 $ 0%  : num  -986 -997 -995 -996 -986 -997 -996 -997 -996 -997 ...
 $ 25% : num  3 3 4 13 17 15 13 3 3 3 ...
 $ 50% : num  14 21 17 24 26 26 25 15 15 13 ...
 $ 75% : num  24 30.2 27 28 28 ...
 $ 90% : num  30 37 31 32 31.7 ...
 $ 99% : num  38 49 38 40 39 ...
 $ 100%: num  250 416 105 57 214 ...
 $ n   : num  2948 1784 1756 1306 1064 ...

> head(resumenDatosDF,10)
                     mean   0% 25% 50%   75%  90%   99% 100%    n
2020_02_28_00  3.75440977 -986   3  14 24.00 30.0 38.00  250 2948
2020_02_28_01  4.57791480 -997   3  21 30.25 37.0 49.00  416 1784
2020_02_28_02  4.13496583 -995   4  17 27.00 31.0 38.00  105 1756
2020_02_28_03 -1.05819296 -996  13  24 28.00 32.0 40.00   57 1306
2020_02_28_04 -0.07894737 -986  17  26 28.00 31.7 39.00  214 1064
2020_02_28_05  3.26701571 -997  15  26 28.00 32.0 39.55   87 1146
2020_02_28_06  4.92619392 -996  13  25 28.00 31.0 39.00   59 1382
2020_02_28_07  1.13968101 -997   3  15 27.00 30.0 40.32  240 2069
2020_02_28_08 -1.99729973 -996   3  15 27.00 31.0 40.00  376 2222
2020_02_28_09  0.59954083 -997   3  13 23.00 33.0 41.52 1086 3049
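The t() in the code above matters because, with groups, $n is not a vector: the str() output shows it as a 1 x 514 matrix (one row for the measured variable, one column per group), while $table has one row per group. The mock below has the same shapes, using the first three groups from the head() output and only two of the statistics; it shows that transposing $n gives one value per row of $table.

```r
# Mock of a grouped numSummary: $table is groups x statistics,
# $n is 1 x groups (values from the head() output above)
grp <- c("2020_02_28_00", "2020_02_28_01", "2020_02_28_02")
tbl <- matrix(c(3.75440977, 4.57791480, 4.13496583,  # mean
                250, 416, 105),                      # 100%
              nrow = 3,
              dimnames = list(Group = grp, Statistic = c("mean", "100%")))
nMat <- matrix(c(2948, 1784, 1756), nrow = 1,
               dimnames = list("whatIWantToMeasure", grp))

print(dim(nMat))     # 1 row, one column per group
print(dim(t(nMat)))  # one row per group, matching the rows of tbl

combined <- data.frame(tbl, t(nMat), check.names = FALSE)
names(combined) <- c(colnames(tbl), "n")
print(combined)
```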
