I'm having trouble exporting some information from R. The information is a numSummary. I can get most of the summary into a CSV using the suggestion linked here, which is basically just
write.csv(numsummary$table)
but every time I use this the last column gets cut off from the csv output.
I haven't been able to find a way to get the last column included in the CSV output. Would anyone know how to do this, or be able to point me to a resource I could check?
Please let me know if there's any more information I could provide that would be helpful, and thanks in advance for your help!
edit: complete R script of an example where the last column (in this case the column headed 'n') is cut off. Using write.csv(input$table) seems to leave the last column out on any type of output I use, not just numerical summaries.
#start toothGrowth csv generation
#dataset available at https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv
toothGrowth <- read.table("ToothGrowth.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- numSummary(toothGrowth[,c("dose", "len", "X")], statistics=c("mean", "sd", "IQR", "quantiles"), quantiles=c(0,.25,.5,.75,1))
str(toothGrowth)
numSumTooth
write.csv(numSumTooth$table, file="numSumTooth.csv")
#end toothGrowth csv generation
The output I generate using the script above is linked here on pastebin sumSumTooth
The reason why "n" was missing is that the value is kept as numSummaryObj$n, while the other exploratory values are kept as numSummaryObj$table.
Putting it back requires a simple cbind or data.frame command:
file <- "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv"
toothGrowth <- read.table(file, header=TRUE, sep=",", row.names=1, na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- RcmdrMisc::numSummary(toothGrowth[, c("len", "dose")])
nST <- data.frame(numSumTooth$table, numSumTooth$n)
names(nST) <- c(colnames(numSumTooth$table), "n")
write.csv(nST, "numSumTooth.csv")
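The cbind route mentioned above can be sketched in base R. This is only an illustration under stated assumptions: `fakeSummary` is a hypothetical hand-built list that mimics the `$table` / `$n` layout described in this answer (so the block runs without RcmdrMisc), and the numbers are the ToothGrowth figures quoted in this thread. Because `$table` is a matrix, `cbind` keeps statistic names such as `0%` untouched, so the `names()` repair step is not needed.

```r
# `fakeSummary` is a hypothetical stand-in for a numSummary result;
# it only mimics the $table / $n layout described above.
fakeSummary <- list(
  table = matrix(
    c(18.813333, 1.166667,    # mean
      7.6493152, 0.6288722,   # sd
      4.2,       0.5),        # 0%
    nrow = 2,
    dimnames = list(c("len", "dose"), c("mean", "sd", "0%"))
  ),
  n = c(len = 60, dose = 60)
)

# cbind() on a matrix keeps the existing column names as they are.
withN <- cbind(fakeSummary$table, n = fakeSummary$n)

outFile <- tempfile(fileext = ".csv")
write.csv(withN, outFile)

# check.names = FALSE stops read.csv() from rewriting "0%" on the way back in.
readBack <- read.csv(outFile, row.names = 1, check.names = FALSE)
print(readBack)
```

Reading the file back is just a quick way to confirm that the `n` column made it into the CSV.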
==
EDIT:
I would personally invest some time in data handling with packages like dplyr and tidyr, as they give you a lot of mileage and flexibility in the future. For instance, to generate the same numSummary in a data.frame, you can run the following:
library(dplyr)
library(tidyr)

toothGrowth %>%
select(-supp) %>%
gather(var, val) %>% #convert the wide data frame into the long-form, with var = dose and len
group_by(var) %>%
summarise(mean = mean(val), sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = n())
# A tibble: 2 × 10
var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
2 len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
The added flexibility of this approach is that you can compute the same statistics for each group (such as supp in this case):
toothGrowth %>%
# select(-supp) %>%
gather(var, val, -supp) %>%
group_by(supp, var) %>%
summarise(mean = mean(val), sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = n())
Source: local data frame [4 x 11]
Groups: supp [?]
supp var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<fctr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 OJ dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
2 OJ len 20.663333 6.6055610 10.2 8.2 15.525 22.7 25.725 30.9 30
3 VC dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
4 VC len 16.963333 8.2660287 11.9 4.2 11.200 16.5 23.100 33.9 30
==
Another alternative (if you feel that writing the long summarise syntax repeatedly is a chore) is to create a function, e.g.:
checkVar <- function(varname, data){
val <- data[, varname]
tmp <- data.frame(mean = mean(val),
sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = length(val))
names(tmp) <- c("mean", "sd", "IQR", "`0%`", "`25%`", "`50%`", "`75%`", "`100%`", "n")
rownames(tmp) <- varname
return(tmp)
}
Executing the custom function would give you summary statistics:
checkVar("dose", ToothGrowth)
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.5 1 2 2 60
And putting them into a single data.frame involves an apply function, e.g. with lapply:
do.call(rbind, lapply(c("dose", "len"), checkVar, data=ToothGrowth))
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
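The backticks in the headers above come from the `names()` line inside `checkVar`, which stores them as literal characters. If plain headers such as `0%` are preferred, one possible base-R variant is sketched below. The names `summariseVar` and `summaryMat` are hypothetical, and the block assumes the built-in `ToothGrowth` data set from the datasets package. It builds a matrix rather than a data.frame, so the statistic names are kept as they are.

```r
# `summariseVar` is a hypothetical helper, not part of any package.
# It returns one named numeric vector of statistics for a single column.
summariseVar <- function(val) {
  c(mean = mean(val),
    sd   = sd(val),
    IQR  = IQR(val),
    quantile(val, c(0, 0.25, 0.5, 0.75, 1)),  # contributes "0%", "25%", ...
    n    = length(val))
}

# sapply() returns statistics in rows and variables in columns,
# so transpose to get one row per variable.
summaryMat <- t(sapply(ToothGrowth[c("dose", "len")], summariseVar))
print(summaryMat)

# A matrix can be passed straight to write.csv().
write.csv(summaryMat, tempfile(fileext = ".csv"))
```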
I had the same problem. To elaborate on the previous answer:
I had this summary:
str(resumenDatos)
List of 4
$ type : num 4
$ table : num [1:514, 1:8] 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
.. ..$ Statistic: chr [1:8] "mean" "0%" "25%" "50%" ...
$ statistics: chr [1:2] "mean" "quantiles"
$ n : num [1, 1:514] 2948 1784 1756 1306 1064 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "whatIWantToMeasure"
.. ..$ : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
- attr(*, "class")= chr "numSummary"
And I created the data frame as follows:
> resumenDatosDF <- data.frame(resumenDatos$table,t(resumenDatos$n))
> names(resumenDatosDF) <- c(colnames(resumenDatos$table), "n")
> str(resumenDatosDF)
'data.frame': 514 obs. of 9 variables:
$ mean: num 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
$ 0% : num -986 -997 -995 -996 -986 -997 -996 -997 -996 -997 ...
$ 25% : num 3 3 4 13 17 15 13 3 3 3 ...
$ 50% : num 14 21 17 24 26 26 25 15 15 13 ...
$ 75% : num 24 30.2 27 28 28 ...
$ 90% : num 30 37 31 32 31.7 ...
$ 99% : num 38 49 38 40 39 ...
$ 100%: num 250 416 105 57 214 ...
$ n : num 2948 1784 1756 1306 1064 ...
> head(resumenDatosDF,10)
mean 0% 25% 50% 75% 90% 99% 100% n
2020_02_28_00 3.75440977 -986 3 14 24.00 30.0 38.00 250 2948
2020_02_28_01 4.57791480 -997 3 21 30.25 37.0 49.00 416 1784
2020_02_28_02 4.13496583 -995 4 17 27.00 31.0 38.00 105 1756
2020_02_28_03 -1.05819296 -996 13 24 28.00 32.0 40.00 57 1306
2020_02_28_04 -0.07894737 -986 17 26 28.00 31.7 39.00 214 1064
2020_02_28_05 3.26701571 -997 15 26 28.00 32.0 39.55 87 1146
2020_02_28_06 4.92619392 -996 13 25 28.00 31.0 39.00 59 1382
2020_02_28_07 1.13968101 -997 3 15 27.00 30.0 40.32 240 2069
2020_02_28_08 -1.99729973 -996 3 15 27.00 31.0 40.00 376 2222
2020_02_28_09 0.59954083 -997 3 13 23.00 33.0 41.52 1086 3049
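The reason `t()` is needed here is visible in the `str()` output: with groups, `$n` is a one-row matrix (1 x 514) while `$table` has one row per group. A minimal sketch of that shape mismatch follows, using a hypothetical stand-in `groupedSummary` built from the first three groups shown above (so it runs without RcmdrMisc). It assumes a single measured variable, as in this example.

```r
# `groupedSummary` is a hypothetical stand-in shaped like the str() output above.
groups <- c("2020_02_28_00", "2020_02_28_01", "2020_02_28_02")
groupedSummary <- list(
  table = matrix(
    c(3.7544, 4.5779, 4.1350,   # mean
      14,     21,     17),      # 50%
    nrow = 3,
    dimnames = list(Group = groups, Statistic = c("mean", "50%"))
  ),
  n = matrix(c(2948, 1784, 1756), nrow = 1,
             dimnames = list("whatIWantToMeasure", groups))
)

dim(groupedSummary$table)  # one row per group
dim(groupedSummary$n)      # one row in total, one column per group

# as.vector() flattens the one-row matrix, so the counts line up with the
# rows of $table; this also works if n is already a plain vector.
combined <- cbind(groupedSummary$table, n = as.vector(groupedSummary$n))
print(combined)
```

With more than one measured variable, `$n` would have several rows and flattening it this way would no longer be appropriate.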