I am using table() to build one frequency table from several columns of the same data frame.
### make up some data
orig.df<-data.frame(
decade=c("1910s","1910s","1920s","1930s"),
size=c("low (<1)", "high (>5000)", "medium (11-100)","low (<1)"),
plant=c("orange","apple","orange","apple")
)
data.frame(rbind(
table(orig.df$decade),
table(orig.df$size, orig.df$decade),
table(orig.df$plant, orig.df$decade)
))
                X1910s X1920s X1930s
                     2      1      1
high (>5000)         1      0      0
low (<1)             1      0      1
medium (11-100)      0      1      0
apple                1      0      1
orange               1      1      0
The Xs in the column names can always be removed with gsub().
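As a side note on the column names: the X prefix appears because data.frame() checks names by default. A minimal sketch of two ways around it, using the same toy data (the object names `counts`, `fixed` and `kept` are only illustrative):

```r
# Same toy data as above
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

counts <- rbind(
  table(orig.df$decade),
  table(orig.df$size, orig.df$decade),
  table(orig.df$plant, orig.df$decade)
)

# Option 1: let data.frame() add the X, then strip it afterwards
fixed <- data.frame(counts)
colnames(fixed) <- gsub("^X", "", colnames(fixed))

# Option 2: ask data.frame() not to rewrite the column names at all
kept <- data.frame(counts, check.names = FALSE)

colnames(fixed)
colnames(kept)
```

Both variants should leave the decade labels as they appear in the data.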
The problem is that my data contain NAs, and I want to show them explicitly. However, table(..., useNA="always") causes trouble when I rbind several tables and convert the result to a data frame.
data.frame(rbind(
table(orig.df$decade, useNA="always"),
table(orig.df$size, orig.df$decade, useNA="always"),
table(orig.df$plant, orig.df$decade, useNA="always")
))
                X1910s X1920s X1930s NA.
X                    2      1      1   0
high...5000.         1      0      0   0
low...1.             1      0      1   0
medium..11.100.      0      1      0   0
NA.                  0      0      0   0
apple                1      0      1   0
orange               1      1      0   0
NA..1                0      0      0   0
I thought about using gsub() as for the column names, but if you look closely, "." can stand for a parenthesis, a space, "<", ">" or "-", so the original labels cannot be recovered that way.
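The dots match what make.names() produces when it turns arbitrary labels into syntactically valid names. A small sketch of why that mapping cannot be undone with gsub() (the vector `labels` is only illustrative):

```r
labels <- c("low (<1)", "high (>5000)", "medium (11-100)")

# Every character that is not allowed in a syntactic name becomes "."
mangled <- make.names(labels)
mangled

# Different labels can collapse to the same mangled name, so the dots
# carry no information about which character they replaced
collapsed <- make.names(c("a b", "a-b", "a(b"))
collapsed
```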
Desired output (basically the same table, just with proper names; my real problem has several hundred rows, so I can't fix them all by hand):
                1910s 1920s 1930s
                    2     1     1
high (>5000)        1     0     0
low (<1)            1     0     1
medium (11-100)     0     1     0
NA                  0     0     0
apple               1     0     1
orange              1     1     0
NA                  0     0     0
What is the solution? There has to be some simple setting to change?
Using as.data.frame() with make.names = FALSE shows why your names are actually changed:
as.data.frame(rbind(
table(orig.df$decade, useNA="always"),
table(orig.df$size, orig.df$decade, useNA="always"),
table(orig.df$plant, orig.df$decade, useNA="always")
), make.names = FALSE)
#> Warning: non-unique values when setting 'row.names':
#> Error in `.rowNamesDF<-`(`*tmp*`, make.names = make.names, value = row.names): duplicate 'row.names' are not allowed
Your row names are duplicated (both two-way tables contribute an NA row), so R tries to fix that. If you want to fix it in a different way, you could write a quick function that deals with the row names on the fly:
table2df <- function(x) {
data.frame(x, row.names = make.unique(paste(rownames(x))))
}
table2df(rbind(
table(orig.df$decade, useNA="always"),
table(orig.df$size, orig.df$decade, useNA="always"),
table(orig.df$plant, orig.df$decade, useNA="always")
))
#>                 X1910s X1920s X1930s NA.
#>                      2      1      1   0
#> high (>5000)         1      0      0   0
#> low (<1)             1      0      1   0
#> medium (11-100)      0      1      0   0
#> NA                   0      0      0   0
#> apple                1      0      1   0
#> orange               1      1      0   0
#> NA.1                 0      0      0   0
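The output above still shows the X prefix and NA. in the column names. If those matter too, one possible extension is to convert the missing dimnames to the string "NA" and switch off name checking. This is a sketch rather than part of the answer above, and `table2df_clean` is a hypothetical name:

```r
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

table2df_clean <- function(x) {
  # paste() turns a missing dimname into the literal string "NA"
  colnames(x) <- paste(colnames(x))
  # make.unique() only changes duplicates: "NA", "NA" -> "NA", "NA.1"
  rn <- make.unique(paste(rownames(x)))
  # check.names = FALSE keeps "1910s" instead of "X1910s"
  data.frame(x, row.names = rn, check.names = FALSE)
}

res <- table2df_clean(rbind(
  table(orig.df$decade, useNA = "always"),
  table(orig.df$size, orig.df$decade, useNA = "always"),
  table(orig.df$plant, orig.df$decade, useNA = "always")
))

rownames(res)
colnames(res)
```

The second NA row still becomes NA.1, because a data frame cannot hold two identical row names.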
The main problem is that a data frame can't have duplicated row names or empty column names. If you don't mind working with a matrix, just drop the data.frame() call:
M <- rbind(
table(orig.df$decade, useNA="always"),
table(orig.df$size, orig.df$decade, useNA="always"),
table(orig.df$plant, orig.df$decade, useNA="always")
)
Output
                1910s 1920s 1930s <NA>
                    2     1     1    0
high (>5000)        1     0     0    0
low (<1)            1     0     1    0
medium (11-100)     0     1     0    0
<NA>                0     0     0    0
apple               1     0     1    0
orange              1     1     0    0
<NA>                0     0     0    0
To export it to an Excel sheet, just do:
df <- data.frame(cbind(rownames = rownames(M), M), row.names = NULL)
xlsx::write.xlsx(df, "test.xlsx", row.names = FALSE)
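One caveat with the cbind() approach: binding the character row names to the counts produces a character matrix, so every exported column is text. If numeric counts are preferred, a possible variant builds the label column separately. This is only a sketch; the column name `label` and the use of write.csv() with a temporary file (instead of the xlsx package) are assumptions made to keep it dependency-free:

```r
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

M <- rbind(
  table(orig.df$decade, useNA = "always"),
  table(orig.df$size, orig.df$decade, useNA = "always"),
  table(orig.df$plant, orig.df$decade, useNA = "always")
)

# Replace the missing column name by the string "NA"
colnames(M) <- paste(colnames(M))

df <- data.frame(
  label = paste(rownames(M)),  # row labels as an ordinary column
  M,                           # counts stay numeric
  row.names = NULL,            # automatic row names, so duplicates are fine
  check.names = FALSE          # keep "1910s" rather than "X1910s"
)

# Write to a temporary CSV file; any spreadsheet program can open it
out <- tempfile(fileext = ".csv")
write.csv(df, out, row.names = FALSE)
```

The same `df` could also be passed to xlsx::write.xlsx() as shown above.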
ftable() pretty much gets you where you need to be:
data = rbind(
table(orig.df$decade, useNA="always"),
table(orig.df$size, orig.df$decade, useNA="always"),
table(orig.df$plant, orig.df$decade, useNA="always"))
data = ftable(data)
So you get the following output:
                1910s 1920s 1930s NA
                    2     1     1  0
high (>5000)        1     0     0  0
low (<1)            1     0     1  0
medium (11-100)     0     1     0  0
NA                  0     0     0  0
apple               1     0     1  0
orange              1     1     0  0
NA                  0     0     0  0