
R table() into data.frame changes rownames

I am using table() to create frequency tables for several variables from the same data frame.

### make up some data
orig.df<-data.frame(
  decade=c("1910s","1910s","1920s","1930s"),
  size=c("low (<1)", "high (>5000)", "medium (11-100)","low (<1)"),
  plant=c("orange","apple","orange","apple")
)
data.frame(rbind(
  table(orig.df$decade),
  table(orig.df$size, orig.df$decade),
  table(orig.df$plant, orig.df$decade)
  
))

                X1910s X1920s X1930s
                     2      1      1
high (>5000)         1      0      0
low (<1)             1      0      1
medium (11-100)      0      1      0
apple                1      0      1
orange               1      1      0

The X prefixes in the column names can always be removed with gsub().
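A possible alternative to gsub(), sketched under the assumption that only the column names need protecting: data.frame() prepends the X because its check.names argument defaults to TRUE, so passing check.names = FALSE should keep "1910s" as it is. In this NA-free case the row names are already unique, so they come through unchanged.

```r
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

# check.names = FALSE stops data.frame() from running make.names() on the
# column names, so "1910s" is not turned into "X1910s"
df <- data.frame(rbind(
  table(orig.df$decade),
  table(orig.df$size, orig.df$decade),
  table(orig.df$plant, orig.df$decade)
), check.names = FALSE)

colnames(df)
```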

The problem is that my data contains NAs and I want to show them explicitly. With table(..., useNA="always"), the names get garbled once I rbind() the tables and convert the result to a data frame:

data.frame(rbind(
  table(orig.df$decade, useNA="always"),
  table(orig.df$size, orig.df$decade, useNA="always"),
  table(orig.df$plant, orig.df$decade, useNA="always")
  
))

                X1910s X1920s X1930s NA.
X                    2      1      1   0
high...5000.         1      0      0   0
low...1.             1      0      1   0
medium..11.100.      0      1      0   0
NA.                  0      0      0   0
apple                1      0      1   0
orange               1      1      0   0
NA..1                0      0      0   0

I thought about using gsub() as for the column names, but if you look closely, "." can stand for a parenthesis, a space or "-", so there is no way to reverse it.
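To see why gsub() can't undo the damage, here is a small sketch that runs make.names() (the base R function that turns arbitrary strings into syntactic names) on the same labels that appear as row names above. The vector labels is typed in by hand for illustration.

```r
labels <- c("", "high (>5000)", "low (<1)", "medium (11-100)",
            NA, "apple", "orange", NA)

# every invalid character (space, "(", ")", "<", ">", "-") becomes ".",
# "" gets an "X", NA becomes "NA." (NA is a reserved word), and
# unique = TRUE de-duplicates the second "NA." as "NA..1"
fixed <- make.names(labels, unique = TRUE)
fixed
# gives "X", "high...5000.", "low...1.", "medium..11.100.",
#       "NA.", "apple", "orange", "NA..1"
```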

Desired output (basically the same table, just with proper names; my real data has several hundred rows, so I can't fix them by hand):

                1910s 1920s 1930s
                     2      1      1
high (>5000)         1      0      0
low (<1)             1      0      1
medium (11-100)      0      1      0
NA                   0      0      0
apple                1      0      1
orange               1      1      0
NA                   0      0      0

What is the solution? Is there a simple setting I can change?

Calling as.data.frame() with make.names = FALSE shows why your names are being changed:

as.data.frame(rbind(
  table(orig.df$decade, useNA="always"),
  table(orig.df$size, orig.df$decade, useNA="always"),
  table(orig.df$plant, orig.df$decade, useNA="always")
), make.names = FALSE)
#> Warning: non-unique values when setting 'row.names':
#> Error in `.rowNamesDF<-`(`*tmp*`, make.names = make.names, value = row.names): duplicate 'row.names' are not allowed

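Your row names contain duplicates (the two NA rows), so R makes them unique by applying make.names() to all of them, which also rewrites every other label containing spaces, brackets or other invalid characters. If you want to fix them differently, you could write a quick function that handles the row names on the fly, using paste() to turn NA into "NA" and make.unique() to add a suffix only where a label repeats: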

table2df <- function(x) {
  data.frame(x, row.names = make.unique(paste(rownames(x))))
}

table2df(rbind(
  table(orig.df$decade, useNA="always"),
  table(orig.df$size, orig.df$decade, useNA="always"),
  table(orig.df$plant, orig.df$decade, useNA="always")
))
#>                 X1910s X1920s X1930s NA.
#>                      2      1      1   0
#> high (>5000)         1      0      0   0
#> low (<1)             1      0      1   0
#> medium (11-100)      0      1      0   0
#> NA                   0      0      0   0
#> apple                1      0      1   0
#> orange               1      1      0   0
#> NA.1                 0      0      0   0
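If you also want the column names without the X prefix, the same idea can be extended. This is a sketch rather than part of the answer above: table2df_keep is a hypothetical name, it turns the NA column label into the string "NA" with paste(), and it passes check.names = FALSE so data.frame() leaves the column names alone.

```r
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

# hypothetical helper: like table2df(), but also keeps the column names
table2df_keep <- function(x) {
  # paste() turns NA row labels into the string "NA";
  # make.unique() then appends ".1", ".2", ... only to repeated labels
  rn <- make.unique(paste(rownames(x)))
  # same treatment for the NA column label
  colnames(x) <- paste(colnames(x))
  # check.names = FALSE keeps "1910s" instead of "X1910s"
  data.frame(x, row.names = rn, check.names = FALSE)
}

res <- table2df_keep(rbind(
  table(orig.df$decade, useNA = "always"),
  table(orig.df$size, orig.df$decade, useNA = "always"),
  table(orig.df$plant, orig.df$decade, useNA = "always")
))
res
```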

The main problem is that a data frame can't have duplicated row names, and data.frame() by default rewrites column names into syntactically valid ones (hence "X1910s" and "NA."). If you don't mind working with a matrix, just drop the data.frame() call:

M <- rbind(
  table(orig.df$decade, useNA="always"),
  table(orig.df$size, orig.df$decade, useNA="always"),
  table(orig.df$plant, orig.df$decade, useNA="always")
  )

Output

                1910s 1920s 1930s <NA>
                    2     1     1    0
high (>5000)        1     0     0    0
low (<1)            1     0     1    0
medium (11-100)     0     1     0    0
<NA>                0     0     0    0
apple               1     0     1    0
orange              1     1     0    0
<NA>                0     0     0    0

To export it to an Excel sheet:

# move the row labels into a regular first column, then write the sheet
df <- data.frame(cbind(rownames = rownames(M), M), row.names = NULL)
xlsx::write.xlsx(df, "test.xlsx", row.names = FALSE)
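One caveat with the cbind() step: combining the character row labels with the count matrix yields a character matrix, so the counts reach Excel as text. A sketch that keeps them numeric builds the data frame column by column instead; the column name category is a made-up placeholder, and the write call is left as a comment so the snippet runs without the xlsx package.

```r
orig.df <- data.frame(
  decade = c("1910s", "1910s", "1920s", "1930s"),
  size   = c("low (<1)", "high (>5000)", "medium (11-100)", "low (<1)"),
  plant  = c("orange", "apple", "orange", "apple")
)

M <- rbind(
  table(orig.df$decade, useNA = "always"),
  table(orig.df$size, orig.df$decade, useNA = "always"),
  table(orig.df$plant, orig.df$decade, useNA = "always")
)

labels <- paste(rownames(M))                  # NA row labels become "NA"
counts <- M
rownames(counts) <- NULL                      # drop the duplicated row names
colnames(counts) <- paste(colnames(counts))   # NA column label becomes "NA"

# "category" is a placeholder name for the label column;
# the count columns stay integer instead of being coerced to character
df <- data.frame(category = labels, counts,
                 check.names = FALSE, stringsAsFactors = FALSE)
str(df)
# xlsx::write.xlsx(df, "test.xlsx", row.names = FALSE)
```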

ftable() gets you most of the way there:

data <- rbind(
  table(orig.df$decade, useNA="always"),
  table(orig.df$size, orig.df$decade, useNA="always"),
  table(orig.df$plant, orig.df$decade, useNA="always"))

data <- ftable(data)

This gives the following output:

                 1910s 1920s 1930s NA

                     2     1     1  0
high (>5000)         1     0     0  0
low (<1)             1     0     1  0
medium (11-100)      0     1     0  0
NA                   0     0     0  0
apple                1     0     1  0
orange               1     1     0  0
NA                   0     0     0  0
