简体   繁体   中英

R / dyplr: Transforming two rows into two columns

I have a dataframe in R that looks like this:

Word    Base    Number  Type
-       -       -       -
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural

I would like to transform it so that it looks like this:

Word_Sg Word_Pl Base    Num_Singular    Num_Plural
--      --      --      --              --
shoe    shoes   shoe    4834            49955
toy     toys    toy     75465           23556
key     keys    key     39485           6546
NA      jazz    jazz    NA              58765

So rather than having two rows for the values for singular & plural, I want to have two colums, one with the number for singular, and one with the number for plural.

I've tried a few things using dplyr::summarize , but so far, without any success. Here is the code that I've come up with so far:

dataframe1 <- dataframe %>% 
      mutate(Num_Singular = case_when(Type == "singular" ~ Number)) %>%
      mutate(Num_Plural = case_when(Type == "plural" ~ Number)) %>%
      dplyr::select(Word, Base, Num_Singular, Num_Plural) %>%
      group_by(Base) %>%
      dplyr::summarize(Num_Singular = paste(na.omit(Num_Singular)),
                       Num_Plural = paste(na.omit(Num_Plural))

However, it gives me this error:

Error in summarise_impl(.data, dots) : 
  Column `Num_Singular` must be length 1 (a summary value), not 2)

I think the problem might be that there are rows that don't necessarily have singular AND plural, but only either (eg "jazz"). Most rows have both though.

So how can I do this in R or dplyr?

If you first look at the first few columns::

select(dat, Base, Word, Type)[1:2,]
#   Base  Word     Type
# 1 shoe  shoe singular
# 2 shoe shoes   plural

From here, consider it just spreading it into a singular/plural columns, effectively going from "tall" to "wide". (It would be much more apparent if there were more than two categories in Type .)

select(dat, Base, Word, Type) %>%
    spread(Type, Word) %>%
    rename(Word_Pl=plural, Word_Sg=singular)
#   Base Word_Pl Word_Sg
# 1 jazz    jazz    <NA>
# 2  key    keys     key
# 3 shoe   shoes    shoe
# 4  toy    toys     toy

You can easily repeat this for Number as well. From there, it's just a matter of merging/joining them based on the key column, Base :

full_join(
  select(dat, Base, Word, Type) %>%
    spread(Type, Word) %>%
    rename(Word_Pl=plural, Word_Sg=singular),
  select(dat, Base, Number, Type) %>%
    spread(Type, Number) %>%
    rename(Num_Pl=plural, Num_Sg=singular),
  by = "Base"
)
#   Base Word_Pl Word_Sg Num_Pl Num_Sg
# 1 jazz    jazz    <NA>  58765     NA
# 2  key    keys     key   6546  39485
# 3 shoe   shoes    shoe  49955   4834
# 4  toy    toys     toy  23556  75465

Consumable data:

library(dplyr)
library(tidyr)
dat <- read.table(text='Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural', header=TRUE, stringsAsFactors=FALSE)

tidyr 's new pivot_wider() function makes this simple...

library(dplyr)
library(tidyr)

dat <- read.table(header = T, stringsAsFactors = F, text='
Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural')

dat %>% 
  pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number))

# # A tibble: 4 x 5
#   Base  Word_singular Word_plural Number_singular Number_plural
#   <chr> <chr>         <chr>                 <int>         <int>
# 1 shoe  shoe          shoes                  4834         49955
# 2 toy   toy           toys                  75465         23556
# 3 key   key           keys                  39485          6546
# 4 jazz  NA            jazz                     NA         58765

The core idea is to identify each data point by it's type and whether it's a word or a number... then it's easy to spread to the format you want. (I won't bother renaming the variables or ordering them specifically to match your expected output because that's easy to do and not part of the problem here)

library(dplyr)
library(tidyr)

dat <- read.table(header = T, stringsAsFactors = F, text='
Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural')

dat %>% 
  gather(variable, value, Word, Number) %>%
  unite(Type, variable, Type) %>%
  spread(Type, value, convert = T) %>% 
  as_tibble()

# # A tibble: 4 x 5
#   Base  Number_plural Number_singular Word_plural Word_singular
#   <chr>         <int>           <int> <chr>       <chr>        
# 1 jazz          58765              NA jazz        NA           
# 2 key            6546           39485 keys        key          
# 3 shoe          49955            4834 shoes       shoe         
# 4 toy           23556           75465 toys        toy  

You could join the plural and singular subsets of your data by Base , then remove the Type columns and reorder the others...

full_join(filter(dat, Type == "plural"), 
          filter(dat, Type == "singular"),
          by = "Base", 
          suffix = c("_Pl", "_Sg")) %>% 
  select(Word_Sg, Word_Pl, Base, Number_Sg, Number_Pl)

#   Word_Sg Word_Pl Base Number_Sg Number_Pl
# 1    shoe   shoes shoe      4834     49955
# 2     toy    toys  toy     75465     23556
# 3     key    keys  key     39485      6546
# 4    <NA>    jazz jazz        NA     58765

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM