R / dplyr: Transforming two rows into two columns
I have a data frame in R that looks like this:
Word   Base  Number  Type
-----  ----  ------  --------
shoe   shoe  4834    singular
shoes  shoe  49955   plural
toy    toy   75465   singular
toys   toy   23556   plural
key    key   39485   singular
keys   key   6546    plural
jazz   jazz  58765   plural
I want to transform it so that it looks like this:
Word_Sg  Word_Pl  Base  Num_Singular  Num_Plural
-------  -------  ----  ------------  ----------
shoe     shoes    shoe  4834          49955
toy      toys     toy   75465         23556
key      keys     key   39485         6546
NA       jazz     jazz  NA            58765
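So instead of having two rows for the singular and plural values, I want two columns, one for the singular and one for the plural.
I have tried a few things using dplyr::summarize, but so far without any success. This is the code I have come up with so far: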
dataframe1 <- dataframe %>%
  mutate(Num_Singular = case_when(Type == "singular" ~ Number)) %>%
  mutate(Num_Plural = case_when(Type == "plural" ~ Number)) %>%
  dplyr::select(Word, Base, Num_Singular, Num_Plural) %>%
  group_by(Base) %>%
  dplyr::summarize(Num_Singular = paste(na.omit(Num_Singular)),
                   Num_Plural = paste(na.omit(Num_Plural)))
However, it gives me this error:
Error in summarise_impl(.data, dots) :
Column `Num_Singular` must be length 1 (a summary value), not 2)
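I think the problem may be that some bases do not have both a singular and a plural row, but only one of them (e.g. "jazz"). Most of them have both.
So how can I do this in R or dplyr?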
If you first look at just a few of the columns:
select(dat, Base, Word, Type)[1:2,]
# Base Word Type
# 1 shoe shoe singular
# 2 shoe shoes plural
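From here, think of it as simply spreading Word out into singular/plural columns, effectively going from "tall" to "wide". (This would be more obvious if Type had more than two categories.)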
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular)
# Base Word_Pl Word_Sg
# 1 jazz jazz <NA>
# 2 key keys key
# 3 shoe shoes shoe
# 4 toy toys toy
You can easily repeat this for Number as well. From there, it is just a matter of merging/joining the two results on the key column Base:
full_join(
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular),
select(dat, Base, Number, Type) %>%
spread(Type, Number) %>%
rename(Num_Pl=plural, Num_Sg=singular),
by = "Base"
)
# Base Word_Pl Word_Sg Num_Pl Num_Sg
# 1 jazz jazz <NA> 58765 NA
# 2 key keys key 6546 39485
# 3 shoe shoes shoe 49955 4834
# 4 toy toys toy 23556 75465
Reproducible data:
library(dplyr)
library(tidyr)
dat <- read.table(text='Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural', header=TRUE, stringsAsFactors=FALSE)
The new pivot_wider() function in tidyr makes this simple...
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number))
# # A tibble: 4 x 5
# Base Word_singular Word_plural Number_singular Number_plural
# <chr> <chr> <chr> <int> <int>
# 1 shoe shoe shoes 4834 49955
# 2 toy toy toys 75465 23556
# 3 key key keys 39485 6546
# 4 jazz NA jazz NA 58765
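If the exact column names and order from the question are wanted, one possible follow-up is to rename and reorder in a single select() call. This is only a sketch: the object name result is an assumption made for illustration, and the target names are taken from the desired output shown in the question.

```r
library(dplyr)
library(tidyr)

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')

# "result" is a hypothetical name used only for this sketch
result <- dat %>%
  pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number)) %>%
  # select() accepts new_name = old_name, so it renames and reorders at once
  select(Word_Sg = Word_singular,
         Word_Pl = Word_plural,
         Base,
         Num_Singular = Number_singular,
         Num_Plural = Number_plural)

print(result)
```

Bases that lack one of the two forms (here "jazz") keep NA in the corresponding columns, as in the output above.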
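The core idea is to identify each data point by its type and by whether it is a word or a number... after that it is easy to spread the data into the format you want. (I won't bother renaming the variables or ordering them specifically to match your expected output, since that is easy to do and not part of the question here.)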
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
gather(variable, value, Word, Number) %>%
unite(Type, variable, Type) %>%
spread(Type, value, convert = T) %>%
as_tibble()
# # A tibble: 4 x 5
# Base Number_plural Number_singular Word_plural Word_singular
# <chr> <int> <int> <chr> <chr>
# 1 jazz 58765 NA jazz NA
# 2 key 6546 39485 keys key
# 3 shoe 49955 4834 shoes shoe
# 4 toy 23556 75465 toys toy
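Since gather() and spread() are superseded in current tidyr, the same idea can probably be written with pivot_longer() and pivot_wider(). The following is a sketch, not the answer's own code: it assumes tidyr >= 1.1.0 (for values_transform) and dplyr >= 1.0.0 (for across()), and the name wide2 is made up for illustration. Because pivot_longer() will not combine character and integer columns on its own, the values are converted to character first and the Number_* columns are converted back at the end.

```r
library(dplyr)
library(tidyr)

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')

# "wide2" is a hypothetical name used only for this sketch
wide2 <- dat %>%
  # stack Word and Number into one column; both must share a type
  pivot_longer(c(Word, Number),
               names_to = "variable", values_to = "value",
               values_transform = list(value = as.character)) %>%
  # build keys such as "Word_singular" and "Number_plural"
  unite(Type, variable, Type) %>%
  pivot_wider(names_from = Type, values_from = value) %>%
  # restore the integer type that was lost when stacking
  mutate(across(starts_with("Number_"), as.integer))

print(wide2)
```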
You can join the plural and singular subsets of the data by Base, then drop the Type columns and reorder the remaining columns...
full_join(filter(dat, Type == "plural"),
filter(dat, Type == "singular"),
by = "Base",
suffix = c("_Pl", "_Sg")) %>%
select(Word_Sg, Word_Pl, Base, Number_Sg, Number_Pl)
# Word_Sg Word_Pl Base Number_Sg Number_Pl
# 1 shoe shoes shoe 4834 49955
# 2 toy toys toy 75465 23556
# 3 key keys key 39485 6546
# 4 <NA> jazz jazz NA 58765
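For readers who prefer to avoid extra packages, the same join can be sketched in base R with subset() and merge(). This is an illustrative variant rather than part of the answer above; the names singular, plural and wide are assumptions. Note that merge() sorts the result by the key column by default, so the row order differs from the full_join() output.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')

# split by Type and drop the Type column from each half
singular <- subset(dat, Type == "singular", select = -Type)
plural   <- subset(dat, Type == "plural",   select = -Type)

# all = TRUE keeps bases present in only one half (a full outer join)
wide <- merge(singular, plural, by = "Base", all = TRUE,
              suffixes = c("_Sg", "_Pl"))

# reorder the columns to match the layout asked for in the question
wide <- wide[, c("Word_Sg", "Word_Pl", "Base", "Number_Sg", "Number_Pl")]
print(wide)
```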