R dataframe values are in the wrong columns

Question

I have a dataframe like this:

Name Characteristic_1 Characteristic_2 
Apple Yellow Italian
Pear British Yellow
Strawberries French Red
Blackberry Blue Austrian

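As you can see, a given characteristic can end up in either column, depending on the row. I want to obtain a dataframe in which each column contains only the values of one specific characteristic: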

Name Characteristic_1 Characteristic_2 
Apple Yellow Italian
Pear Yellow British
Strawberries Red French
Blackberry Blue Austrian

My idea is to use the case_when function, but I would like to know whether there is a faster way to achieve the same result.

Example data:

df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

Answer 1

I suspect there is a simpler way to solve this, but here is one potential solution:

# Load the libraries
library(tidyverse)

# Load the data
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

# R has 657 built in colour names. You can see them using the `colours()` function.
# Chances are your colours are contained in this list.
# The `str_to_title()` function capitalizes every colour in the list
list_of_colours <- str_to_title(colours())
# If your colours are not contained in the list, add them using e.g.
# `list_of_colours <- c(list_of_colours, "Octarine")`

# Create a new dataframe ("df2") by taking the original dataframe ("df")
df2 <- df %>% 
# Create two new columns called "Colour" and "Origin" using `mutate()`, with
# `ifelse()` used to check whether Characteristic_1 matches a known colour.
# If it does, Characteristic_1 goes to "Colour" and Characteristic_2 to
# "Origin"; if it doesn't, the two values are swapped.
  mutate(Colour = ifelse(!is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
                       Characteristic_1, Characteristic_2),
         Origin = ifelse(is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
                         Characteristic_1, Characteristic_2)) %>% 
# Then select the columns you want
  select(Name, Colour, Origin)

df2
# A tibble: 4 x 3
#  Name         Colour Origin  
#  <chr>        <chr>  <chr>   
#1 Apple        Yellow Italian 
#2 Pear         Yellow British 
#3 Strawberries Red    French  
#4 Blackberry   Blue   Austrian
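
One caveat with the approach above: the pattern built with `paste(..., collapse = "|")` matches substrings, so a word that merely contains a colour name would be classified as a colour. The sketch below compares substring matching with an exact lookup via `%in%`. It uses base R only; the helper `cap_first()` and the test value "Tanzanian" are hypothetical and not part of the original data.

```r
# Hypothetical helper: capitalise the first letter of each string
cap_first <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

# Built-in colour names with a leading capital, e.g. "Yellow", "Tan"
colour_names <- cap_first(colours())

# One regex alternation over all colour names, as in the answer above
pattern <- paste(colour_names, collapse = "|")

# "Tanzanian" is a made-up value that happens to contain the colour "Tan"
values <- c("Yellow", "British", "Tanzanian")

# Substring matching: TRUE whenever a colour name occurs anywhere in the value
substring_match <- grepl(pattern, values)

# Exact matching: TRUE only when the whole value is a colour name
exact_match <- values %in% colour_names

print(data.frame(values, substring_match, exact_match))
```

If substring hits like this are a concern for your data, swapping the `str_extract()` test for an exact `%in%` lookup is probably the safer choice.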

Answer 2

I think there is probably a better way to do this, but here is one solution I came up with for now:

library(dplyr)
library(stringr)

df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

df %>%
  mutate(char_1 = if_else(str_to_lower(Characteristic_1) %in% colours(distinct = TRUE), 
                          Characteristic_1, Characteristic_2), 
         char_2 = if_else(Characteristic_1 == char_1, Characteristic_2, Characteristic_1)) %>%
  select(-c(Characteristic_1, Characteristic_2))

# A tibble: 4 x 3
  Name         char_1 char_2  
  <chr>        <chr>  <chr>   
1 Apple        Yellow Italian 
2 Pear         Yellow British 
3 Strawberries Red    French  
4 Blackberry   Blue   Austrian
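
The same idea can also be written in base R with an explicit swap mask, which makes it easy to flag rows that cannot be resolved (both values are colours, or neither is). This is only a sketch under assumptions: the names `is_colour`, `swap`, `fixed` and the `ambiguous` column are illustrative, and the data is rebuilt as a plain data.frame rather than a tibble.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  Name = c("Apple", "Pear", "Strawberries", "Blackberry"),
  Characteristic_1 = c("Yellow", "British", "French", "Blue"),
  Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"),
  stringsAsFactors = FALSE
)

# Exact, case-insensitive lookup against R's built-in colour names.
# NA values are treated as "not a colour".
is_colour <- function(x) tolower(x) %in% colours()

first_is_colour  <- is_colour(df$Characteristic_1)
second_is_colour <- is_colour(df$Characteristic_2)

# Swap only the rows where the colour sits in the second column
swap <- !first_is_colour & second_is_colour

fixed <- data.frame(
  Name   = df$Name,
  Colour = ifelse(swap, df$Characteristic_2, df$Characteristic_1),
  Origin = ifelse(swap, df$Characteristic_1, df$Characteristic_2),
  stringsAsFactors = FALSE
)

# Rows where both or neither value is a colour are left unswapped and flagged
fixed$ambiguous <- first_is_colour == second_is_colour

print(fixed)
```

Because the swap decision is computed once and reused for both output columns, the second column does not depend on comparing values with `==`, which would return NA if a characteristic were missing.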

2 answers

Answer 1
0 2021-04-07 00:46:02

Answer 2
0 2021-04-07 00:49:40
