Removing all rows that do not meet criteria in R?
I can't seem to find a solution here that matches my scenario. This is one column from my sample dataset:
How_do_you_feel
Excited, Hopeful, Prepared, good
Unsure, confused, anxious, curious
Co operations, Teamwork, communication, critical thinking
a
First, team work, nervous, curious
Interesting. New. Exciting. Develop
perplexed,anxious,embarrassed,bit excited
Novel, Unknown, Challenging, Useful
Worried, excited, self-doubt, motivated
Excited,curious,nervous,worried
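The correct format is 4 words separated by commas, for example "Excited, Hopeful, Prepared, good".
How can I clean my data to remove all rows with the wrong format, such as "Interesting. New. Exciting. Develop" or "perplexed,anxious,embarrassed,bit excited"?
The result should look like this: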
How_do_you_feel
Excited, Hopeful, Prepared, good
Unsure, confused, anxious, curious
Co operations, Teamwork, communication, critical thinking
First, team work, nervous, curious
Novel, Unknown, Challenging, Useful
Worried, excited, self-doubt, motivated
Thanks!
Here is one potential solution:
library(tidyverse)
lines <- c("Excited, Hopeful, Prepared, good",
           "Unsure, confused, anxious, curious",
           "Co operations, Teamwork, communication, critical thinking",
           "a",
           "First, team work, nervous, curious",
           "Interesting. New. Exciting. Develop",
           "perplexed,anxious,embarrassed,bit excited",
           "Novel, Unknown, Challenging, Useful",
           "Worried, excited, self-doubt, motivated",
           "Excited,curious,nervous,worried")
df <- data.frame(How_do_you_feel = lines)
df
#> How_do_you_feel
#> 1 Excited, Hopeful, Prepared, good
#> 2 Unsure, confused, anxious, curious
#> 3 Co operations, Teamwork, communication, critical thinking
#> 4 a
#> 5 First, team work, nervous, curious
#> 6 Interesting. New. Exciting. Develop
#> 7 perplexed,anxious,embarrassed,bit excited
#> 8 Novel, Unknown, Challenging, Useful
#> 9 Worried, excited, self-doubt, motivated
#> 10 Excited,curious,nervous,worried
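# Keep only rows that contain four groups joined by ", ";
# str_extract() returns NA for rows that do not match, and filter() drops them.
df %>%
  mutate(How_do_you_feel = str_extract(
    How_do_you_feel,
    "[[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+"
  )) %>%
  filter(!is.na(How_do_you_feel))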
#> How_do_you_feel
#> 1 Excited, Hopeful, Prepared, good
#> 2 Unsure, confused, anxious, curious
#> 3 Co operations, Teamwork, communication, critical thinking
#> 4 First, team work, nervous, curious
#> 5 Novel, Unknown, Challenging, Useful
#> 6 Worried, excited, self-doubt, motivated
Created on 2022-07-22 by the reprex package (v2.0.1)
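For readers who do not have the tidyverse installed, the same idea can be sketched in base R. This is only an illustration, not the answerer's code: the field pattern [^,]+ and the names field, pattern, and cleaned are assumptions introduced here, and the data is a shortened copy of the question's column. Unlike the pattern above, this one is anchored with ^ and $ and does not allow a comma inside a field, so a row with five comma-separated items would be rejected.

```r
# Shortened copy of the sample column from the question.
df <- data.frame(
  How_do_you_feel = c(
    "Excited, Hopeful, Prepared, good",
    "a",
    "Interesting. New. Exciting. Develop",
    "perplexed,anxious,embarrassed,bit excited",
    "Worried, excited, self-doubt, motivated"
  ),
  stringsAsFactors = FALSE
)

# One field: a run of one or more characters that contains no comma.
field <- "[^,]+"

# Exactly four fields joined by ", ", anchored to the whole string.
pattern <- paste0("^", paste(rep(field, 4), collapse = ", "), "$")

# Keep the matching rows; drop = FALSE keeps the result a data frame.
cleaned <- df[grepl(pattern, df$How_do_you_feel), , drop = FALSE]
cleaned
```

Because the fields are defined by what they may not contain (a comma), entries such as "self-doubt" or "critical thinking" pass without listing every allowed character.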
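A general rule that seems to work for your case is that three commas, each followed by a space (and not just a comma, as in the previous answer), indicate a good match. Try this: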
library(tidyverse)
read_delim("How_do_you_feel
Excited, Hopeful, Prepared, good
Unsure, confused, anxious, curious
Co operations, Teamwork, communication, critical thinking
a
First, team work, nervous, curious
Interesting. New. Exciting. Develop
perplexed,anxious,embarrassed,bit excited
Novel, Unknown, Challenging, Useful
Worried, excited, self-doubt, motivated
Excited,curious,nervous,worried", delim = "\\n") %>%
  mutate(How_do_you_feel = str_trim(How_do_you_feel)) %>%
  # Keep rows made of exactly four letter/hyphen/space groups joined by ", "
  filter(str_detect(
    How_do_you_feel,
    paste0("^", paste(rep("[[:alpha:]- ]+", times = 4), collapse = ", "), "$")
  ))
# How_do_you_feel
# <chr>
# 1 "Excited, Hopeful, Prepared, good "
# 2 "Unsure, confused, anxious, curious "
# 3 "Co operations, Teamwork, communication, critical thinking "
# 4 "First, team work, nervous, curious "
# 5 "Novel, Unknown, Challenging, Useful "
# 6 "Worried, excited, self-doubt, motivated "
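The rule stated in this answer (three commas, each followed by a space) can also be checked without a character-class pattern by counting separators. The sketch below is a base R illustration under that reading of the rule; the helper count_matches and the variable names are hypothetical and not part of either answer. It does not check that the four fields are non-empty.

```r
# Sample column from the question.
lines <- c("Excited, Hopeful, Prepared, good",
           "Unsure, confused, anxious, curious",
           "Co operations, Teamwork, communication, critical thinking",
           "a",
           "First, team work, nervous, curious",
           "Interesting. New. Exciting. Develop",
           "perplexed,anxious,embarrassed,bit excited",
           "Novel, Unknown, Challenging, Useful",
           "Worried, excited, self-doubt, motivated",
           "Excited,curious,nervous,worried")

# Hypothetical helper: number of literal (non-regex) occurrences of `what`
# in each element of `x`.
count_matches <- function(x, what) {
  lengths(regmatches(x, gregexpr(what, x, fixed = TRUE)))
}

n_commas      <- count_matches(lines, ",")   # every comma
n_comma_space <- count_matches(lines, ", ")  # commas followed by a space

# Keep a row when it has exactly three commas and all of them are
# followed by a space.
keep <- n_commas == 3 & n_comma_space == 3
lines[keep]
```

Applied to a data frame, the same logical vector can be used as a row index, for example df[keep, , drop = FALSE].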