R - replacing blank row values with conditional values from another column
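I searched and found answers on replacing blank row values with values from another column, but none that do it conditionally. Let me explain.
I have a data frame that looks like this: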
Name    Grade  Test1  Test2     Test3
John    A      none   none
Jane           B ok   none
David          none   C barely
Sam     B      none
Thomas                          D fail
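I want to replace the missing grades in the Grade column with the letter grade found in the other columns (dropping the comment that follows the letter). There will never be more than one letter grade across the Test1/Test2/Test3 columns. So the result I'm looking for is: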
Name    Grade  Test1  Test2     Test3
John    A      none   none
Jane    B      B ok   none
David   C      none   C barely
Sam     B      none
Thomas  D                       D fail
Any help would be greatly appreciated!
I have shamelessly borrowed @akrun's data to show an alternative approach that follows the split-apply-combine paradigm:
# define data
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
# load up libraries
library(dplyr)
library(tidyr)
# add a primary key
df1 <- df1 %>%
  mutate(PK = 1:nrow(df1))
# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result
test_result <- df1 %>%
  select(PK, Test1:Test3) %>%
  gather(Variable, Value, -PK) %>%
  mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
  # drop all the unnecessary rows:
  filter(Value != "")
# join back to the main data, fill in the test score when needed
df1 %>%
  select(PK, Name, Grade) %>%
  left_join(test_result, by = "PK") %>%
  mutate(
    Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
    Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
  select(-Value, -PK, -Variable)
This gives you a nice tidy dataset that should be better suited to future analysis and reuse:
    Name Grade Source
1   John     A  Grade
2   Jane     B  Test1
3  David     C  Test2
4    Sam     B  Grade
5 Thomas     D  Test3
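In current tidyr releases gather() is superseded by pivot_longer(), so the same reshape-and-join idea could be sketched as below. This is only a sketch under the assumption that dplyr and tidyr (1.0.0 or later) are installed; the name `result` is introduced here for illustration, and row_number() is swapped in for 1:nrow(df1).

```r
library(dplyr)
library(tidyr)

# same example data as above, rebuilt so the block is self-contained
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# long format: one row per (person, test), keeping only cells that hold a letter
test_result <- df1 %>%
  mutate(PK = row_number()) %>%
  select(PK, Test1:Test3) %>%
  pivot_longer(cols = -PK, names_to = "Variable", values_to = "Value") %>%
  mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
  filter(Value != "")

# join back and fill Grade only where it is not already a letter
result <- df1 %>%
  mutate(PK = row_number()) %>%
  select(PK, Name, Grade) %>%
  left_join(test_result, by = "PK") %>%
  mutate(
    Source = ifelse(Grade %in% LETTERS, "Grade", Variable),
    Grade  = ifelse(Grade %in% LETTERS, Grade, Value)
  ) %>%
  select(Name, Grade, Source)

result
```

Source is computed before Grade is overwritten, so both ifelse() calls test the original Grade values.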
Assuming the columns are of class character, we get the index ('i1') of the blank 'Grade' elements:
i1 <- df1$Grade==''
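We loop through the 'Test' columns (i.e. columns 3 to 5) using vapply, subset the elements in those columns that have a non-space character (\\S) followed by a space (\\s) using grep, remove the space and the characters that follow it with sub, and assign the output to the blank elements in 'Grade'.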
df1$Grade[i1] <- vapply(df1[i1, 3:5], function(x)
    sub('\\s+.*$', '', grep('^\\S\\s', x, value = TRUE)), character(1))
df1
#    Name Grade Test1    Test2  Test3
#1   John     A  none     none
#2   Jane     B  B ok     none
#3  David     C  none C barely
#4    Sam     B  none
#5 Thomas     D                D fail
# data
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
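One thing worth noting: vapply over df1[i1, 3:5] walks the test columns, not the rows, so its output lines up with the blank 'Grade' rows only when each test column yields exactly one grade and the columns happen to follow row order, as they do in this example. A row-wise variant using the same two regular expressions might look like the sketch below. The helpers `extract_grade` and `fill_grades` are hypothetical names introduced here, and the columns are assumed to be character.

```r
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# hypothetical helper: pull the letter grade out of one row's test cells
extract_grade <- function(cells) {
  # keep cells that start with a non-space character followed by a space
  hits <- grep("^\\S\\s", cells, value = TRUE)
  # drop the space and everything after it; "" when no cell qualifies
  if (length(hits) == 0L) "" else sub("\\s+.*$", "", hits[1])
}

# hypothetical helper: fill blank grades one row at a time
fill_grades <- function(df, test_cols = c("Test1", "Test2", "Test3")) {
  rows <- which(df$Grade == "")
  df$Grade[rows] <- vapply(rows, function(r) {
    extract_grade(unlist(df[r, test_cols], use.names = FALSE))
  }, character(1))
  df
}

out <- fill_grades(df1)
out$Grade
# [1] "A" "B" "C" "B" "D"
```

Because each blank row is looked up on its own, the result does not depend on which test column a given grade sits in.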
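This worked when I tried it on your data: first take the "none" values out of the data frame, then take the substring holding the grade from each string, then merge all the columns into one and build the final table: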
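# assumes `data` holds character columns named name, grade, test1, test2, test3
data[data=="none"]=""
# keep only the first character of each cell (the letter grade, if any)
a=function(x) substring(x,1,1)
data1=data.frame(data[1],apply(data[2:5],2,a))
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
final.data=data.frame(data1[1:2],data[3:5])
final.data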
name    grade  test1  test2     test3
john    A
jane    B      B ok
david   C             C barely
sam     B
thomas  D                       D fail
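The paste step works because, once "none" is blanked out, at most one of the four cells in a row starts with a letter, so concatenating the first characters leaves exactly that letter. A minimal sketch of the same idea on the capitalised column names from the question follows; `tests`, `first_chars` and `merged` are names introduced here, and the final check guards the one-letter-per-row assumption.

```r
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# work on a copy of the grade and test columns so the comments in df1 survive
tests <- df1[c("Grade", "Test1", "Test2", "Test3")]
tests[tests == "none"] <- ""

# first character of every cell, column by column
first_chars <- lapply(tests, substring, 1, 1)

# concatenate across columns; the empty strings contribute nothing
merged <- do.call(paste0, unname(first_chars))

# assumption check: a row with two letters would give a 2-character string
stopifnot(all(nchar(merged) <= 1))

df1$Grade <- merged
df1$Grade
# [1] "A" "B" "C" "B" "D"
```

If a row could carry both a Grade and a test letter, the check fails instead of silently writing a two-letter grade.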