
R - replacing blank row values with conditional values from another column

I searched and found answers on replacing blank row values with values from other columns, but not conditionally. Let me explain.

I have a data frame that looks like this:

Name    Grade    Test1    Test2    Test3
John    A        none     none
Jane             B ok     none
David            none     C barely
Sam     B        none
Thomas                             D fail

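I want to replace the missing grades in the Grade column with the letter grade from the other columns (dropping the comment that follows it). There will never be more than one letter grade across the Test1/Test2/Test3 columns. So the result I'm looking for is this: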

Name   Grade    Test1    Test2    Test3
John   A        none     none
Jane   B        B ok     none
David  C        none     C barely
Sam    B        none
Thomas D                          D fail

Any help would be appreciated!

I shamelessly ripped @akrun's data to show an alternative approach that fits the split-apply-combine paradigm:

# define data
df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

# load up libraries
library(dplyr)
library(tidyr)

# add a primary key
df1 <- df1 %>%
   mutate(PK = row_number())

# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result
test_result <- df1 %>%
   select(PK, Test1:Test3) %>%
   gather(Variable, Value, -PK) %>%
   mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
   # drop all the unnecessary rows:
   filter(Value != "")

# join back to the main data, fill in the test grade when needed
df1 %>%
   select(PK, Name, Grade) %>%
   left_join(test_result, by = "PK") %>%
   mutate(
      Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
      Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
   select(-Value, -PK, -Variable)

This gives you a nice, tidy dataset that should be better suited to further analysis and reuse:

    Name Grade Source
1   John     A  Grade
2   Jane     B  Test1
3  David     C  Test2
4    Sam     B  Grade
5 Thomas     D  Test3
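Since tidyr 1.0.0, gather() is superseded by pivot_longer(). The sketch below reproduces the same split-apply-combine approach with pivot_longer(); it assumes dplyr and tidyr >= 1.0.0 are installed, drops the "none" cells with filter() instead of first turning them into empty strings, and is meant as an illustration rather than a definitive replacement.

```r
# Sketch only: assumes dplyr and tidyr >= 1.0.0 (for pivot_longer) are installed.
library(dplyr)
library(tidyr)

df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# add a primary key so test results can be joined back to each person
df1 <- df1 %>% mutate(PK = row_number())

# long format: one row per (person, test); keep only cells that hold a grade
test_result <- df1 %>%
  select(PK, Test1:Test3) %>%
  pivot_longer(-PK, names_to = "Variable", values_to = "Value") %>%
  filter(Value != "", Value != "none") %>%
  mutate(Value = substring(Value, 1, 1))

# join back; Source is computed before Grade is overwritten
result <- df1 %>%
  select(PK, Name, Grade) %>%
  left_join(test_result, by = "PK") %>%
  mutate(
    Source = if_else(Grade %in% LETTERS, "Grade", Variable),
    Grade  = if_else(Grade %in% LETTERS, Grade, Value)
  ) %>%
  select(Name, Grade, Source)

print(result)
```

The result has the same Name, Grade and Source columns as the gather() version; if_else() is used instead of ifelse() because it checks that both branches have the same type.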

Assuming the columns are of character class, we get the index ('i1') of the blank 'Grade' elements:

i1 <- df1$Grade==''

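We loop with vapply over the rows that have a blank 'Grade', take the 'Test' columns of that row (i.e. columns 3 to 5), use grep to keep the element that starts with a non-space character (\\S) followed by a space (\\s), remove the space and everything after it with sub, and assign the output to the blank elements of 'Grade'. Looping over rows rather than columns ensures each letter lands in the right row, and character(1) makes vapply stop with an error if a row does not contain exactly one grade.

df1$Grade[i1] <- vapply(which(i1), function(i) {
    # the test cells of row i as a named character vector
    x <- unlist(df1[i, 3:5])
    # keep the cell that looks like "<letter> <comment>", then strip the comment
    sub('\\s+.*$', '', grep('^\\S\\s', x, value=TRUE))
}, character(1))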
df1
#    Name Grade Test1    Test2  Test3
#1   John     A  none     none       
#2   Jane     B  B ok     none       
#3  David     C  none C barely       
#4    Sam     B  none                
#5 Thomas     D                D fail
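To see what the two regular expressions in this answer match, here is a small base R check on a few sample cells taken from the question's data. The last lines point at an assumption worth checking against real data: a cell holding a bare letter with no comment after it is not matched by '^\\S\\s'; the pattern '^\\S(\\s|$)' is a hypothetical variant (not part of the answer) that would also accept it.

```r
# Sample cells taken from the question's data
cells <- c("B ok", "none", "", "C barely", "D fail")

# '^\\S\\s': a non-space character at the start, immediately followed by whitespace
has_grade <- grepl("^\\S\\s", cells)
print(has_grade)      # TRUE FALSE FALSE TRUE TRUE

# '\\s+.*$': the first whitespace and everything after it, removed by sub()
letters_only <- sub("\\s+.*$", "", cells[has_grade])
print(letters_only)   # "B" "C" "D"

# Caveat: a bare letter with no trailing comment is not matched by '^\\S\\s'
print(grepl("^\\S\\s", "C"))                    # FALSE
# hypothetical variant that also accepts a bare letter but still rejects "none"
print(grepl("^\\S(\\s|$)", c("C", "none")))     # TRUE FALSE
```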

Data

df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

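This works when I try it on your data: first blank out the "none" entries in the data frame, then take the first character (the grade part) of each string, then combine the grade columns into one and build the final table: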

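# blank out the "none" entries
data[data=="none"]=""
# keep only the first character (the letter grade) of each cell
a=function(x) substring(x,1,1)
data1=data.frame(data[1],apply(data[2:5],2,a))
# concatenate grade and test letters; at most one is non-empty per row
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
final.data=data.frame(data1[1:2],data[3:5])
final.data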

name   grade   test1    test2    test3
john       A                      
jane       B    B ok                
david      C          C barely       
sam        B                      
thomas     D                    D fail
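The paste() answer above hard-codes test1, test2 and test3, overwrites the "none" entries and rewrites every grade. A minimal base R sketch of the same idea that picks up every column whose name starts with "test" (an assumption about your column names) and only fills blank grades could look like this. Like the original, it assumes a row has at most one letter grade, since two letters in a blank row would be concatenated (e.g. "BC").

```r
# Same data as above, with the lower-case column names used in this answer
data <- data.frame(
  name  = c("john", "jane", "david", "sam", "thomas"),
  grade = c("A", "", "", "B", ""),
  test1 = c("none", "B ok", "none", "none", ""),
  test2 = c("none", "none", "C barely", "", ""),
  test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# assumption: every test column's name starts with "test"
test_cols <- grep("^test", names(data), value = TRUE)

# first character of each test cell, with "none" treated as empty
first_letters <- lapply(data[test_cols], function(x) {
  ifelse(x == "none", "", substring(x, 1, 1))
})

# fill only the blank grades; existing grades and the test columns are untouched
blank <- data$grade == ""
data$grade[blank] <- do.call(paste0, unname(first_letters))[blank]

print(data)
```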


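Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.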

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM