Replacing empty cells in a column with values from another column in R

I am trying to copy the values from the StudyID column into the empty cells of the SigmaID column, but I am running into an odd issue with the output.

This is how my data looks before running any commands.

StudyID    Gender  Region  SigmaID
LM24008      1       20    LM24008  
LM82993      1       16    LM28888  
ST04283      0       44      
ST04238      0       50      
LM04829      1       24    LM23921  
ST91124      0       89
ST29001      0       55

I tried writing the syntax in three ways, because I wasn't sure whether there was a problem with the way the logic was set up. All three produce the same output.

df$SigmaID <- ifelse(test = df$SigmaID != "", yes = df$SigmaID, no = df$StudyID)

df$SigmaID <- ifelse(df$SigmaID == "", df$StudyID, df$SigmaID)

df %>% mutate(SigmaID = ifelse(Gender == 0, df$StudyID, df$SigmaID))

Output: instead of pulling the values from the StudyID column, it fills the empty cells with one- to four-digit numbers.

StudyID    Gender  Region  SigmaID
LM24008      1       20    LM24008  
LM82993      1       16    LM28888  
ST04283      0       44    5  
ST04238      0       50    4908  
LM04829      1       24    LM23921
ST91124      0       89    209
ST29001      0       55    4092

I have tried recoding the empty cells to NA and then testing for NA in the logic, but this produced the same output as seen above. I'm wondering whether it has something to do with the variable type or variable attributes, and whether something is off about how R is reading the characters in StudyID. I would appreciate feedback on this issue!

Here is how to do it:

df$SigmaID[df$SigmaID == ""] = df$StudyID[df$SigmaID == ""]

df$SigmaID == "" is a logical vector, so df$SigmaID[df$SigmaID == ""] selects only the elements where SigmaID is an empty string, and the same index picks the matching StudyID values.
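As a minimal sketch of the indexing approach above, assuming both ID columns are stored as character vectors (the small data frame and the helper name `empty` are made up for illustration, using rows from the question):

```r
# Toy data modelled on the question; stringsAsFactors = FALSE keeps the IDs as character
df <- data.frame(
  StudyID = c("LM24008", "ST04283", "LM04829", "ST91124"),
  SigmaID = c("LM24008", "", "LM23921", ""),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows whose SigmaID is an empty string
empty <- df$SigmaID == ""

# Copy StudyID into SigmaID only at those positions
df$SigmaID[empty] <- df$StudyID[empty]

print(df)
```

Storing the index in a variable first avoids evaluating the comparison twice and keeps both sides of the assignment aligned.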

I also recommend using data.table instead of data.frame. It is typically faster on large data and has some useful syntax features:

library(data.table)
setDT(df) # setDT converts a data.frame to a data.table
df[SigmaID == "", SigmaID := StudyID] # update SigmaID in place where it is empty

Following up on this: as it turns out, R by default converts strings into factors when building a data frame (this was the default for data.frame() and read.csv() before R 4.0.0). There are a few ways of addressing the issue above.

i <- sapply(df, is.factor) # TRUE for every factor column
df[i] <- lapply(df[i], as.character) # convert those columns to character
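To illustrate why the numbers appear, here is a small sketch. It assumes that StudyID was stored as a factor while SigmaID was character, which is a guess that is consistent with the output in the question rather than something the question confirms. The toy data and the names `broken`, `fixed`, and `is_fac` are made up for illustration.

```r
# Toy data: StudyID is a factor, SigmaID is plain character (an assumption)
df <- data.frame(
  StudyID = factor(c("LM24008", "ST04283", "LM04829")),
  SigmaID = c("LM24008", "", "LM23921"),
  stringsAsFactors = FALSE
)

# ifelse() drops the factor attributes, so the empty cell receives the
# factor's internal integer code instead of its label
broken <- ifelse(df$SigmaID == "", df$StudyID, df$SigmaID)

# Convert every factor column to character, then repeat the same call
is_fac <- sapply(df, is.factor)
df[is_fac] <- lapply(df[is_fac], as.character)
fixed <- ifelse(df$SigmaID == "", df$StudyID, df$SigmaID)

print(broken)
print(fixed)
```

After the conversion the same ifelse() call returns the StudyID labels, because there are no longer any integer codes underneath the column.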

Another method:

df <- read.csv("/insert file pathway here", stringsAsFactors = FALSE)
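A self-contained way to try this without a file is to pass the CSV contents through the `text` argument of read.csv(). The sketch below uses made-up rows based on the question, and `csv_text` and `empty` are hypothetical names for illustration.

```r
# Inline CSV standing in for the real file; the second row has an empty SigmaID
csv_text <- "StudyID,Gender,Region,SigmaID
LM24008,1,20,LM24008
ST04283,0,44,
LM04829,1,24,LM23921"

# Read the ID columns as character rather than factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)

# With character columns, the replacement copies the actual ID strings
empty <- df$SigmaID == ""
df$SigmaID[empty] <- df$StudyID[empty]
print(df)
```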

This is what I found to be helpful. I'm sure there are other ways of troubleshooting this as well.
