R: converting a string into an integer within a data frame
REF ALT
AT ATT,A,ATTT
I'm working with the data frame above and need to convert the REF column, which contains just AT, into the integer 0, and the ALT column, which contains ATT,A,ATTT, into 1,2,3. I have tried transform, but it didn't seem to work.
Also, my full data frame has multiple rows, so I would need to loop the command to apply it to all rows.
Any help would be greatly appreciated.
Explicit loops in R are usually much slower than vectorized operations, so I would strongly advise against one unless it is absolutely necessary, and in your case I don't think it is.
For example, you can do the following (assuming your data frame is called "df"):
df$REF<-0 # from what I gather all rows should be 0
df$ALT2<-1 # a proxy column that you can copy to ALT after
df$ALT2[df$ALT == "A"] <-2 # converts A to 2
df$ALT2[df$ALT == "ATTT"] <-3 # converts ATTT to 3
df$ALT<-df$ALT2 # copy proxy over to ALT
df$ALT2<-NULL #erase proxy column
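The same recoding can also be written as a single lookup rather than one assignment per value. This is only a minimal sketch: it assumes each row of ALT holds a single string and that the mapping ATT=1, A=2, ATTT=3 is the one wanted. The sample data and the name `codes` are made up for illustration.

```r
# Hypothetical sample data: one ALT value per row (an assumption)
df <- data.frame(REF = c("AT", "AT", "AT"),
                 ALT = c("ATT", "A", "ATTT"),
                 stringsAsFactors = FALSE)

# Named integer vector used as a lookup table (illustrative name)
codes <- c(ATT = 1L, A = 2L, ATTT = 3L)

df$REF <- 0L                     # every row gets 0
# Index the lookup by the ALT strings; unname() drops the names
df$ALT <- unname(codes[df$ALT])
```

Any ALT string that is missing from `codes` ends up as NA, which makes unexpected values easy to spot.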
If you don't care which character string gets assigned to which number, but just want different strings to have different integers, you can also do:
df$REF<-0 # from what I gather all rows should be 0
df$ALT <- as.numeric(factor(df$ALT)) # give a distinct number to each distinct string counting up from 1.
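One thing to keep in mind with the factor approach: the numbers follow the sorted factor levels, not the order in which the strings first appear. A small sketch with made-up values (the vector `alt` is hypothetical):

```r
alt <- c("ATT", "A", "ATTT", "A")   # made-up example values

# Default: levels are the sorted unique strings "A", "ATT", "ATTT"
by_level <- as.integer(factor(alt))
by_level
# 2 1 3 1

# To number by first appearance instead, set the levels explicitly
by_appearance <- as.integer(factor(alt, levels = unique(alt)))
by_appearance
# 1 2 3 2
```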
Setting the REF column to 0 is straightforward.
df$REF <- 0
For the ALT column I assume that the order within each entry matters but rows are independent. So an A could be numbered 1 in one row but 2 in another (if there are multiple entries in that row). The only thing we care about, then, is the number of alternatives in each row. We can simply count them and generate a vector with the appropriate numbers, collapsing it into a single string to form the corresponding entry in the data frame:
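# as.character() guards against ALT being a factor; seq_along() handles empty entries safely
df$ALT <- sapply(strsplit(as.character(df$ALT), ","),
                 function(alt) paste(seq_along(alt), collapse=","))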
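To see how this behaves across several rows without an explicit loop, here is a small sketch on a made-up two-row data frame. The second row is invented for illustration, and vapply is swapped in for sapply only so that the result is guaranteed to be a character vector.

```r
# Hypothetical multi-row data; ALT holds comma-separated alternatives
df <- data.frame(REF = c("AT", "G"),
                 ALT = c("ATT,A,ATTT", "C,T"),
                 stringsAsFactors = FALSE)

df$REF <- 0
# Split each entry on commas, then number the pieces 1..n within the row
df$ALT <- vapply(strsplit(df$ALT, ",", fixed = TRUE),
                 function(alt) paste(seq_along(alt), collapse = ","),
                 character(1))
df$ALT
# "1,2,3" "1,2"
```

Each row is numbered independently, so the result depends only on how many alternatives that row contains.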