R converting a string into an integer within a data frame

REF         ALT 
 AT  ATT,A,ATTT

I'm working with the data frame above and need to convert the REF column, which contains just AT, into the integer 0, and then the ALT column, which contains ATT,A,ATTT, into 1,2,3. I have tried Transform but it didn't seem to work.

Also, my full data frame has multiple rows, so I would need to loop the command to apply it to all rows.

Any help would be greatly appreciated.

Explicit loops in R are usually much slower than vectorized operations, so I would strongly advise against one unless it is absolutely necessary, and in your case I don't think it is.

For example, you can do the following (assuming your data frame is called "df"):

df$REF <- 0 # from what I gather, all rows should be 0

df$ALT2 <- 1 # a proxy column that you can copy to ALT afterwards
df$ALT2[df$ALT == "A"] <- 2 # converts A to 2
df$ALT2[df$ALT == "ATTT"] <- 3 # converts ATTT to 3

df$ALT <- df$ALT2 # copy proxy over to ALT
df$ALT2 <- NULL # erase proxy column
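
The same fixed mapping could probably also be written as a named lookup vector, which avoids the proxy column. This is only a sketch: the three-row data frame and the name `codes` are made up for illustration, and it assumes each row of ALT holds a single string. Unlike the proxy-column version, any string missing from `codes` becomes NA rather than 1.

```r
# Hypothetical data: one ALT string per row (an assumption for this sketch).
df <- data.frame(REF = c("AT", "AT", "AT"),
                 ALT = c("ATT", "A", "ATTT"),
                 stringsAsFactors = FALSE)

# Named lookup vector: names are the strings, values are the integer codes.
codes <- c(ATT = 1L, A = 2L, ATTT = 3L)

df$REF <- 0L
# Index the lookup by name; strings not present in `codes` become NA.
df$ALT <- unname(codes[as.character(df$ALT)])

print(df$ALT)
```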

If you don't care which character string gets assigned to which number, and just want different strings to get different integers, you can also do:

df$REF<-0 # from what I gather all rows should be 0
df$ALT <- as.numeric(factor(df$ALT)) # give a distinct number to each distinct string counting up from 1.
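
To see which number each string ends up with under the factor() approach, here is a small sketch on a made-up vector (the data and the names `alt`, `sorted_codes` and `appearance_codes` are assumptions for illustration). By default the codes follow the sorted factor levels rather than the order of appearance; passing `levels = unique(alt)` is one way to number by first appearance instead.

```r
# Hypothetical example data: one ALT string per row (an assumption).
alt <- c("ATT", "A", "ATTT", "A")

# Default: levels are sorted, so "A" < "ATT" < "ATTT".
sorted_codes <- as.integer(factor(alt))

# Alternative: take the levels in order of first appearance.
appearance_codes <- as.integer(factor(alt, levels = unique(alt)))

print(sorted_codes)      # [1] 2 1 3 1
print(appearance_codes)  # [1] 1 2 3 2
```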

Setting the REF column to 0 is straightforward.

df$REF <- 0

For the ALT column I assume that the order of the entries within each row matters but that rows are independent. So an A could be numbered 1 in one row but 2 in another (if there are multiple entries in that row). The only thing we care about, then, is the number of alternatives in each row. We can simply count them, generate a vector with the appropriate numbers, and collapse it into a single string to form the corresponding entry in the data frame:

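# as.character() guards against ALT being a factor; seq_along() also handles empty entries
df$ALT <- sapply(strsplit(as.character(df$ALT), ","),
    function(alt) paste(seq_along(alt), collapse = ","))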
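
As a worked example of the per-row numbering, the sketch below runs the idea on a small made-up data frame. The second row and the name `pieces` are assumptions for illustration, and vapply() is swapped in for sapply() only so that the result is guaranteed to be a character vector.

```r
# Hypothetical two-row data frame (the second row is made up for illustration).
df <- data.frame(REF = c("AT", "G"),
                 ALT = c("ATT,A,ATTT", "C,T"),
                 stringsAsFactors = FALSE)

df$REF <- 0L

# Split each ALT entry on commas, then number the pieces 1..n within the row.
pieces <- strsplit(df$ALT, ",", fixed = TRUE)
df$ALT <- vapply(pieces,
                 function(alt) paste(seq_along(alt), collapse = ","),
                 character(1))

print(df)
```

With this input, ALT should come out as "1,2,3" for the first row and "1,2" for the second, with no explicit loop over rows.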
