How to remove a character in specific columns in a data.frame with R?

I have a list results of several data.frames (each data.frame has 3 columns). It looks like this:

> tail(results[[1]])
                    var1                var2        corr
4945 UniRef90_A0A075GGL3 UniRef90_A0A075GGW4 -0.12058932
4946 UniRef90_A0A075GGU1 UniRef90_A0A075GGW4 -0.01740142
4947 UniRef90_A0A075GGU4 UniRef90_A0A075GGW4  0.16400148
4948 UniRef90_A0A075GGV0 UniRef90_A0A075GGW4 -0.09698018
4949 UniRef90_A0A075GGV1 UniRef90_A0A075GGW4  0.22409572
4950 UniRef90_A0A075GGV8 UniRef90_A0A075GGW4  0.43184873

> tail(results[[2]])
                    var1                var2       corr
4945 UniRef90_A0A075GJW0 UniRef90_A0A075GKB8 -0.1059095
4946 UniRef90_A0A075GJW5 UniRef90_A0A075GKB8 -0.4336370
4947 UniRef90_A0A075GJX5 UniRef90_A0A075GKB8 -0.1875841
4948 UniRef90_A0A075GJY4 UniRef90_A0A075GKB8  0.2658149
4949 UniRef90_A0A075GJY8 UniRef90_A0A075GKB8 -0.2820792
4950 UniRef90_A0A075GJY9 UniRef90_A0A075GKB8 -0.2402827

I will bind these data.frames into a single one, but that will give a huge data.frame. That's why I would like to remove the string UniRef90_ from the columns var1 and var2 before the binding, in order to reduce the size.

Any help?

You can try this on var1 and var2 before binding the data frames.

sub("UniRef90_","", dataframe$yourvariable)
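The one-liner above handles a single column. As a minimal base R sketch of applying it to every data.frame in the list before binding, the following assumes the columns are named var1 and var2 as in the question; the helper name strip_prefix and the shortened two-row sample data are made up for illustration.

```r
# Shortened stand-in for the `results` list from the question
results <- list(
  data.frame(var1 = c("UniRef90_A0A075GGL3", "UniRef90_A0A075GGU1"),
             var2 = c("UniRef90_A0A075GGW4", "UniRef90_A0A075GGW4"),
             corr = c(-0.12058932, -0.01740142),
             stringsAsFactors = FALSE),
  data.frame(var1 = c("UniRef90_A0A075GJW0", "UniRef90_A0A075GJW5"),
             var2 = c("UniRef90_A0A075GKB8", "UniRef90_A0A075GKB8"),
             corr = c(-0.1059095, -0.4336370),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: remove a leading prefix from the given columns
strip_prefix <- function(df, cols = c("var1", "var2"), prefix = "UniRef90_") {
  # "^" anchors the pattern so only a prefix at the start of the string is removed
  df[cols] <- lapply(df[cols], function(x) sub(paste0("^", prefix), "", x))
  df
}

# Clean each data.frame first, then row-bind them into one
cleaned  <- lapply(results, strip_prefix)
combined <- do.call(rbind, cleaned)
```

Cleaning each piece before the rbind keeps the long identifiers out of the combined object, which is what the question asks for.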

We can loop through the list and remove the substring with either substring or str_remove

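library(tidyverse)

# For each data.frame, strip the leading "UniRef90_" from the columns
# matching var1, var2, ..., then row-bind everything into one data.frame.
map_df(results, ~ .x %>%
         mutate_at(vars(matches('^var\\d+$')),
                   list(~ str_remove(., "^UniRef90_"))))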
#     var1       var2        corr
#1  A0A075GGL3 A0A075GGW4 -0.12058932
#2  A0A075GGU1 A0A075GGW4 -0.01740142
#3  A0A075GGU4 A0A075GGW4  0.16400148
#4  A0A075GGV0 A0A075GGW4 -0.09698018
#5  A0A075GGV1 A0A075GGW4  0.22409572
#6  A0A075GGV8 A0A075GGW4  0.43184873
#7  A0A075GJW0 A0A075GKB8 -0.10590950
#8  A0A075GJW5 A0A075GKB8 -0.43363700
#9  A0A075GJX5 A0A075GKB8 -0.18758410
#10 A0A075GJY4 A0A075GKB8  0.26581490
#11 A0A075GJY8 A0A075GKB8 -0.28207920
#12 A0A075GJY9 A0A075GKB8 -0.24028270
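
In recent dplyr releases mutate_at is marked as superseded in favour of across(). A rough equivalent of the pipeline above, assuming dplyr 1.0.0 or later, might look like the sketch below; the helper name strip_uniref and the shortened two-row sample data are made up for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Shortened stand-in for the `results` list from the question
results <- list(
  data.frame(var1 = c("UniRef90_A0A075GGL3", "UniRef90_A0A075GGU1"),
             var2 = c("UniRef90_A0A075GGW4", "UniRef90_A0A075GGW4"),
             corr = c(-0.12058932, -0.01740142),
             stringsAsFactors = FALSE),
  data.frame(var1 = c("UniRef90_A0A075GJW0", "UniRef90_A0A075GJW5"),
             var2 = c("UniRef90_A0A075GKB8", "UniRef90_A0A075GKB8"),
             corr = c(-0.1059095, -0.4336370),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: strip the prefix from every column named var<digits>
strip_uniref <- function(df) {
  mutate(df, across(matches("^var\\d+$"), ~ str_remove(.x, "^UniRef90_")))
}

# Clean each data.frame, then row-bind the cleaned pieces
combined <- results %>%
  map(strip_uniref) %>%
  bind_rows()
```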

data

results <- list(structure(list(var1 = c("UniRef90_A0A075GGL3", 
  "UniRef90_A0A075GGU1", 
"UniRef90_A0A075GGU4", "UniRef90_A0A075GGV0", "UniRef90_A0A075GGV1", 
"UniRef90_A0A075GGV8"), var2 = c("UniRef90_A0A075GGW4", 
 "UniRef90_A0A075GGW4", 
"UniRef90_A0A075GGW4", "UniRef90_A0A075GGW4", "UniRef90_A0A075GGW4", 
"UniRef90_A0A075GGW4"), corr = c(-0.12058932, -0.01740142, 0.16400148, 
-0.09698018, 0.22409572, 0.43184873)), class = "data.frame", row.names = c("4945", 
"4946", "4947", "4948", "4949", "4950")), 
 structure(list(var1 = c("UniRef90_A0A075GJW0", 
"UniRef90_A0A075GJW5", "UniRef90_A0A075GJX5", "UniRef90_A0A075GJY4", 
"UniRef90_A0A075GJY8", "UniRef90_A0A075GJY9"), var2 = c("UniRef90_A0A075GKB8", 
"UniRef90_A0A075GKB8", "UniRef90_A0A075GKB8", "UniRef90_A0A075GKB8", 
"UniRef90_A0A075GKB8", "UniRef90_A0A075GKB8"), corr = c(-0.1059095, 
-0.433637, -0.1875841, 0.2658149, -0.2820792, -0.2402827)),
class = "data.frame", row.names = c("4945", 
"4946", "4947", "4948", "4949", "4950")))
