
How to remove the first three characters from every row in a column in R

I have a large data set (20K rows) with a column of text. I would like to remove the first x characters (e.g. 3) from the beginning of each row in that specific column. I'd appreciate your assistance.

You can do it with the gsub function and a simple regex. Here is the code:

# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)

# Replace the first 3 characters with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
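
Since the question asks about the first x characters in general, the pattern can be built from a variable. This is only a sketch: n is an assumed name for the number of characters to drop, the data frame is made up, and sub is swapped in for gsub because a pattern anchored with ^ can match at most once per string anyway.

```r
# Made-up example data; `n` is an assumed name for the number of characters to drop
df <- data.frame(text_col = c("abcd", "abcde", "ab"), stringsAsFactors = FALSE)
n <- 3

# Build the pattern "^.{0,3}" from n
pattern <- sprintf("^.{0,%d}", n)

# The ^ anchor means the pattern can match only once per string, so sub() is enough
df$text_col <- sub(pattern, "", df$text_col)

df$text_col
# [1] "d"  "de" ""
```

Because the quantifier is {0,n}, a string shorter than n characters (like "ab" here) ends up empty instead of being left untouched.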

As usual, there are so many ways to do things in R!

You can also try substring (see ?substring):

> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+                          column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+                          stringsAsFactors=FALSE)
> head(lotsofdata)
    column.1   column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"

Or for column 1, using [,1]:

> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"

Then just replace it:

x<-substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))

lotsofdata$column.1<-x

> head(lotsofdata)
  column.1   column2
1  aPoint1 MoreData1
2  aPoint2 MoreData2
3  aPoint3 MoreData3
4  aPoint4 MoreData4
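
The nchar() call is not strictly needed, because the last argument of substring defaults to 1000000L, which covers typical strings. A small sketch of the same idea wrapped in a function; drop_first_n is a hypothetical helper name, not something from base R or from the answer above:

```r
# drop_first_n is a hypothetical helper name, not part of base R
drop_first_n <- function(x, n) {
  # substring() is vectorised; `last` defaults to 1000000L, so for
  # typical strings everything from position n + 1 onward is kept
  substring(x, n + 1)
}

lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2"),
                         column2  = c("MoreData1", "MoreData2"),
                         stringsAsFactors = FALSE)

# Overwrite the column in place
lotsofdata$column.1 <- drop_first_n(lotsofdata$column.1, 3)
lotsofdata$column.1
# [1] "aPoint1" "aPoint2"
```

Passing n as an argument makes it easy to change how many characters are dropped without touching the rest of the code.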

With the tidyverse we can use str_sub from stringr (shown here on stringr's sample fruit vector) to do this, by directly specifying the start and end positions:

library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#>    some_fruit  
#>    <chr>       
#>  1 apple       
#>  2 apricot     
#>  3 avocado     
#>  4 banana      
#>  5 bell pepper 
#>  6 bilberry    
#>  7 blackberry  
#>  8 blackcurrant
#>  9 blood orange
#> 10 blueberry   
#> # … with 70 more rows
tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#>    some_fruit   chopped_fruit
#>    <chr>        <chr>        
#>  1 apple        le           
#>  2 apricot      icot         
#>  3 avocado      cado         
#>  4 banana       ana          
#>  5 bell pepper  l pepper     
#>  6 bilberry     berry        
#>  7 blackberry   ckberry      
#>  8 blackcurrant ckcurrant    
#>  9 blood orange od orange    
#> 10 blueberry    eberry       
#> # … with 70 more rows

Created on 2019-02-22 by the reprex package (v0.2.1)
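
If loading the whole tidyverse is not wanted, str_sub also works on a plain character vector with only stringr attached. A minimal sketch, assuming stringr is installed; the input vector x is made up for illustration:

```r
library(stringr)

# Made-up input vector for illustration
x <- c("apple", "apricot", "kiwi")

# `end` defaults to -1 (the last character), so only `start` is needed
chopped <- str_sub(x, 4)
chopped
# [1] "le"   "icot" "i"
```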
