I have a large data set (20K rows) with a column of text. I would like to remove the first x characters (e.g. 3) from the beginning of each row in that specific column. I would appreciate your assistance.
You can do it with the gsub function and a simple regex. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
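Since the question asks for an arbitrary number of characters, the pattern can also be built from a variable. Below is a minimal sketch, assuming a hypothetical helper named drop_first and a count n (neither is part of the original answer). It uses sub rather than gsub, on the assumption that a pattern anchored with ^ only needs to match once per string:

```r
# Hypothetical helper (not from the original answer):
# drop the first n characters of each element of a character vector.
drop_first <- function(x, n) {
  # Build the pattern "^.{0,n}"; {0,n} also copes with strings shorter than n.
  pattern <- sprintf("^.{0,%d}", as.integer(n))
  sub(pattern, "", x)
}

df <- data.frame(text_col = c("abcd", "abcde", "abcdef"),
                 stringsAsFactors = FALSE)
df$text_col <- drop_first(df$text_col, 3)
df$text_col
#> [1] "d"   "de"  "def"
```

Strings shorter than n simply become empty strings, e.g. drop_first("ab", 3) gives "".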
As usual..so many ways to do things in R!
You can also try ?substring:
> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+ column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+ stringsAsFactors=FALSE)
> head(lotsofdata)
column.1 column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"
Or for column 1, use [,1]:
> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
Then just replace it:
x<-substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
lotsofdata$column.1<-x
> head(lotsofdata)
column.1 column2
1 aPoint1 MoreData1
2 aPoint2 MoreData2
3 aPoint3 MoreData3
4 aPoint4 MoreData4
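As a side note, the nchar() call is optional here: in base R, substring() has a default last argument (1000000L), so passing only the start position keeps everything up to the end of each string. A short sketch, where the vector x and the count n are illustrative names rather than part of the answer above:

```r
# Illustrative data; x and n are assumed names for this sketch.
x <- c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4")
n <- 3

# Keep everything from character n + 1 onwards; `last` is left at its default.
trimmed <- substring(x, n + 1)
trimmed
#> [1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
```

Note that substr(), unlike substring(), requires the stop argument to be given explicitly.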
With the tidyverse we can use str_sub (and some sample fruit text strings) to do this, by directly specifying start and end points:
library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#> some_fruit
#> <chr>
#> 1 apple
#> 2 apricot
#> 3 avocado
#> 4 banana
#> 5 bell pepper
#> 6 bilberry
#> 7 blackberry
#> 8 blackcurrant
#> 9 blood orange
#> 10 blueberry
#> # … with 70 more rows
tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#> some_fruit chopped_fruit
#> <chr> <chr>
#> 1 apple le
#> 2 apricot icot
#> 3 avocado cado
#> 4 banana ana
#> 5 bell pepper l pepper
#> 6 bilberry berry
#> 7 blackberry ckberry
#> 8 blackcurrant ckcurrant
#> 9 blood orange od orange
#> 10 blueberry eberry
#> # … with 70 more rows
Created on 2019-02-22 by the reprex package (v0.2.1)
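If loading the whole tidyverse is more than you need, str_sub can also be used on a plain character vector with only stringr attached. A minimal sketch, assuming stringr is installed and using illustrative names words and n:

```r
library(stringr)

# Illustrative input; words and n are assumed names for this sketch.
words <- c("apple", "apricot", "banana")
n <- 3

# start = n + 1 drops the first n characters; end defaults to -1 (last character).
chopped <- str_sub(words, n + 1)
chopped
#> [1] "le"   "icot" "ana"
```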