I have a CSV file with an ID field and a TEXT field. I need to add a third field with the word count of the TEXT field on every row. How should I proceed?
Example: If this is my starting data frame
ID TEXT
1 1 Lorem ipsum dolor sit amet
2 2 Praesent venenatis nisl id
3 3 Nunc dapibus maximus vulputate. Nunc
then the desired result is
ID TEXT WordCount
1 1 Lorem ipsum dolor sit amet 5
2 2 Praesent venenatis nisl id 4
3 3 Nunc dapibus maximus vulputate. Nunc 5
I would use the handy stri_count_words() function from the stringi package.
df$WordCount <- stringi::stri_count_words(df$TEXT)
which gives
ID TEXT WordCount
1 1 Lorem ipsum dolor sit amet 5
2 2 Praesent venenatis nisl id 4
3 3 Nunc dapibus maximus vulputate. Nunc 5
In base R, you could instead remove the punctuation, split on whitespace with strsplit(), and then take the lengths of the resulting list elements.
lengths(strsplit(gsub("[[:punct:]]", "", df$TEXT), "\\s+"))
# [1] 5 4 5
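Since the question starts from a CSV file, here is a minimal end-to-end sketch of the base R approach above. It assumes the file has a header row with columns named ID and TEXT. The temporary file paths and the helper name count_words() are made up for illustration and are not part of base R.

```r
# Write the question's sample data to a temporary CSV so the sketch is
# self-contained; in practice, point read.csv() at the real file instead.
in_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,TEXT",
  "1,Lorem ipsum dolor sit amet",
  "2,Praesent venenatis nisl id",
  "3,Nunc dapibus maximus vulputate. Nunc"
), in_path)

df <- read.csv(in_path, stringsAsFactors = FALSE)

# count_words() is a hypothetical helper: strip punctuation, trim outer
# whitespace, split on runs of whitespace, and count the pieces.
count_words <- function(x) {
  cleaned <- trimws(gsub("[[:punct:]]", "", x))
  lengths(strsplit(cleaned, "\\s+"))
}

df$WordCount <- count_words(df$TEXT)

# Write the result back out without the row-name column.
out_path <- tempfile(fileext = ".csv")
write.csv(df, out_path, row.names = FALSE)

df$WordCount
# [1] 5 4 5
```

The trimws() call matters here because strsplit() would otherwise return an empty first element for a string that begins with whitespace, which would add one to the count.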
Or, as @David suggests, just count the runs of whitespace and add 1. trimws() removes any stray spaces at the beginning or end of the string, which would otherwise inflate the count.
lengths(gregexpr("\\s+", trimws(df$TEXT))) + 1L
# [1] 5 4 5
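One caveat with counting separators: gregexpr() returns -1 when there is no match, and that result still has length 1, so a one-word or empty string is counted as 2. If the data might contain such rows, a possible variant is to count runs of non-whitespace characters instead. The sketch below is illustrative only, and count_tokens() is a made-up helper name.

```r
x <- c("Lorem ipsum dolor sit amet", "Lorem", "")

# The separator-counting one-liner over-counts when there is no whitespace.
lengths(gregexpr("\\s+", trimws(x))) + 1L
# [1] 5 2 2

# count_tokens() is a hypothetical helper: count runs of non-whitespace
# characters, ignoring the -1 that gregexpr() returns for "no match".
count_tokens <- function(x) {
  matches <- gregexpr("\\S+", x)
  vapply(matches, function(pos) sum(pos > 0L), integer(1))
}

count_tokens(x)
# [1] 5 1 0
```

Unlike the strsplit() approach, this variant does not strip punctuation first, so a free-standing punctuation mark such as " - " would count as a token.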