Here is one column of my data frame, df$City.
(I have other columns, but I'm showing just this one for simplicity.)
City
Seattle
San Diego
Bern
SEATTLE
SEATTLE
BERN
I want to do a frequency count on the cities. I want both "Seattle" and "SEATTLE" to be considered the same - basically, I want the frequency table calculation to be case insensitive.
If I use table(df), it gives me "Seattle" and "SEATTLE" as two different items. I tried to overcome this by using toupper(df) before doing table(df). However, I get the error: invalid multibyte string.
I checked the encoding of my file and it seems to be UTF-8 - I could be wrong - is there a way for me to check the encoding?
Does anyone know how I can get a frequency table that is case insensitive? It doesn't have to be using my approach.
Thanks in advance!!
You'll want to look into iconv() for the UTF-8 conversion. Also, with the strings, you will probably have to use toupper() or tolower() to standardize them, and maybe stringr::str_trim() to take care of extra whitespace.
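Here is a minimal sketch of how those pieces could fit together, working on the df$City column rather than the whole data frame. The toy data, the stray trailing space, and the guess that the bad bytes are latin1 are all assumptions made for illustration; the real source encoding of your file may differ. Base R's validUTF8() is one way to check which entries are not valid UTF-8, and trimws() is used here as a base R stand-in for stringr::str_trim().

```r
# Toy data standing in for df$City. The trailing space and the latin1 byte
# ("\xfc") are invented here to reproduce the kind of problem described.
df <- data.frame(
  City = c("Seattle", "San Diego", "Bern", "SEATTLE ", "SEATTLE", "BERN",
           "Z\xfcrich"),
  stringsAsFactors = FALSE
)

# Check the encoding: flag entries that are not valid UTF-8.
# These are the ones likely to trigger "invalid multibyte string".
bad <- !validUTF8(df$City)
df$City[bad]

# Re-encode only the invalid entries, ASSUMING they are latin1.
# Swap "latin1" for whatever the file's real encoding turns out to be.
df$City[bad] <- iconv(df$City[bad], from = "latin1", to = "UTF-8")

# Normalize: strip surrounding whitespace, then upper-case everything.
city_norm <- toupper(trimws(df$City))

# Case-insensitive frequency table.
city_counts <- table(city_norm)
city_counts
```

If you would rather keep the original column untouched, store the normalized values in a new column (for example df$City_norm, a hypothetical name) and call table() on that instead.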