
R: Frequency table that is case insensitive

Here is one column of my data frame, df$City:
(I have other columns, but I'm just showing one column for simplicity.)

City        
Seattle     
San Diego   
Bern       
SEATTLE
SEATTLE
BERN 

I want to do a frequency count on the cities. I want both "Seattle" and "SEATTLE" to be considered the same - basically, I want the frequency table calculation to be case insensitive.

If I use table(df), it gives me "Seattle" and "SEATTLE" as two different items. I tried to work around this by calling toupper(df) before table(df).

However, I get the error: invalid multibyte string.

I checked the encoding of my file and it seems to be UTF-8, but I could be wrong. Is there a way for me to check the encoding?

Does anyone know how I can get a frequency table that is case insensitive? It doesn't have to be using my approach.

Thanks in advance!!

You'll want to look into iconv() to convert the strings to UTF-8. After that, you will probably have to use toupper() or tolower() to standardize the case, and maybe stringr::str_trim() to take care of extra whitespace.
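As a rough sketch of those steps put together: the sample data below just mirrors the City column from the question, and the source encoding passed to iconv() ("latin1") is only an assumption that you would need to replace with whatever your file actually uses. Base R's trimws() is used here in place of stringr::str_trim() to avoid an extra dependency. validUTF8() and Encoding() are base R helpers that may help with the "how do I check the encoding" part.

```r
# Hypothetical sample data mirroring the City column from the question
df <- data.frame(
  City = c("Seattle", "San Diego", "Bern", "SEATTLE", "SEATTLE", "BERN "),
  stringsAsFactors = FALSE
)

# Inspect the strings: validUTF8() flags elements that are not valid UTF-8,
# and Encoding() shows the encoding R has marked for each element
# ("unknown" is normal for plain ASCII strings).
validUTF8(df$City)
Encoding(df$City)

# Assumption: the data was read in as latin1; change `from` to match your file.
city <- iconv(df$City, from = "latin1", to = "UTF-8")

# Standardize case and strip leading/trailing whitespace before counting
city <- trimws(tolower(city))

# Case-insensitive frequency table
freq <- table(city)
print(freq)
```

With this sample data, "Seattle" and "SEATTLE" are counted together, as are "Bern" and "BERN ". Note that tolower() is applied to the column (a character vector), not to the whole data frame.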

