
How to replace words that differ only in case in a pandas DataFrame

I have a really big data frame with a column "city" in which the same cities repeat in different cases, e.g.:

***City***

Gurgaon
GURGAON
gurgaon
Chennai
CHENNAI
Banglore
Hydrabad
BANGLORE
HYDRABAD

.

Is there a way to replace all variants of the same city that differ only in case with a single name?

Each column has about 3k rows in total, so doing this manually is not feasible.

Edit -

The city column of the DF also contains cities like

'Gurgaon'

'GURGAON'

'gurgaon '          #there is a white space at the end

I want all of them to change to the same name, with the trailing whitespace removed as well, so that the output is:

 'Gurgaon'
 'Gurgaon'
 'Gurgaon'        #no white space at the end

Thanks

First, normalize the city names so they share the same format (strip surrounding whitespace, then capitalize):

df['city'] = df['city'].str.strip().str.capitalize()

Then, remove duplicate rows (drop_duplicates() returns a new DataFrame, so assign the result back):

df = df.drop_duplicates()

(I assume the rest of the columns are equal)
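One detail worth checking before picking a method: capitalize() upper-cases only the first character of the whole string, while title() upper-cases the first character of every word. The sketch below illustrates the difference on made-up sample data; the two-word city "new delhi" is an assumption added for illustration and does not appear in the question.

```python
import pandas as pd

# Hypothetical sample data: includes a trailing space and a two-word city.
df = pd.DataFrame({"city": ["Gurgaon", "GURGAON", "gurgaon ", "NEW DELHI", "new delhi"]})

# Remove leading/trailing whitespace first so 'gurgaon ' matches 'gurgaon'.
cleaned = df["city"].str.strip()

capitalized = cleaned.str.capitalize()  # first character of the string only
titled = cleaned.str.title()            # first character of every word

print(capitalized.tolist())
print(titled.tolist())
```

If every city in the column is a single word, both methods give the same result; for multi-word names, title() is probably closer to what is wanted.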

Here is how you can use str.strip() to remove leading and trailing whitespace, and then str.title() to normalize the case:

import pandas as pd

df = pd.DataFrame({'City':["Gurgaon",
                           "GURGAON",
                           "gurgaon",
                           "Chennai",
                           "CHENNAI",
                           "Banglore",
                           "Hydrabad",
                           "BANGLORE",
                           "HYDRABAD"]})
df['City'] = df['City'].str.strip()
df['City'] = df['City'].str.title()
print(df)

Output:

       City
0   Gurgaon
1   Gurgaon
2   Gurgaon
3   Chennai
4   Chennai
5  Banglore
6  Hydrabad
7  Banglore
8  Hydrabad
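To confirm that the cleanup actually collapsed the variants, including the 'gurgaon ' case with trailing whitespace from the question's edit, one option is to inspect the unique values and their counts afterwards. This is a minimal sketch on a shortened sample, not the full data set:

```python
import pandas as pd

# Shortened sample from the question, including the value with a trailing space.
df = pd.DataFrame({"City": ["Gurgaon", "GURGAON", "gurgaon ", "Chennai", "CHENNAI"]})

# Strip whitespace and normalize case in one chained expression.
df["City"] = df["City"].str.strip().str.title()

# Each city should now appear under exactly one spelling.
unique_cities = sorted(df["City"].unique())
counts = df["City"].value_counts()

print(unique_cities)
print(counts)
```

Note that this only merges values that differ in case or surrounding whitespace; different spellings of the same city would still count as separate values.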
