This is an extract of my DataFrame:
import numpy as np
import pandas as pd

data = [
['Citroën Amillis', '20 Za Des Baliveaux - 77120 Amillis', '77120', 'ok'],
['Relat Paris 9e', 'Métro Opéra - 75009 Paris 9e', 'Paris', 'error'],
['Macif Avon', '49 Av Franklin Roosevelt - 77210 Avon', '77210', 'ok'],
['Atac La Chapelle-la-Reine', 'Za Rue De L\'avenir - 77760 La Chapelle-la-Reine', 'La', 'error'],
['Société Générale La Ferté-Gaucher', '42 Rue De Paris - 77320 La Ferté-Gaucher', 'La', 'error']
]
df = pd.DataFrame(data, columns=['nom_magasin', 'adresse', 'code_postal', 'is_code_postal'])
df
As you can see, there are mistakes in my DataFrame. For some addresses, especially when the city name is a compound one (e.g. "La Chapelle-la-Reine"), the column "code_postal" is wrong.
What I'm looking to do is the following: if the column "is_code_postal" is "error", replace "code_postal" with the postal code extracted by regex from the column "adresse".
I can't find a solution. So far I've tried this to flag the bad rows:
df['is_code_postal'] = np.where(df.code_postal.str.match('^[a-zA-Z]'), 'error', 'ok')
At first I was thinking about doing all the changes within the same function, but I'm missing something.
The important thing is that my DataFrame is fairly large (more than 250K rows), so I'd like an efficient solution.
Do you guys have any idea?
You can ignore code_postal and use Quang's code to extract it directly from "adresse":
df['code_postal'] = df['adresse'].str.extract(r'(\d{5})', expand=False)
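If only the rows flagged "error" should be changed, as the question asks, one possible approach is to restrict the extraction to those rows with a boolean mask. This is a minimal sketch rather than a definitive solution: the names `mask` and `extracted` are illustrative, and it assumes the first standalone 5-digit number in "adresse" is the postal code (a 5-digit street number would break that assumption).

```python
import pandas as pd

# Small sample taken from the question's data.
df = pd.DataFrame(
    [
        ['Citroën Amillis', '20 Za Des Baliveaux - 77120 Amillis', '77120', 'ok'],
        ['Relat Paris 9e', 'Métro Opéra - 75009 Paris 9e', 'Paris', 'error'],
        ['Atac La Chapelle-la-Reine', "Za Rue De L'avenir - 77760 La Chapelle-la-Reine", 'La', 'error'],
    ],
    columns=['nom_magasin', 'adresse', 'code_postal', 'is_code_postal'],
)

# Rows whose postal code was flagged as wrong (hypothetical helper name).
mask = df['is_code_postal'].eq('error')

# Vectorised extraction on the flagged rows only; expand=False returns a Series.
# Assumption: the first standalone 5-digit number is the postal code.
extracted = df.loc[mask, 'adresse'].str.extract(r'\b(\d{5})\b', expand=False)

# Write the extracted values back; rows marked 'ok' are left untouched.
df.loc[mask, 'code_postal'] = extracted
```

Both the mask and `str.extract` are vectorised, so this should stay reasonably fast on a few hundred thousand rows. Rows where no 5-digit number is found end up as NaN and can be checked with `df['code_postal'].isna()`.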