Replace a column value with another column value based on a regular expression - Python

Question

Here is an excerpt of my DataFrame:

import pandas as pd

data = [
    ['Citroën Amillis', '20 Za Des Baliveaux - 77120 Amillis', '77120', 'ok'],
    ['Relat Paris 9e', 'Métro Opéra - 75009 Paris 9e', 'Paris', 'error'],
    ['Macif Avon', '49 Av Franklin Roosevelt - 77210 Avon', '77210', 'ok'],
    ['Atac La Chapelle-la-Reine', 'Za Rue De L\'avenir - 77760 La Chapelle-la-Reine', 'La', 'error'],
    ['Société Générale La Ferté-Gaucher', '42 Rue De Paris - 77320 La Ferté-Gaucher', 'La', 'error']
]

df = pd.DataFrame(data, columns=['nom_magasin', 'adresse', 'code_postal', 'is_code_postal'])

df

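As you can see, there are errors in my DataFrame. For some addresses, especially when the city name is a compound one (e.g. "La Chapelle-la-Reine"), the "code_postal" column is wrong.

What I want to do is the following: if the "is_code_postal" column is 'error', replace "code_postal" with the postal code found in the "adresse" column, extracted with a regular expression.

I can't find a solution. To build the flag column, I used df['is_code_postal'] = np.where(df.code_postal.str.match('^[a-zA-z]'), 'error', 'ok'). At first I was thinking of making all the changes in that same call, but I'm missing something.

It matters that my DataFrame is fairly large (more than 250,000 rows), so I'm looking for an efficient solution.

Do you have any ideas?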

Answer 1

You can ignore code_postal and extract it directly from "adresse" using Quang's code:

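# Raw string avoids the invalid-escape warning for \d; expand=False returns a Series.
df['code_postal'] = df['adresse'].str.extract(r'(\d{5})', expand=False)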
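
If you would rather keep the values already marked 'ok' and only overwrite the flagged rows, as the question asks, a boolean mask with .loc is one way to do it. This is a minimal sketch on three of the sample rows, not a tested solution for the full 250,000-row frame; the names mask and extracted are just illustrative. It assumes the first run of five digits in the address is the postal code, which would fail if a street number happened to have five digits.

```python
import pandas as pd

# Three rows from the question's sample data (subset, for illustration only).
df = pd.DataFrame(
    {
        "adresse": [
            "20 Za Des Baliveaux - 77120 Amillis",
            "Métro Opéra - 75009 Paris 9e",
            "Za Rue De L'avenir - 77760 La Chapelle-la-Reine",
        ],
        "code_postal": ["77120", "Paris", "La"],
        "is_code_postal": ["ok", "error", "error"],
    }
)

# Rows flagged as wrong by the question's own "is_code_postal" column.
mask = df["is_code_postal"].eq("error")

# Extract the first run of five digits from the address, for flagged rows only.
# Assumption: that run is the postal code. Rows with no match get NaN.
extracted = df.loc[mask, "adresse"].str.extract(r"(\d{5})", expand=False)

# Overwrite only the flagged rows; pandas aligns on the index.
df.loc[mask, "code_postal"] = extracted

print(df)
```

Both the extraction and the assignment are vectorized, so this should scale reasonably to a large frame, though I have not timed it.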

Answer 1: 2, accepted, 2020-01-13 16:52:13
