How to replace the string values of a Pandas column with a string, except for certain values?

Question

Sample DataFrame:

import pandas as pd
df = pd.DataFrame({'Age' : [70.0, 58.0, 44.0, 40.0, 21.0, 35.0, 12.0, 43.0, 45.0, 65.0, 56.0, 31.0, 30.0,
                            52.0, 59.0, 52.0, 31.0, 55.0, 42.0, 73.0],
                   'MarketSegment' : ['Travel Agent/Operator', 'Other', 'Other', 'Other', 'Other',
                                      'Direct', 'Groups', 'Other', 'Other', 'Direct', 'Other',
                                      'Other', 'Other', 'Other', 'Groups', 'Groups', 'Other', 'Other',
                                      'Groups', 'Other'],
                   'Nationality' : ['CAN', 'ESP', 'FRA', 'DEU', 'GBR', 'RUS', 'IRL', 'FRA', 'IRL',
                                    'BRA', 'LTU', 'CHE', 'FRA', 'GBR', 'FRA', 'PRT', 'DEU', 'ESP',
                                    'CHE', 'USA']})

First, I want only the 3 most frequent nationalities. I used the code below:

top_nat = df.groupby('Nationality').count().sort_values \
(by='Age', ascending = False).head(3).iloc[:, 0].index.to_list()

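(Is there a way to do this using only the frequencies of the unique values in the 'Nationality' column, without relying on any other column such as 'Age'?)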

Now I want every value in 'Nationality' replaced with 'OTR', except the values that appear in top_nat. I tried things like these:

df['Nationality'].replace(~top_nat,'OTR', inplace=True)

df["Nationality"] = df["Nationality"].apply(lambda x: x.replace(~top_nat, "OTR"))

for x in top_nat:
    df.loc[df['Nationality'] != x, 'Nationality'] = 'OTR'

None of them worked. Maybe I want something like this:

if values in df.Nationality != values in top_nat:
    replace that value in df.Nationality with 'OTR'
else:
    continue

The original dataset has shape (82580, 30), and there I need the top 15 nationalities. Please help.

Answer 1

First, get the top 3:

top = df['Nationality'].value_counts().nlargest(3).index

Then set the nationality:

df.loc[~df['Nationality'].isin(top), 'Nationality'] = 'OTR'

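This keeps the top 3 nationalities and replaces everything else with 'OTR'. For the full dataset, pass 15 to nlargest() instead of 3.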
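The two steps above can be wrapped into one reusable function. This is only a minimal sketch: the helper name `keep_top_n` and the toy Series are made up for illustration, and it uses `Series.where` in place of the `.loc` assignment so that it returns a new Series rather than modifying the column in place.

```python
import pandas as pd

def keep_top_n(series: pd.Series, n: int, other: str = "OTR") -> pd.Series:
    """Keep the n most frequent values of `series`; replace the rest with `other`.

    Hypothetical helper, not part of pandas.
    """
    # value_counts() counts each unique value using only this column.
    top = series.value_counts().nlargest(n).index
    # where() keeps entries where the condition is True and fills the others.
    return series.where(series.isin(top), other)

# Toy data: FRA appears 3 times, DEU twice, ESP and USA once each.
nat = pd.Series(["FRA", "FRA", "FRA", "DEU", "DEU", "ESP", "USA"])
result = keep_top_n(nat, 2)
print(result.tolist())
# → ['FRA', 'FRA', 'FRA', 'DEU', 'DEU', 'OTR', 'OTR']
```

On the real data this would be called as `df['Nationality'] = keep_top_n(df['Nationality'], 15)`. When several values are tied at the cut-off, `nlargest` keeps only the first ones it encounters, so which of the tied values survive should not be relied upon.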

Answer 2

For the first question, just use df['Nationality'].value_counts()[:3].index. You don't need groupby() here.

For the second, you can do this:

a = list(df['Nationality'].value_counts()[:3].index)
# keep x when it is one of the top values, otherwise replace it with 'OTR'
df['Nationality'] = df['Nationality'].apply(lambda x : x if x in a else 'OTR')
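As a side note on why the attempts in the question fail: `~top_nat` raises a TypeError because `~` is not defined for a Python list, and the `for` loop overwrites the column once per top value, so nothing survives. The sketch below reproduces the loop on a made-up toy Series (the data and variable names are illustrative only) and contrasts it with a single membership test, here against a set so each lookup is cheap.

```python
import pandas as pd

# Toy data and a hand-written top list, for illustration only.
nat = pd.Series(["FRA", "FRA", "FRA", "DEU", "DEU", "ESP", "USA"])
top_nat = ["FRA", "DEU"]

# The loop from the question: each pass replaces every value that differs
# from x. After the pass for "FRA" only "FRA" and "OTR" remain, and the pass
# for "DEU" then replaces "FRA" as well.
looped = nat.copy()
for x in top_nat:
    looped[looped != x] = "OTR"

# One membership test per value against the whole collection avoids this.
top_set = set(top_nat)
fixed = nat.apply(lambda x: x if x in top_set else "OTR")

print(looped.tolist())  # every entry is 'OTR'
print(fixed.tolist())   # 'FRA' and 'DEU' are kept
```

The `apply` version runs a Python function per row, which is fine for about 80,000 rows; the `isin`-based approach does the same comparison in vectorized form.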
