How to replace multiple rows at once in a Python dataframe?

Question

I have a dataset with the following unique values in one of its columns.

   df['Gender'].unique()

   array(['Female', 'M', 'Male', 'male', 'm', 'Male-ish', 'maile',
   'Trans-female', 'Cis Female', 'something kinda male?', 'Cis Male',
   'queer/she/they', 'non-binary', 'Make', 'Nah', 'All', 'Enby',
   'fluid', 'Genderqueer', 'Androgyne', 'Agender', 'Guy (-ish) ^_^',
   'male leaning androgynous', 'Male ', 'Man', 'msle', 'Neuter',
   'queer', 'A little about you', 'Malr',
   'ostensibly male, unsure what that really means'], dtype=object)

As you can see, there are obvious cases where a row should be listed as 'Male' (I'm referring to the cases where 'Male' is misspelled, of course). How can I replace these values with 'Male' without calling the replace function ten times? This is the code I have tried:

x = 0
while x <= 11:
    for i in df['Gender']:
        if i[0:2] == 'Ma':
            print('Male')
        elif i[0] == 'm':
            print('Male')
    x += 1

However, I just get a bunch of "Male" printed.

Edit: I want to convert the following values to 'Male': 'M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male '

Answer 1

Create a list with all the variants of Male:

males_list = ['M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male ']

And then replace them with:

df.loc[df['Gender'].isin(males_list), 'Gender'] = 'Male'

btw: There is almost always a better solution than looping over the rows in pandas, not just in cases like this.
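Since the question asks how to avoid calling replace ten times, it may be worth noting that Series.replace also accepts a list of values together with a single replacement, so one call is enough. The following is only a sketch: the small DataFrame is made-up sample data using a few of the values from the question, and males_list is the list defined above.

```python
import pandas as pd

# Hypothetical sample data: a handful of the values from the question.
df = pd.DataFrame({"Gender": ["Female", "M", "maile", "Male ", "Cis Male", "Make"]})

males_list = ['M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male ']

# One call: every value found in males_list is replaced by 'Male'.
# Values not in the list ('Female', 'Cis Male') are left as they are.
df["Gender"] = df["Gender"].replace(males_list, "Male")

print(df["Gender"].tolist())
```

This should give the same result as the isin/loc assignment; which one to use is mostly a matter of taste.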

Answer 2

I would use the map function as it allows you to create any custom logic. So for instance, by looking at your code, something like this would do the trick:

def correct_gender(text):

    if text[0:2]=='Ma' or text[0]=='m':
        return "Male"

    return text

df["Gender"] = df["Gender"].map(correct_gender)
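One caveat with the prefix test above: it also converts values such as 'Male-ish' and 'male leaning androgynous', and text[0] raises an error on an empty string or a missing (NaN) value. A possible variant, sketched here under the assumption that only the spellings listed in the question's edit should be converted, normalizes each value and looks it up in a set. MALE_VARIANTS and the sample values are illustrative names, not part of the original answer, and the lookup is case-insensitive, which is slightly broader than the asker's list.

```python
# Spellings from the question's edit, stored stripped and lowercased.
# (Assumption: matching should ignore case and surrounding whitespace.)
MALE_VARIANTS = {"m", "male", "maile", "make", "man", "msle", "malr"}


def correct_gender(text):
    """Return 'Male' for a known variant, otherwise the value unchanged."""
    # Leave non-string values (e.g. NaN for missing data) untouched.
    if not isinstance(text, str):
        return text
    if text.strip().lower() in MALE_VARIANTS:
        return "Male"
    return text


samples = ["M", "Male ", "msle", "Female", "Male-ish", ""]
print([correct_gender(s) for s in samples])
```

It can be applied the same way as before, with df["Gender"] = df["Gender"].map(correct_gender).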

Answer 3

If I understand you correctly, you want a more generalized approach. We can use regex to check if the word starts with M or has the letters Ma preceded by a whitespace, so we don't catch Female:

(?i) : makes the match case-insensitive
(?<=\s) : a lookbehind that requires the following "ma" to be preceded by a whitespace character

df.loc[df['Gender'].str.contains(r'(?i)^M|(?<=\s)ma'), 'Gender'] = 'Male'

Output

                Gender
0               Female
1                 Male
2                 Male
3                 Male
4                 Male
5                 Male
6                 Male
7         Trans-female
8           Cis Female
9                 Male
10                Male
11      queer/she/they
12          non-binary
13                Male
14                 Nah
15                 All
16                Enby
17               fluid
18         Genderqueer
19           Androgyne
20             Agender
21      Guy (-ish) ^_^
22                Male
23                Male
24                Male
25                Male
26              Neuter
27               queer
28  A little about you
29                Male
30                Male
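To see which values the pattern catches before touching the DataFrame, it can be tried with the standard re module, which is what str.contains uses for regex matching. This is just a sketch on a few values from the question; the variable names are illustrative.

```python
import re

# Same pattern as above, written as a raw string so \s is not treated
# as a string escape.
pattern = re.compile(r"(?i)^M|(?<=\s)ma")

samples = [
    "Female",
    "maile",
    "Cis Male",
    "Trans-female",
    "Male-ish",
    "something kinda male?",
]

# search() looks for the pattern anywhere in the string.
matches = [s for s in samples if pattern.search(s)]
print(matches)
```

Note that the pattern is deliberately broad: anything starting with "m" or containing " ma" matches, which is why 'Male-ish' and 'something kinda male?' end up as 'Male' in the output above. If the column may contain missing values, passing na=False to str.contains is probably needed so the mask stays boolean.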

Answer 1: score 3, accepted, 2019-09-24 11:44:42
Answer 2: score 1, 2019-09-24 11:46:10
Answer 3: score 1, 2019-09-24 11:48:22
