How can I replace multiple rows simultaneously in a Python dataframe?
I have a dataset with the following unique values in one of its columns.
df['Gender'].unique()
array(['Female', 'M', 'Male', 'male', 'm', 'Male-ish', 'maile',
       'Trans-female', 'Cis Female', 'something kinda male?', 'Cis Male',
       'queer/she/they', 'non-binary', 'Make', 'Nah', 'All', 'Enby',
       'fluid', 'Genderqueer', 'Androgyne', 'Agender', 'Guy (-ish) ^_^',
       'male leaning androgynous', 'Male ', 'Man', 'msle', 'Neuter',
       'queer', 'A little about you', 'Malr',
       'ostensibly male, unsure what that really means'], dtype=object)
As you can see, there are obvious cases where a row should be listed as 'Male' (I'm referring to the cases where 'Male' is misspelled, of course). How can I replace these values with 'Male' without calling the replace function ten times? This is the code I have tried:
x=0
while x<=11:
    for i in df['Gender']:
        if i[0:2]=='Ma':
            print('Male')
        elif i[0]=='m':
            print('Male')
    x+=1
However, all I get is a bunch of "Male" printed out.
Edit: I want to convert the following values to 'Male': 'M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male '
Create a list with all the variants of Male:
males_list = ['M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male ']
And then replace them with:
df.loc[df['Gender'].isin(males_list), 'Gender'] = 'Male'
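On the question's point about not calling replace ten times: as a sketch on a small made-up sample (the values are picked from the question's output, not the real data), the isin mask above and a single Series.replace call that receives the whole list at once produce the same column.

```python
import pandas as pd

males_list = ['M', 'male', 'm', 'maile', 'Make', 'Man', 'msle', 'Malr', 'Male ']

# Hypothetical sample: a few values from the question's unique() output
df = pd.DataFrame({"Gender": ["Female", "M", "Male ", "maile", "Cis Male", "Nah"]})

# Option 1 (as in the answer): boolean mask from isin, then one .loc assignment
by_mask = df.copy()
by_mask.loc[by_mask["Gender"].isin(males_list), "Gender"] = "Male"

# Option 2: one Series.replace call, passing the whole list of values to replace
by_replace = df["Gender"].replace(males_list, "Male")

print(by_mask["Gender"].tolist())
# ['Female', 'Male', 'Male', 'Male', 'Cis Male', 'Nah']
```

Both approaches match whole values exactly, so 'Cis Male' is left untouched; catching it needs a pattern-based approach such as a regex.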
btw: in pandas there is almost always a better solution than looping over the rows, not just in cases like this.
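To make the point concrete: the loop in the question only calls print, so nothing is ever written back to the DataFrame. The sketch below (on a hypothetical six-row sample) applies the question's own prefix rule twice, once with a Python loop whose results are assigned back, and once as a vectorized boolean mask; both give the same column.

```python
import pandas as pd

# Hypothetical sample taken from a few of the question's values
df = pd.DataFrame({"Gender": ["Female", "M", "male", "Make", "Cis Female", "msle"]})

# Loop version of the question's rule: collect the results, then assign them back
# (printing alone never modifies the DataFrame)
fixed = []
for g in df["Gender"]:
    if g[0:2] == "Ma" or g[0] == "m":
        fixed.append("Male")
    else:
        fixed.append(g)
looped = df.assign(Gender=fixed)

# Vectorized version of the same rule: build one boolean mask, assign once
mask = df["Gender"].str.startswith("Ma") | df["Gender"].str.startswith("m")
vectorized = df.copy()
vectorized.loc[mask, "Gender"] = "Male"

print(vectorized["Gender"].tolist())
# ['Female', 'M', 'Male', 'Male', 'Cis Female', 'Male']
```

Note that the bare 'M' survives: 'M'[0:2] is 'M', not 'Ma', and 'M' is not a lowercase 'm', so this prefix rule on its own misses it.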
I would use the map function (Series.map), as it lets you apply any custom logic. For instance, based on your code, something like this would do the trick:
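def correct_gender(text):
    # Values starting with "Ma" or "m", plus a bare "M", are treated as "Male".
    # startswith avoids an IndexError on empty strings; isinstance skips NaN.
    if isinstance(text, str) and (text.startswith(('Ma', 'm')) or text == 'M'):
        return "Male"
    return text

df["Gender"] = df["Gender"].map(correct_gender)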
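To see what this prefix rule actually converts, here is a sketch that runs it over a few of the question's values (the sample is illustrative). It writes the check with str.startswith, which is safe on empty strings, and adds an explicit test for a bare 'M', which the prefix test alone would miss; both tweaks are assumptions, not necessarily part of the original answer.

```python
import pandas as pd

def correct_gender(text):
    # Prefix rule: "Ma..." or "m..." becomes "Male"; the bare "M" check is an addition
    if isinstance(text, str) and (text.startswith(("Ma", "m")) or text == "M"):
        return "Male"
    return text

# Hypothetical sample taken from the question's unique() output
s = pd.Series(["M", "Male-ish", "male leaning androgynous", "Cis Male", "Female", "Make"])

print(s.map(correct_gender).tolist())
# ['Male', 'Male', 'Male', 'Cis Male', 'Female', 'Male']
```

The rule sweeps in 'Male-ish' and 'male leaning androgynous', which may or may not be intended, while 'Cis Male' is missed because it starts with neither 'Ma' nor 'm'; the regex-based answer addresses that case.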
If I understand you correctly, you want a more generalized approach. We can use a regex to check whether the value starts with M, or contains the letters ma preceded by whitespace, so that we don't catch Female:
(?i): makes the whole pattern case-insensitive
^M: the value starts with M (or m, because of (?i))
(?<=\s): a lookbehind, so ma only matches when it is preceded by a whitespace character, i.e. at the start of a later word such as in "Cis Male"

df.loc[df['Gender'].str.contains(r'(?i)^M|(?<=\s)ma'), 'Gender'] = 'Male'
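The pattern can be checked on its own with Python's re module, without pandas. str.contains reports whether the pattern occurs anywhere in each string, like re.search, so this sketch (on a handful of the question's values) mirrors which rows the .loc assignment would change.

```python
import re

# Same pattern as the answer, written as a raw string so "\s" reaches the regex engine intact
pattern = re.compile(r"(?i)^M|(?<=\s)ma")

# A few values from the question's unique() output
values = ["Cis Female", "Trans-female", "Cis Male", "something kinda male?", "msle", "Nah"]

# Keep the values where the pattern is found anywhere in the string
matches = [v for v in values if pattern.search(v)]
print(matches)
# ['Cis Male', 'something kinda male?', 'msle']
```

Because (?i) sits at the very start, it applies to both alternatives, which is why the lowercase 'msle' matches ^M and 'Cis Male' matches ma, while 'Trans-female' is skipped because its "fe" is preceded by a hyphen, not whitespace.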
Output
Gender
0 Female
1 Male
2 Male
3 Male
4 Male
5 Male
6 Male
7 Trans-female
8 Cis Female
9 Male
10 Male
11 queer/she/they
12 non-binary
13 Male
14 Nah
15 All
16 Enby
17 fluid
18 Genderqueer
19 Androgyne
20 Agender
21 Guy (-ish) ^_^
22 Male
23 Male
24 Male
25 Male
26 Neuter
27 queer
28 A little about you
29 Male
30 Male