
Replacing column values in a pandas DataFrame

I'm trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'.

I have tried the following:

w['female']['female']='1'
w['female']['male']='0' 

But I get back exactly the same values as before.

Ideally, I'd like output equivalent to applying the following logic element-wise:

for i in w.index:
    if w.at[i, 'female'] == 'female':
        w.at[i, 'female'] = '1'
    else:
        w.at[i, 'female'] = '0'

I've looked through the gotchas documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html) but can't figure out why nothing happens.

Any help will be appreciated.

If I understand right, you want something like this:

w['female'] = w['female'].map({'female': 1, 'male': 0})

(Here I convert the values to numbers instead of strings containing numbers. You can convert them to "1" and "0" if you really want, but I'm not sure why you'd want that.)
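To illustrate the numbers-versus-strings point, here is a minimal sketch on a small hypothetical frame: mapping to ints gives an integer column, while mapping to "1"/"0" gives a column of strings.

```python
import pandas as pd

# Hypothetical data in the shape described in the question
w = pd.DataFrame({'female': ['female', 'male', 'female']})

# Every value is a key of the dict, so the result is an integer column
as_numbers = w['female'].map({'female': 1, 'male': 0})

# Mapping to "1"/"0" keeps the column as strings, not integers
as_strings = w['female'].map({'female': '1', 'male': '0'})

print(as_numbers.tolist(), as_numbers.dtype)
print(as_strings.tolist())
```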

Your code doesn't work because using ['female'] on a column (the second 'female' in your w['female']['female']) doesn't mean "select rows where the value is 'female'". It means "select the row whose index label is 'female'", and there may not be any such row in your DataFrame.
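A small sketch (with made-up data) of the difference between looking up an index label and selecting rows by value with a boolean mask:

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})
col = w['female']

# col['female'] looks up the *index label* 'female'; the index here is 0, 1, 2
print('female' in col.index)      # False: no row is labelled 'female'

# Selecting rows by *value* needs a boolean mask instead
mask = col == 'female'
print(mask.tolist())              # [True, False, True]
print(col[mask].index.tolist())   # rows whose value is 'female': [0, 2]
```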

You can edit a subset of a dataframe by using loc:

df.loc[<row selection>, <column selection>]

In this case:

w.loc[w.female != 'female', 'female'] = 0
w.loc[w.female == 'female', 'female'] = 1
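A slightly more robust sketch of the same .loc idea computes the mask once, before any assignment, so the result does not depend on the order of the two lines. The data and the name is_female are illustrative; the column is created with object dtype (an assumption on my part) so it can hold both the original strings and the new integers.

```python
import pandas as pd

# object dtype so the column can hold strings and the integers assigned below
w = pd.DataFrame({'female': ['female', 'male', 'female', 'male']}, dtype=object)

# Build the boolean mask once, from the original string values
is_female = w['female'] == 'female'

# df.loc[<row selection>, <column selection>] for both branches, same mask
w.loc[is_female, 'female'] = 1
w.loc[~is_female, 'female'] = 0

print(w['female'].tolist())  # [1, 0, 1, 0]
```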
w['female'] = w['female'].replace(to_replace=dict(female=1, male=0))

See the pandas.Series.replace() documentation.

Slight variation:

w['female'] = w['female'].replace(['female', 'male'], [1, 0])
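In the list form, to_replace[i] is replaced by value[i], so the order of the two lists decides which label gets which number. A quick sketch (hypothetical data) checking that the dict form and the list form agree when the lists line up:

```python
import pandas as pd

s = pd.Series(['female', 'male', 'female'])

# Dict form: keys are old values, values are new values
via_dict = s.replace({'female': 1, 'male': 0})

# List form: elements are paired by position, 'female' -> 1, 'male' -> 0
via_lists = s.replace(['female', 'male'], [1, 0])

print(via_dict.tolist(), via_lists.tolist())  # [1, 0, 1] [1, 0, 1]
```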

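A variant with boolean masks, assigned through .loc (the chained form w.female[mask] = 1 raises a SettingWithCopyWarning and may only modify a temporary copy):

w.loc[w.female == 'female', 'female'] = 1
w.loc[w.female == 'male', 'female'] = 0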

You can also use apply with a dictionary's .get method:

w['female'] = w['female'].apply({'male':0, 'female':1}.get)

For example:

import pandas as pd

w = pd.DataFrame({'female':['female','male','female']})
print(w)

DataFrame w:

   female
0  female
1    male
2  female

Using apply to replace values from the dictionary:

w['female'] = w['female'].apply({'male':0, 'female':1}.get)
print(w)

Result:

   female
0       1
1       0
2       1 

Note: use apply with a dictionary's .get only if every value that can occur in the column is a key of the dictionary; otherwise .get returns None for the missing keys, and those entries end up empty (None/NaN).
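A minimal sketch of that caveat, with a hypothetical 'other' value: plain .get leaves the unknown entry missing, while passing a default to get (here the original value, my choice for illustration) keeps it.

```python
import pandas as pd

mapping = {'male': 0, 'female': 1}
s = pd.Series(['female', 'male', 'other'])

# dict.get returns None for keys it doesn't know, so 'other' becomes missing
plain = s.apply(mapping.get)

# A default for get (the original value) keeps unmapped entries as they were
keep_unmapped = s.apply(lambda v: mapping.get(v, v))

print(plain.isna().tolist())    # [False, False, True]
print(keep_unmapped.tolist())   # [1, 0, 'other']
```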

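This is very compact (written with .loc, because the chained form w['female'][mask] = 1 may change only a temporary copy, not w):

w.loc[w['female'] == 'female', 'female'] = 1
w.loc[w['female'] == 'male', 'female'] = 0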

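Another good one, using regular expressions (anchor the patterns with ^ and $, because the unanchored pattern 'male' also matches inside 'female'):

w['female'] = w['female'].replace(regex='^female$', value=1)
w['female'] = w['female'].replace(regex='^male$', value=0)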
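A small sketch of the anchoring pitfall: when the replacement value is not a string, pandas replaces the whole element whenever the pattern is found anywhere in it, so an unanchored 'male' also hits 'female'.

```python
import pandas as pd

s = pd.Series(['female', 'male'])

# Unanchored: 'male' is found inside 'female', so both elements are replaced
unanchored = s.replace(regex='male', value=0)

# Anchored with ^...$: only the exact string 'male' is replaced
anchored = s.replace(regex='^male$', value=0)

print(unanchored.tolist())  # [0, 0]
print(anchored.tolist())    # ['female', 0]
```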

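Alternatively, there is the built-in function pd.get_dummies for this kind of encoding:

w['female'] = pd.get_dummies(w['female'], dtype=int)['female']

pd.get_dummies returns a DataFrame with one indicator column per distinct value of w['female'], each named after that value, so selecting its 'female' column gives 1 for 'female' and 0 for 'male'. Passing drop_first=True would drop the first column in sorted order (it can be inferred from the ones that are left), but here that is the 'female' column, so you would be left with the 'male' indicator and the reverse coding. dtype=int asks for 0/1, since recent pandas versions return bool indicators by default.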

This is especially useful if you have categorical variables with more than two possible values, since the function creates as many dummy variables as are needed to distinguish all cases. In that case, be careful not to assign the entire result to a single column. Instead, if w['female'] could be 'male', 'female' or 'neutral', do something like this:

w = pd.concat([w, pd.get_dummies(w['female'], drop_first=True)], axis=1)
w.drop('female', axis=1, inplace=True)

You are then left with two new columns ('male' and 'neutral') that dummy-code the original values, and the column of strings is gone.
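A runnable sketch of the three-category case with hypothetical data; dtype=int is an addition of mine so the indicators print as 0/1 rather than booleans.

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'neutral', 'female']})

# One indicator per distinct value (in sorted order), minus the first ('female')
dummies = pd.get_dummies(w['female'], drop_first=True, dtype=int)

# Attach the indicators and drop the original string column
w = pd.concat([w, dummies], axis=1).drop(columns='female')

print(w.columns.tolist())  # ['male', 'neutral']
print(w.values.tolist())   # [[0, 0], [1, 0], [0, 1], [0, 0]]
```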

Using Series.map with Series.fillna

If your column contains strings other than female and male, Series.map alone won't work, because it returns NaN for every value that isn't in the mapping.

That's why we have to chain it with fillna:

Example of why .map alone fails:

df = pd.DataFrame({'female':['male', 'female', 'female', 'male', 'other', 'other']})

   female
0    male
1  female
2  female
3    male
4   other
5   other
df['female'].map({'female': '1', 'male': '0'})

0      0
1      1
2      1
3      0
4    NaN
5    NaN
Name: female, dtype: object

The correct method is to chain map with fillna, which fills the NaN entries with the values from the original column:

df['female'].map({'female': '1', 'male': '0'}).fillna(df['female'])

0        0
1        1
2        1
3        0
4    other
5    other
Name: female, dtype: object
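For comparison, a sketch of a swapped-in alternative (not part of this answer): Series.replace leaves values that are not in the dict untouched, so on this data it gives the same result in a single call.

```python
import pandas as pd

df = pd.DataFrame({'female': ['male', 'female', 'female', 'male', 'other', 'other']})
mapping = {'female': '1', 'male': '0'}

# map + fillna: unmapped values become NaN, then are filled from the original column
via_map = df['female'].map(mapping).fillna(df['female'])

# replace: values that are not keys of the dict are simply left as they are
via_replace = df['female'].replace(mapping)

print(via_replace.tolist())  # ['0', '1', '1', '0', 'other', 'other']
```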

There is also a function in pandas called factorize that does this kind of work automatically: it converts labels to integer codes, e.g. ['male', 'female', 'male'] -> [0, 1, 0]. Codes are assigned in order of first appearance unless you pass sort=True. See this answer for more information.
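A short sketch of pd.factorize on the example labels, showing how the codes depend on order of appearance versus sort=True:

```python
import pandas as pd

labels = pd.Series(['male', 'female', 'male'])

# Codes follow order of first appearance: 'male' -> 0, 'female' -> 1
codes, uniques = pd.factorize(labels)
print(codes.tolist(), uniques.tolist())  # [0, 1, 0] ['male', 'female']

# sort=True numbers the sorted uniques instead: 'female' -> 0, 'male' -> 1
codes_sorted, uniques_sorted = pd.factorize(labels, sort=True)
print(codes_sorted.tolist(), uniques_sorted.tolist())  # [1, 0, 1] ['female', 'male']
```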

w.replace({'female': {'female': 1, 'male': 0}}, inplace=True)

The code above replaces 'female' with 1 and 'male' with 0, but only in the 'female' column.
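A sketch of that column restriction, with a hypothetical second column 'note' that also contains the strings 'female' and 'male':

```python
import pandas as pd

w = pd.DataFrame({
    'female': ['female', 'male', 'female'],
    'note': ['female', 'male', 'n/a'],  # hypothetical second column
})

# {column: {old: new}} restricts the replacement to the 'female' column
out = w.replace({'female': {'female': 1, 'male': 0}})

print(out['female'].tolist())  # [1, 0, 1]
print(out['note'].tolist())    # ['female', 'male', 'n/a']
```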

import numpy as np

w.female = np.where(w.female == 'female', 1, 0)

This is a NumPy solution, in case someone is looking for one. It is useful for replacing values based on a condition: np.where() covers both the if and the else branch. The solutions that use df.replace() may not be feasible if the column contains many unique values besides 'male', all of which should be replaced with 0.
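A sketch of that difference on hypothetical data with an extra 'unknown' value: replace leaves it alone, while np.where sends everything that is not 'female' to 0.

```python
import numpy as np
import pandas as pd

s = pd.Series(['female', 'male', 'unknown', 'female'])

# replace only touches the values listed in the dict
via_replace = s.replace({'female': 1, 'male': 0})

# np.where: condition True -> 1, everything else (including 'unknown') -> 0
via_where = np.where(s == 'female', 1, 0)

print(via_replace.tolist())  # [1, 0, 'unknown', 1]
print(via_where.tolist())    # [1, 0, 0, 1]
```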

Another solution is to use Series.where() and Series.mask() in succession. Two calls are needed because each one replaces values on only one side of the condition; neither has an else branch.

w['female'] = w['female'].where(w['female'] == 'female', 0)  # replace where condition is False
w['female'] = w['female'].mask(w['female'] == 'female', 1)   # replace where condition is True
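A minimal sketch of the where/mask pair on a hypothetical Series, including an 'other' value to show that the where step sends everything except 'female' to 0:

```python
import pandas as pd

s = pd.Series(['female', 'male', 'other'])

# where keeps values where the condition is True and replaces the rest with 0
step1 = s.where(s == 'female', 0)

# mask does the opposite: replaces values where the condition is True with 1
step2 = step1.mask(step1 == 'female', 1)

print(step2.tolist())  # [1, 0, 0]
```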

I think an answer should point out which type of object each of the methods above works on: a Series or a DataFrame.

When you get a column with w.female or w['female'], you get back a Series; selecting with a list of labels, as in w[['female']] (or w[[2]] if your column label is 2), gives you a DataFrame.

.loc and .iloc return either type depending on the selection: w.loc[:, 'female'] is a Series, while w.loc[:, ['female']] is a DataFrame. Both Series and DataFrame have a .replace method, so it works on either; the .map(dict) calls above are Series.map.
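A quick sketch (hypothetical one-column frame) that checks which type each selection returns:

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male']})

print(type(w.female).__name__)              # Series
print(type(w[['female']]).__name__)         # DataFrame
print(type(w.loc[:, 'female']).__name__)    # Series
print(type(w.loc[:, ['female']]).__name__)  # DataFrame

# Both types provide .replace
print(hasattr(pd.Series, 'replace'), hasattr(pd.DataFrame, 'replace'))  # True True
```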

dic = {'female':1, 'male':0}
w['female'] = w['female'].replace(dic)

.replace accepts a dictionary mapping old values to new ones, so you can add whatever other replacements you need.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM