
Replacing column values in a pandas DataFrame

I'm trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'.

I have tried the following:

w['female']['female']='1'
w['female']['male']='0' 

But I get back an exact copy of the previous results.

Ideally, I would like output equivalent to applying the following logic element-wise.

if w['female'] =='female':
    w['female'] = '1';
else:
    w['female'] = '0';

I've looked through the gotchas documentation ( http://pandas.pydata.org/pandas-docs/stable/gotchas.html ) but cannot figure out why nothing happens.

Any help will be appreciated.

If I understand right, you want something like this:

w['female'] = w['female'].map({'female': 1, 'male': 0})

(Here I convert the values to numbers instead of strings containing numbers. You can convert them to "1" and "0", if you really want, but I'm not sure why you'd want that.)

Your code doesn't work because using ['female'] on a column (the second 'female' in your w['female']['female']) doesn't mean "select rows where the value is 'female'". It means "select rows where the index is 'female'", of which there may not be any in your DataFrame.
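To make the difference concrete, here is a minimal sketch that contrasts a label lookup with a boolean mask. The three-row frame is an assumed example, not the asker's actual data.

```python
import pandas as pd

# Assumed sample data with the default integer index 0, 1, 2.
w = pd.DataFrame({'female': ['female', 'male', 'female']})

# w['female'] is a Series. Indexing it with a string looks for an
# index *label* called 'female', and no such label exists here.
has_label = 'female' in w['female'].index

# Selecting rows by *value* needs a boolean mask instead.
mask = w['female'] == 'female'
selected = w.loc[mask, 'female']

print(has_label)
print(selected)
```

The mask is True for rows 0 and 2, so only those rows are selected.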

You can edit a subset of a dataframe by using loc:

df.loc[<row selection>, <column selection>]

In this case:

w.loc[w.female != 'female', 'female'] = 0
w.loc[w.female == 'female', 'female'] = 1

w['female'] = w['female'].replace({'female': 1, 'male': 0})

See the pandas.DataFrame.replace() documentation.

A slight variation:

w['female'] = w['female'].replace(['female', 'male'], [1, 0])

This should also work:

w.female[w.female == 'female'] = 1 
w.female[w.female == 'male']   = 0

You can also use apply with a dictionary's .get method, i.e. w['female'] = w['female'].apply({'male':0, 'female':1}.get):

import pandas as pd

w = pd.DataFrame({'female':['female','male','female']})
print(w)

DataFrame w:

   female
0  female
1    male
2  female

Using apply to replace values from the dictionary:

w['female'] = w['female'].apply({'male':0, 'female':1}.get)
print(w)

Result:

   female
0       1
1       0
2       1 

Note: use apply with a dictionary only if every possible value in the column is defined in the dictionary. Any value that is not a key in the dictionary comes back as a missing value.
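The note above can be checked with a short sketch. The value 'other' and the fallback -1 are assumptions chosen for illustration only.

```python
import pandas as pd

# Assumed sample data with one value ('other') that the dictionary lacks.
w = pd.Series(['female', 'male', 'other'])
mapping = {'male': 0, 'female': 1}

# dict.get returns None for unknown keys, so 'other' becomes missing.
result = w.apply(mapping.get)
n_missing = int(result.isna().sum())

# One way to avoid the gap: pass an explicit default to dict.get.
# The -1 here is an arbitrary placeholder, not a pandas convention.
with_default = w.apply(lambda v: mapping.get(v, -1))

print(n_missing)
print(with_default.tolist())
```

Whether a default is appropriate depends on the data. Leaving the value missing is often the safer signal that the dictionary is incomplete.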

This is very compact:

w['female'][w['female'] == 'female']=1
w['female'][w['female'] == 'male']=0

Another good one:

w['female'] = w['female'].replace(regex='female', value=1)
w['female'] = w['female'].replace(regex='male', value=0)

Alternatively, there is the built-in function pd.get_dummies for these kinds of assignments:

w['female'] = pd.get_dummies(w['female'],drop_first = True)

This gives you a data frame with two columns, one for each value that occurs in w['female'], of which you drop the first (because you can infer it from the one that is left). The remaining column is automatically named after the value it encodes. Note that the dummy columns are ordered alphabetically, so drop_first drops 'female' and the column that remains is the indicator for 'male'.

This is especially useful if you have categorical variables with more than two possible values. The function creates as many dummy variables as are needed to distinguish between all cases. Be careful then that you don't assign the entire data frame to a single column. Instead, if w['female'] could be 'male', 'female' or 'neutral', do something like this:

w = pd.concat([w, pd.get_dummies(w['female'], drop_first = True)], axis = 1)
w.drop('female', axis = 1, inplace = True)

You are then left with two new columns that give you the dummy coding of 'female', and you have got rid of the column containing the strings.
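As a rough end-to-end sketch of the three-value case, using an assumed four-row frame, the following shows which columns survive:

```python
import pandas as pd

# Assumed sample data with three distinct values.
w = pd.DataFrame({'female': ['male', 'female', 'neutral', 'female']})

# One indicator column per value, in sorted order: female, male, neutral.
# drop_first removes the first one ('female'), leaving male and neutral.
dummies = pd.get_dummies(w['female'], drop_first=True)

# Attach the indicators and drop the original string column.
w = pd.concat([w, dummies], axis=1).drop(columns='female')

print(w)
```

A row where both 'male' and 'neutral' are false (or 0) therefore stands for 'female'. Depending on the pandas version, the indicator columns may be boolean rather than integer.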

Using Series.map with Series.fillna

If your column contains more strings than just female and male, Series.map on its own will not do what you want, since it returns NaN for any value that is missing from the dictionary.

That's why we have to chain it with fillna:

Example of why .map fails:

df = pd.DataFrame({'female':['male', 'female', 'female', 'male', 'other', 'other']})

   female
0    male
1  female
2  female
3    male
4   other
5   other

df['female'].map({'female': '1', 'male': '0'})

0      0
1      1
2      1
3      0
4    NaN
5    NaN
Name: female, dtype: object

For the correct method, we chain map with fillna, so we fill the NaN with values from the original column:

df['female'].map({'female': '1', 'male': '0'}).fillna(df['female'])

0        0
1        1
2        1
3        0
4    other
5    other
Name: female, dtype: object

There is also a function in pandas called factorize, which you can use to do this type of work automatically. It converts labels to numbers: ['male', 'female', 'male'] -> [0, 1, 0]. See this answer for more information.
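A minimal sketch of pd.factorize on the list from the sentence above:

```python
import pandas as pd

labels = ['male', 'female', 'male']

# codes holds one integer per input label; uniques holds the distinct
# labels, in order of first appearance by default.
codes, uniques = pd.factorize(labels)

print(codes.tolist())
print(list(uniques))
```

Because codes are assigned in order of first appearance, 'male' gets 0 here only because it comes first in the list. If a specific value must map to 1, an explicit dictionary gives more control.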

w.replace({'female':{'female':1, 'male':0}}, inplace = True)

The code above replaces 'female' with 1 and 'male' with 0, only in the 'female' column.

import numpy as np
w.female = np.where(w.female=='female', 1, 0)

This is for anyone looking for a numpy solution. It is useful for replacing values based on a condition: both the if and the else branches are built into np.where(). The solutions that use df.replace() may not be feasible if the column contains many unique values in addition to 'male', all of which should be replaced with 0.
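As a small sketch of that situation, assume the column also holds a third value, 'other', which should end up as 0 along with 'male':

```python
import numpy as np
import pandas as pd

# Assumed sample data; 'other' stands in for any number of extra values.
w = pd.DataFrame({'female': ['female', 'male', 'other', 'female']})

# 1 where the condition holds, 0 everywhere else (the built-in "else").
w['female'] = np.where(w['female'] == 'female', 1, 0)

print(w['female'].tolist())
```

No entry is needed for 'other' or any further value, because everything that fails the condition falls through to 0.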

Another solution is to use df.where() and df.mask() in succession. This is because neither of them implements an else condition.

w['female'] = w['female'].where(w['female'] == 'female', 0)  # replace where condition is False
w['female'] = w['female'].mask(w['female'] == 'female', 1)  # replace where condition is True

I think the answers should point out which type of object you get with each of the methods suggested above: a Series or a DataFrame.

When you select a column with w.female or w['female'], you get back a Series. When you select it with a list, such as w[[2]] (where, suppose, 2 is the label of your column), you get back a DataFrame. Both Series and DataFrame have a .replace method, so you can use it in either case.

When you use .loc or .iloc to select a single column you also get back a Series, so you can additionally use Series methods such as apply, map and so on.

dic = {'female':1, 'male':0}
w['female'] = w['female'].replace(dic)

.replace takes a dictionary as its argument, which you can change to map whatever values you want or need.
