Graphlab: replacing values in an SFrame and filtering

Question

So I have this very stupid problem that I have been stuck on for hours. I'm practicing on Kaggle's Titanic ML exercise, using GraphLab Create.

My data is as shown below:

Now I want to replace some values in the table. For example, as a test, I want to set the age to 38 for Pclass==1, 30 for Pclass==2 and 26 for Pclass==3.

I have tried so many ways of doing this that I am lost.

All I have now is:

import graphlab as gl
df = gl.SFrame(data)
df[(df["Pclass"]==1)] # returns the rows of the table where Pclass == 1
df["Age"][df["Pclass"]==1] # returns an array containing only the "Age" values where Pclass == 1

Now I am trying to use SFrame.apply properly, but I'm confused.

I have tried:

df["Age"][df["Pclass"]==1].apply(lambda x: 38)

That returns an array with the correct values, but I was not able to apply it to the SFrame. For example, I have tried:

df = df["Age"][df["Pclass"]==1].apply(lambda x: 38)

But now my DataFrame has turned into a list... (obviously)

I have also tried:

df["Age"] = df["Age"][df["Pclass"]==1].apply(lambda x: 38)

But I get the following error: "RuntimeError: Runtime Exception. Column "__PassengerId-Survived-Pclass-Sex-Age-Fare" has different size than current columns!"

I'm sure the solution is pretty simple, but I am too confused to find it by myself.

Ultimately I would like something like df["Age"] = something.apply(lambda x: 38 if Pclass==1 else 30 if Pclass==2 else 26 if Pclass==3)

Thanks.

Answer 1

You can use the following alternative:

Just create a new column 'Pclass_' in the original SFrame; then you can do:

df['Pclass_'] = [1 if item == 38 else 2 if item == 30 else 3 if item == 26 else 4 for item in df['Age']]

You can use any chain of if/else conditions in the list comprehension.
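Note that the line above derives a class label from Age, which is the reverse of what the question asks for. Here is a minimal sketch of the opposite direction, assuming the test ages from the question (38, 30, 26). The names AGE_BY_PCLASS and age_for_pclass are made up for illustration, and the GraphLab call in the trailing comment is not run here. The point is that a value computed for every row of df["Pclass"] has the same length as the SFrame, so assigning it back should not hit the "different size" error from the question.

```python
# Hypothetical lookup table: passenger class -> test age from the question.
AGE_BY_PCLASS = {1: 38, 2: 30, 3: 26}

def age_for_pclass(pclass, default=None):
    """Return the age assigned to a class, or `default` for unknown classes."""
    return AGE_BY_PCLASS.get(pclass, default)

# Plain-Python demonstration on a stand-in for the Pclass column.
pclass_column = [3, 1, 2, 1, 3]
new_ages = [age_for_pclass(p) for p in pclass_column]
print(new_ages)

# With GraphLab Create installed, the same function could be applied to the
# whole column so the result keeps one value per row (assumption: `df` is the
# SFrame from the question):
#     df["Age"] = df["Pclass"].apply(age_for_pclass)
```

A dictionary lookup is used instead of a chained conditional only because it is easier to extend; a lambda with nested if/else would do the same job.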

Answer 2

OK, I have spent some time on this problem and found a solution: use pandas. I am used to pandas but new to GraphLab, which I will not be using that much, so I decided to stop wasting time on this simple problem.

Here is what I have done:

import pandas as pd
df2 = pd.read_csv("./train.csv")
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 1), "Age"] = 35
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 2), "Age"] = 30
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 3), "Age"] = 25

And I'm done, everything works fine.
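Unlike the question, the pandas lines above only fill ages that are missing, and they use 35/30/25 rather than 38/30/26. For anyone who wants to stay in GraphLab, here is a rough sketch of the same rule written as a row function. It runs on plain dicts so it works without GraphLab installed; since SFrame.apply passes each row to the function as a dict, it could presumably be plugged in as shown in the last comment. The names FILL_AGE and fill_missing_age are hypothetical, and missing values are assumed to arrive as None.

```python
# Fill values taken from the pandas snippet above (35 / 30 / 25), as floats
# so that filled and existing ages share one type.
FILL_AGE = {1: 35.0, 2: 30.0, 3: 25.0}

def fill_missing_age(row):
    """Keep an existing age; otherwise look one up from the passenger class."""
    if row["Age"] is not None:
        return row["Age"]
    return FILL_AGE.get(row["Pclass"])

# Stand-in rows; in an SFrame each row would be passed as a dict like these.
rows = [
    {"Pclass": 1, "Age": None},
    {"Pclass": 2, "Age": 27.0},
    {"Pclass": 3, "Age": None},
]
filled = [fill_missing_age(r) for r in rows]
print(filled)

# Assumed GraphLab usage (not run here):
#     df["Age"] = df.apply(fill_missing_age)
```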

Answer 1: score 2, accepted, 2017-04-28 09:31:47

Answer 2: score 0, 2017-03-10 09:43:54
