
GraphLab: adding variable columns from an existing SFrame

I have an SFrame, e.g.:

a | b
-----
2 | 31 4 5
0 | 1 9
1 | 2 84

Now I want to get the following result:

a | b      | c  | d  | e
------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0

Any idea how to do it? Or do I have to use some other tool?

Thanks.

Using pandas:

In [409]: sf
Out[409]: 
Columns:
    a   int
    b   str

Rows: 3

Data:
+---+--------+
| a |   b    |
+---+--------+
| 2 | 31 4 5 |
| 0 |  1 9   |
| 1 |  2 84  |
+---+--------+
[3 rows x 2 columns]

In [410]: df = sf.to_dataframe()

In [411]: newdf =  pd.DataFrame(df.b.str.split().tolist(), columns = ['c', 'd', 'e']).fillna('0')

In [412]: df.join(newdf)
Out[412]: 
   a       b   c   d  e
0  2  31 4 5  31   4  5
1  0     1 9   1   9  0
2  1    2 84   2  84  0

Converting back to SFrame:

In [498]: SFrame(df.join(newdf))
Out[498]: 
Columns:
    a   int
    b   str
    c   str
    d   str
    e   str

Rows: 3

Data:
+---+--------+----+----+---+
| a |   b    | c  | d  | e |
+---+--------+----+----+---+
| 2 | 31 4 5 | 31 | 4  | 5 |
| 0 |  1 9   | 1  | 9  | 0 |
| 1 |  2 84  | 2  | 84 | 0 |
+---+--------+----+----+---+
[3 rows x 5 columns]

If you want integers/floats, you can also do:

In [506]: newdf =  pd.DataFrame(map(lambda x: [int(y) for y in x], df.b.str.split().tolist()), columns = ['c', 'd', 'e'])

In [507]: newdf
Out[507]: 
    c   d    e
0  31   4  5.0
1   1   9  NaN
2   2  84  NaN

In [508]: SFrame(df.join(newdf))
Out[508]: 
Columns:
    a   int
    b   str
    c   int
    d   int
    e   float

Rows: 3

Data:
+---+--------+----+----+-----+
| a |   b    | c  | d  |  e  |
+---+--------+----+----+-----+
| 2 | 31 4 5 | 31 | 4  | 5.0 |
| 0 |  1 9   | 1  | 9  | nan |
| 1 |  2 84  | 2  | 84 | nan |
+---+--------+----+----+-----+
[3 rows x 5 columns]
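
The NaN values above are what force column e to float. One way to keep every column integer is to pad short rows before building the DataFrame. A minimal sketch in plain Python, assuming the values are always whitespace-separated integers; the helper name split_to_ints and its parameters are hypothetical, not part of pandas or SFrame:

```python
def split_to_ints(text, width, fill=0):
    """Split a whitespace-separated string into exactly `width` integers."""
    values = [int(token) for token in text.split()]
    if len(values) > width:
        raise ValueError("expected at most %d values, got %d" % (width, len(values)))
    # Pad short rows so every row has the same length and no NaN is introduced.
    return values + [fill] * (width - len(values))


rows = [split_to_ints(s, 3) for s in ["31 4 5", "1 9", "2 84"]]
print(rows)  # [[31, 4, 5], [1, 9, 0], [2, 84, 0]]
```

The padded rows could then be passed to pd.DataFrame(rows, columns=['c', 'd', 'e']) in place of the map(...) expression, which should give integer columns throughout.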

def customsplit(string, column):
    # split on whitespace, then pad with '0' up to `column` entries
    val = string.split()
    diff = column - len(val)
    val += ['0'] * diff
    return val

# one padded list of tokens per row
a = sf['b'].apply(lambda x: customsplit(x, 3))
sf['c'] = [i[0] for i in a]
sf['d'] = [i[1] for i in a]
sf['e'] = [i[2] for i in a]

sf

Output:

a | b      | c  | d  | e
------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0
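
The approach above hard-codes three columns. If the number of values per row is not known in advance, the width can be taken from the longest row instead. A rough sketch in plain Python; the function name split_columns and the default col0, col1, ... names are made up for illustration:

```python
def split_columns(strings, names=None, fill='0'):
    """Split each string on whitespace and return {column_name: values}."""
    rows = [s.split() for s in strings]
    # The longest row decides how many columns are produced.
    width = max(len(r) for r in rows) if rows else 0
    if names is None:
        names = ['col%d' % i for i in range(width)]
    if len(names) != width:
        raise ValueError("need %d column names, got %d" % (width, len(names)))
    # Short rows are padded with `fill` so that all columns have equal length.
    return {
        name: [r[i] if i < len(r) else fill for r in rows]
        for i, name in enumerate(names)
    }


cols = split_columns(["31 4 5", "1 9", "2 84"], names=['c', 'd', 'e'])
print(cols['e'])  # ['5', '0', '0']
```

Each entry of cols could then be assigned to the SFrame in a loop (sf[name] = values), the same way the lists are assigned to sf['c'], sf['d'] and sf['e'] above.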

This can be done with SFrame itself, without using pandas. Just use the 'unpack' function.

Pandas provides a variety of functions for handling datasets, but converting an SFrame to a pandas DataFrame and back is inconvenient.

If you handle more than 10 gigabytes of data, pandas cannot properly handle the dataset (but SFrame can).

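import sframe

# your SFrame (note: column 'b' holds lists here, not space-separated strings as in the question)
sf = sframe.SFrame({'a': [2, 0, 1], 'b': [[31, 4, 5], [1, 9], [2, 84]]})

# just use the 'unpack()' function; it replaces 'b' with 'b.0', 'b.1', 'b.2'
sf2 = sf.unpack('b')

# change the column names (keep the returned SFrame)
sf2 = sf2.rename({'b.0': 'c', 'b.1': 'd', 'b.2': 'e'})

# fill the missing values in 'e' with zero, keeping sf2 an SFrame
sf2['e'] = sf2['e'].fillna(0)

# merge the original SFrame and the new SFrame on 'a'
sf.join(sf2, 'a')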
