

How to move from Graphlab to pandas

I've been learning Graphlab, but I also wanted to look at pandas because it's open source, and in the future I might end up at a company that doesn't have a Graphlab license. How would I create a basic model with pandas the way I can with Graphlab?

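import pandas as pd
import graphlab as gl
from sklearn.model_selection import train_test_split
data = pd.read_csv("~/Downloads/diamonds.csv")
# Graphlab: convert the DataFrame to an SFrame and split it 80/20
sframe = gl.SFrame(data)
train_data, test_data = sframe.random_split(.8, seed=1)
# scikit-learn: split the pandas DataFrame 75/25
train, test = train_test_split(data, train_size=0.75, random_state=88)
reg_model = gl.linear_regression.create(train_data, target="price", features=["carat","cut","color"], validation_set=None)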

What would be the pandas equivalent of the last line above?

pandas itself doesn't have any predictive modeling built in (that I know of). Here is a good link on how to leverage pandas in a statistical model. This one too.

pandas is probably one of the best (if not the best) modules for data manipulation in Python. It makes storing data and manipulating it for modeling much easier than working with plain lists and parsing CSV files by hand.
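As a small illustration of the kind of manipulation meant here, the sketch below builds a DataFrame from a few made-up rows shaped like diamonds.csv (the values are illustrative, not read from the real file), adds a derived column, and summarizes by group.

```python
import pandas as pd

# A few made-up rows shaped like diamonds.csv (illustrative values only)
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31, 0.24],
    "cut": ["Ideal", "Premium", "Premium", "Good", "Ideal"],
    "color": ["E", "E", "I", "J", "J"],
    "price": [326, 326, 334, 335, 336],
})

# Derived column: price per carat, computed element-wise for every row
df["price_per_carat"] = df["price"] / df["carat"]

# Mean price for each cut; groupby sorts the group labels by default
mean_price_by_cut = df.groupby("cut")["price"].mean()
print(mean_price_by_cut)
```

The same two lines work unchanged on the full file once df comes from pd.read_csv, which is the main advantage over hand-rolled lists.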

Reading files in is as easy as this (notice how intuitive it is):

import pandas as pd
# PATH_HERE is a placeholder for your own file path
# Excel
df1 = pd.read_excel(PATH_HERE)
# CSV
df1 = pd.read_csv(PATH_HERE)
# JSON
df1 = pd.read_json(PATH_HERE)

and to write it back out:

# Excel
df1.to_excel(PATH_HERE)
# CSV
df1.to_csv(PATH_HERE)
# JSON
df1.to_json(PATH_HERE)
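To see that reading and writing are symmetric, a minimal sketch can round-trip a small DataFrame through a CSV file in a temporary directory. The file name diamonds_sample.csv is hypothetical. index=False matters: by default to_csv also writes the row index, which would come back as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({"carat": [0.23, 0.21], "cut": ["Ideal", "Premium"], "price": [326, 326]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "diamonds_sample.csv")  # hypothetical file name
    # index=False stops pandas from writing the row index as an extra column
    df1.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(round_trip)
```

Writing .xlsx files with to_excel additionally needs an Excel engine package such as openpyxl installed.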

It also makes filtering and slicing your data very simple. Here is the official doc:
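A minimal sketch of the filtering and slicing meant here, again on a few illustrative rows rather than the real diamonds.csv. The last two lines are my own pandas-only analogue of Graphlab's sframe.random_split(.8, seed=1) using DataFrame.sample; the random number generators differ, so the rows chosen will not match Graphlab's split.

```python
import pandas as pd

# Illustrative rows only; in practice df would come from pd.read_csv(...)
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31, 0.24, 1.01],
    "cut": ["Ideal", "Premium", "Premium", "Good", "Ideal", "Fair"],
    "color": ["E", "E", "I", "J", "J", "H"],
    "price": [326, 326, 334, 335, 336, 2788],
})

# Column selection: a list of names returns a new DataFrame
features = df[["carat", "cut", "color"]]

# Row filtering with a boolean mask; combine conditions with & and parentheses
ideal_small = df[(df["cut"] == "Ideal") & (df["carat"] < 0.3)]

# Label-based slicing with .loc: rows 1 through 3 (inclusive), two named columns
subset = df.loc[1:3, ["carat", "price"]]

# Pandas-only analogue of a random 80/20 split:
# sample 80% of the rows, and keep every remaining row as the test set
train = df.sample(frac=0.8, random_state=1)
test = df.drop(train.index)
```

Note that .loc slices by label and includes the end label, unlike ordinary Python slicing.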

For modeling, have a look at sklearn (scikit-learn), and at NLTK for text analysis. There are others, but those are the ones I've used.

For modeling, you need a separate library such as sklearn. The equivalent of the last line is roughly:

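from sklearn.linear_model import LinearRegression  # price is continuous: linear, not logistic, regression
X_train = pd.get_dummies(train[["carat", "cut", "color"]])  # one-hot encode the string columns "cut" and "color"
model = LinearRegression()
model.fit(X_train, train["price"])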
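Here is a minimal end-to-end sketch of the same idea. Since diamonds.csv isn't available here, it generates a synthetic stand-in with a made-up pricing rule, so the numbers only show that the pipeline runs. The Pipeline/ColumnTransformer/OneHotEncoder combination is my own choice, not part of the answer above: it plays the role of Graphlab's automatic handling of string features, and unlike pd.get_dummies it gives the train and test sets identical columns even when a category appears in only one split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for diamonds.csv (made-up relationship, for illustration only)
rng = np.random.default_rng(88)
n = 500
data = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.0, n),
    "cut": rng.choice(["Fair", "Good", "Ideal"], n),
    "color": rng.choice(["D", "E", "J"], n),
})
cut_bonus = data["cut"].map({"Fair": 0, "Good": 300, "Ideal": 800})
color_bonus = data["color"].map({"D": 500, "E": 250, "J": 0})
data["price"] = 4000 * data["carat"] + cut_bonus + color_bonus + rng.normal(0, 50, n)

train, test = train_test_split(data, train_size=0.75, random_state=88)
features = ["carat", "cut", "color"]

# One-hot encode the categorical columns and pass "carat" through unchanged.
# handle_unknown="ignore" keeps predict() from failing on unseen categories.
preprocess = ColumnTransformer(
    [("categorical", OneHotEncoder(handle_unknown="ignore"), ["cut", "color"])],
    remainder="passthrough",
)
reg_model = Pipeline([("preprocess", preprocess), ("regression", LinearRegression())])
reg_model.fit(train[features], train["price"])

# R^2 on the held-out rows
print(reg_model.score(test[features], test["price"]))
```

Because the encoder lives inside the pipeline, reg_model.predict can be called directly on a raw DataFrame with the carat, cut, and color columns, much like a Graphlab model on an SFrame.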

docs

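Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.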

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM