
How to move from Graphlab to pandas

I've been learning GraphLab, but I also want to take a look at pandas: it's open source, and in the future I might find myself at a company that doesn't have a GL license. How would I create a basic model with pandas the way I can with GL?

import pandas as pd
import graphlab as gl
from sklearn.model_selection import train_test_split

data = pd.read_csv("~/Downloads/diamonds.csv")
sframe = gl.SFrame(data)
train_data, test_data = sframe.random_split(.8, seed=1)
train, test = train_test_split(data, train_size=0.75, random_state=88)
reg_model = gl.linear_regression.create(train_data, target="price", features=["carat","cut","color"], validation_set=None)

What would be the pandas equivalent of the last line above?

pandas itself doesn't have any predictive modeling built in (that I know of). Here is a good link on how to leverage pandas in a statistical model. This one too.

pandas is probably one of the best (if not the best) modules for data manipulation in Python. It makes storing and manipulating data for modeling much easier than working with plain lists or parsing CSV files by hand.

Reading in files is as easy as (notice how intuitive it is):

import pandas as pd
# Excel
df1 = pd.read_excel(PATH_HERE)
# CSV
df1 = pd.read_csv(PATH_HERE)
# JSON
df1 = pd.read_json(PATH_HERE)

and to spit it out:

# Excel
df1.to_excel(PATH_HERE)
# Need I go on? (to_csv and to_json work the same way)
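To make the read/write symmetry concrete, here is a minimal round-trip sketch. It assumes a tiny made-up DataFrame and uses in-memory buffers instead of real file paths, so nothing here comes from the original diamonds file:

```python
import io
import pandas as pd

# Made-up two-row frame, only for illustration.
df1 = pd.DataFrame({"carat": [0.23, 0.31], "price": [326, 335]})

# Write CSV into an in-memory buffer, then read it back.
buffer = io.StringIO()
df1.to_csv(buffer, index=False)
buffer.seek(0)
df2 = pd.read_csv(buffer)

# Same idea with JSON, using one record per row.
json_text = df1.to_json(orient="records")
df3 = pd.read_json(io.StringIO(json_text), orient="records")
```

With real files you would pass a path instead of the buffer; index=False keeps the row index out of the CSV.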

It also makes filtering and slicing your data very simple; see the official documentation.
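As a rough illustration of filtering and slicing, here is a small sketch. The rows are invented and only borrow column names from the diamonds data in the question:

```python
import pandas as pd

# Made-up rows that mimic a few columns of the diamonds data set.
df = pd.DataFrame({
    "carat": [0.23, 0.31, 0.90, 1.20, 0.70],
    "cut": ["Ideal", "Good", "Premium", "Ideal", "Good"],
    "price": [326, 335, 2800, 5500, 2100],
})

# Boolean filtering: keep rows with an Ideal cut.
ideal = df[df["cut"] == "Ideal"]

# Combine conditions with & and parentheses.
big_and_cheap = df[(df["carat"] > 0.5) & (df["price"] < 3000)]

# Label-based slicing with .loc includes the end label;
# position-based slicing with .iloc excludes the end position.
first_three = df.loc[0:2, ["carat", "price"]]
last_two = df.iloc[-2:]
```

Each result is itself a DataFrame, so filters and slices can be chained or passed straight to a modeling library.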

For modeling, have a look at sklearn, and at NLTK for text analysis. There are others, but those are the ones I've used.

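For modelling, you need a separate library such as sklearn. Since price is a continuous target, the rough equivalent of the last line is a linear regression (not a logistic one), fitted on the pandas train frame. Unlike GL, sklearn will not encode the string columns cut and color for you, so they are one-hot encoded first:

from sklearn.linear_model import LinearRegression
X_train = pd.get_dummies(train[["carat", "cut", "color"]])  # one-hot encode cut and color
model = LinearRegression()
model.fit(X_train, train["price"])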

docs
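Putting the pieces together, the following is a minimal end-to-end sketch of a pandas plus sklearn workflow that roughly mirrors the GL snippet in the question. It is an illustration under assumptions, not a drop-in replacement: the data is synthetic (made-up values with the same column names as diamonds.csv), and encoding before splitting is just one way to keep the train and test columns aligned.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diamonds.csv (made-up values, same column names).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.0, n),
    "cut": rng.choice(["Good", "Premium", "Ideal"], n),
    "color": rng.choice(["D", "E", "F"], n),
})
data["price"] = 4000 * data["carat"] + rng.normal(0, 100, n)

# One-hot encode the categorical columns; sklearn needs numeric input.
X = pd.get_dummies(data[["carat", "cut", "color"]], columns=["cut", "color"], dtype=float)
y = data["price"]

# 80/20 split, similar in spirit to sframe.random_split(.8, seed=1).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=1
)

reg_model = LinearRegression()
reg_model.fit(X_train, y_train)

# R^2 on the held-out rows, plus predictions for them.
r2 = reg_model.score(X_test, y_test)
predictions = reg_model.predict(X_test)
```

With the real file you would replace the synthetic frame with pd.read_csv("~/Downloads/diamonds.csv") and keep the rest unchanged.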

