
How does one call external datasets into scikit-learn?

For example, consider these datasets:

(1) https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data

Or

(2) http://data.worldbank.org/topic

How does one load such external datasets into scikit-learn to do anything with them?


The only way of loading a dataset that I have seen in scikit-learn is through a command like:

from sklearn.datasets import load_digits

digits = load_digits()

You need to learn a little pandas, which is a data frame implementation in Python. Then you can do

import pandas
my_data_frame = pandas.read_csv("/path/to/my/data")
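As a rough sketch of how that data frame could then reach a scikit-learn estimator: the example below assumes a comma-separated file with no header row, "?" marking missing values, and the class label in the last column. Those are assumptions for illustration, so check them against your actual file. The inline rows are made-up toy data standing in for a real file, and the mean imputation and decision tree are just one possible choice.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up toy rows standing in for a real file; read_csv also accepts
# a local path or a URL string in place of this in-memory buffer.
raw = io.StringIO(
    "1.0,?,0.5,A\n"
    "2.0,3.0,0.1,B\n"
    "3.0,1.0,0.9,A\n"
    "4.0,2.0,0.3,B\n"
    "5.0,5.0,0.7,A\n"
    "6.0,4.0,0.2,B\n"
)

# header=None: the file has no column names; na_values="?": treat "?" as missing.
df = pd.read_csv(raw, header=None, na_values="?")

# Assumption: every column but the last is a numeric feature, the last is the label.
features = df.iloc[:, :-1]
features = features.fillna(features.mean())  # simple mean imputation
X = features.to_numpy()
y = df.iloc[:, -1].to_numpy()

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
predictions = clf.predict(X)
```

scikit-learn estimators accept NumPy arrays (and numeric data frames) directly, so no special "dataset" object is needed.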

To create model matrices from your data frame, I recommend the patsy library, which implements a model specification language similar to R formulas. Because the formula has a response on its left-hand side, use dmatrices (dmatrix only accepts one-sided formulas):

import patsy
y, X = patsy.dmatrices("my_response ~ my_model_formula", my_data_frame)

Then X can be passed in as the feature matrix, and y as the target, to the various sklearn models.
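A minimal sketch of the patsy route, using patsy.dmatrices, which accepts a two-sided formula and returns the response and the design matrix. The column names (y, x1, group) and the toy values are hypothetical. Since patsy adds an intercept column by default, the sketch turns off scikit-learn's own intercept to avoid fitting it twice.

```python
import numpy as np
import pandas as pd
import patsy
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 1 + 2 * x1, plus a categorical column.
my_data_frame = pd.DataFrame({
    "y": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# dmatrices returns (response, design matrix); the categorical column
# is dummy-coded automatically and an Intercept column is added.
y_dm, X_dm = patsy.dmatrices("y ~ x1 + group", my_data_frame)

# Convert to plain NumPy arrays; y must be 1-D for most estimators.
X = np.asarray(X_dm)
y = np.ravel(np.asarray(y_dm))

# The design matrix already contains the intercept column.
model = LinearRegression(fit_intercept=False).fit(X, y)
fitted = model.predict(X)
```

The column names of the design matrix are available as X_dm.design_info.column_names, which helps when matching coefficients back to terms in the formula.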

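If scikit-learn ships a fetcher for your dataset (for example fetch_california_housing or fetch_openml), simply run the following command and replace 'EXTERNALDATASETNAME' with the name of your dataset. This does not work for arbitrary external files.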

import sklearn.datasets 
data = sklearn.datasets.fetch_EXTERNALDATASETNAME()
