So I'm just starting with Azure and I have this problem:
Here is my code:
from azureml.core import Workspace, Datastore
from azureml.data.datapath import DataPath
from azureml.data.dataset_factory import TabularDatasetFactory

def getWorkspace(name):
    ws = Workspace.get(
        name=name,
        subscription_id=sid,
        resource_group='my_ressource',
        location='my_location')
    return ws

def uploadDataset(ws, file, separator=','):
    datastore = Datastore.get_default(ws)
    path = DataPath(datastore=datastore, path_on_datastore=file)
    dataset = TabularDatasetFactory.from_delimited_files(path=path, separator=separator)
    #dataset = Dataset.Tabular.from_delimited_files(path=path, separator=separator)
    print(dataset.to_pandas_dataframe().head())
    print(type(dataset))

ws = getWorkspace(workspace_name)
uploadDataset(ws, my_csv, ";")
# Result:
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  ...  density    pH  sulphates  alcohol  quality
0            7.5              0.33         0.32            11.1      0.036  ...  0.99620  3.15       0.34     10.5        6
1            6.3              0.27         0.29            12.2      0.044  ...  0.99782  3.14       0.40      8.8        6
2            7.0              0.30         0.51            13.6      0.050  ...  0.99760  3.07       0.52      9.6        7
3            7.4              0.38         0.27             7.5      0.041  ...  0.99535  3.17       0.43     10.0        5
4            8.1              0.12         0.38             0.9      0.034  ...  0.99026  2.80       0.55     12.0        6

[5 rows x 12 columns]
<class 'azureml.data.tabular_dataset.TabularDataset'>
But when I go to Microsoft Azure Machine Learning Studio, the dataset doesn't appear under Datasets. What am I doing wrong?
First, check the format of the file: for .csv or .tsv files, use the from_delimited_files() method of the TabularDatasetFactory class; for .parquet files, use from_parquet_files(). Note that these factory methods only create a dataset object in your session; the dataset is not saved to the workspace (and so does not appear in Studio) until you register it. Alongside these there is also a register_pandas_dataframe() method, which uploads the data to your underlying storage and registers the resulting TabularDataset in the workspace in one step.
Also, if the storage account has a virtual network or firewall enabled, pass validate=False to from_delimited_files(); this skips the validation/verification step, which would otherwise fail because the validating compute cannot reach the storage.
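As a rough sketch (the helper name and dataset name are illustrative, and it assumes the azureml-core package is installed), uploading and registering a DataFrame in one step could look like this:

```python
def upload_and_register(ws, df, dataset_name):
    # Imports kept inside the function so the sketch stays self-contained;
    # requires the azureml-core package and an authenticated Workspace.
    from azureml.core import Datastore
    from azureml.data.dataset_factory import TabularDatasetFactory

    datastore = Datastore.get_default(ws)
    # register_pandas_dataframe() both uploads the data to the datastore
    # and registers the resulting TabularDataset, so it shows up in Studio.
    return TabularDatasetFactory.register_pandas_dataframe(
        dataframe=df, target=datastore, name=dataset_name)
```

This avoids the separate upload-then-register dance when the data already lives in a pandas DataFrame.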
Specify the datastore name along with the workspace, as below:
datastore_name = 'your datastore name'
workspace = Workspace.from_config()  # if you have an existing workspace
datastore = Datastore.get(workspace, datastore_name)
Below is how to create a TabularDataset from three file paths:
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]
Create_TBDS = Dataset.Tabular.from_delimited_files(path=datastore_paths)
If we want to specify the separator, we can do it as below:
Create_TBDS = Dataset.Tabular.from_delimited_files(path=datastore_paths, separator=',')
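Finally, whichever factory method you use, the dataset only becomes visible in Studio once you call register() on it. A minimal sketch (the helper and dataset name here are illustrative):

```python
def register_dataset(ws, dataset, name):
    # register() is what actually saves the TabularDataset to the workspace;
    # until then the dataset only exists in the local session, which is why
    # it never shows up under Datasets in Studio.
    return dataset.register(workspace=ws, name=name, create_new_version=True)
```

In the original code, if uploadDataset() returned its dataset, calling something like register_dataset(ws, dataset, 'wine_quality') afterwards would make it appear in Studio.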