
df = pd.read_csv('iris.csv') pointing to a file in Azure blob storage reports [Errno 2] No such file or directory

I'm working my way through the instructions posted at https://docs.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory

That page provides some Python code, and I'm trying to get it to work.

Here is my code so far.

'''

from azure.storage.blob import BlobServiceClient
import pandas as pd

containerName = "output"
storageAccountURL = "<URL to my storage account>"
storageKey = "<storage key>"

# Establish connection with the blob storage account
blob_service_client = BlobServiceClient(account_url=storageAccountURL,
                                        credential=storageKey)

# Load iris dataset from the task node
df = pd.read_csv('iris.csv')
# Subset records
df = df[df['Species'] == "setosa"]
# Save the subset of the iris dataframe locally on the task node
df.to_csv("iris_setosa.csv", index=False)
# Upload the subset back to blob storage
container_client = blob_service_client.get_container_client(containerName)
with open("iris_setosa.csv", "rb") as data:
    blob_client = container_client.upload_blob(name="iris_setosa.csv", data=data)
'''

When I run the above code (in Visual Studio 2019, on my laptop), the line df = pd.read_csv('iris.csv') causes the following error: [Errno 2] No such file or directory: 'iris.csv'

Now, I have checked that the input blob container does in fact contain that file. I can see it, and I can even download it, open it, etc.

I have tried various changes to the code: I changed containerName from output to input (because the instructions on the web page just before the code mention that input folder). No improvement.

I tried various changes to the file name:

df = pd.read_csv('/iris.csv')
df = pd.read_csv('input/iris.csv')
df = pd.read_csv(<the file's hyperlink>)

No luck.

I've read the various posts about fixing the import statements depending on different Azure Python SDK and library versions; the imports work OK, but I cannot solve the above pd.read_csv problem.

To add to the above information...

I'm using Python 3.7 and the following Python libraries:

azure-batch==10.0.0
azure-cognitiveservices-vision-customvision==3.1.0
azure-common==1.1.26
azure-core==1.10.0
pandas==1.2.1

and other packages.

Can you help by telling me how to get this Python code to work?

Many thanks.

Logically speaking, if you're certain that the file is there, and given that this script is meant to be run through Azure, the issue here is that your current working directory (cwd) does not match the "default" working directory that the tutorial expects. Considering how the tutorial is written, it expects you to be working from wherever the contents of the input container have been made available. Therefore, to resolve the issue, you need to correctly find the path to your file or change your cwd.

Within Python, you can check what your cwd is using:

import os

os.getcwd()

I advise you to do that; from there you should be able to find the path to your file. You could also change your working directory using os.chdir(path), replacing path with the path to the folder where the dataset is stored.
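That said, when you run this locally (as in Visual Studio 2019 on your laptop), iris.csv exists only in blob storage, so no cwd on your machine will contain it; you would first need to download the blob and then read it. Here is a minimal sketch of that approach, using the same placeholder account URL/key as in your code and assuming iris.csv sits in a container named input:

'''

from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobServiceClient

storageAccountURL = "<URL to my storage account>"  # placeholder, as in the question
storageKey = "<storage key>"                       # placeholder, as in the question

blob_service_client = BlobServiceClient(account_url=storageAccountURL,
                                        credential=storageKey)

# Download iris.csv from the 'input' container into memory,
# then hand the bytes straight to pandas.
container_client = blob_service_client.get_container_client("input")
blob_bytes = container_client.download_blob("iris.csv").readall()
df = pd.read_csv(BytesIO(blob_bytes))
'''

On an actual Batch task node, by contrast, resource files are downloaded into the task's working directory, which is why the tutorial's plain pd.read_csv('iris.csv') works there.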
