Python - how to read a specific worksheet from an Excel workbook on SharePoint

In Python, I am using the Office365-REST-Python-Client library to access and read an Excel workbook that contains many sheets.

Authentication succeeds, but I cannot work out how to append the sheet name to the file URL so that I can access the first or second worksheet by name. As a result, the response is not JSON but raw bytes, which my code cannot process.

My end goal is simply to access the worksheet named 'employee_list' and convert it to JSON or a pandas DataFrame for further use.

Code snippet below:

import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO


username = 'abc@a.com'
password = 'abcd'
site_url = 'https://sample.sharepoint.com/sites/SAMPLE/_layouts/15/Doc.aspx?OR=teams&action=edit&sourcedoc={739271873}'
# How can the worksheet be selected by name in the URL above?

ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions("{0}/_api/web/".format(site_url))
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content)  # JSON decode error raised here; the response content is bytes, not JSON

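You can access the worksheet by its index (or by its name) with xlrd. Check the following code:

import xlrd

# Path to a local copy of the workbook.
# Note: xlrd 2.0 and later reads only .xls files.
loc = "File location"

wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)  # first worksheet
# sheet = wb.sheet_by_name("employee_list")  # alternative: select by name

# Value at row 0, column 0
print(sheet.cell_value(0, 0))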

You can try adding the sheet name as a component of the URL, like this.

https://site/lib/workbook.xlsx#'Sheet1'!A1

It seems that the URL constructed to access the data is not correct. First confirm that the full URL works in your browser, then adapt the code. You may try the following with some changes; I have verified that a URL formed this way returns JSON data.

import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO


username = 'abc@a.com'
password = 'abcd'
# Double quotes are required here because the URL itself contains single quotes.
site_url = "https://sample.sharepoint.com/_vti_bin/ExcelRest.aspx/RootFolder/ExcelFileName.xlsx/Model/Ranges('employee_list!A1%7CA10')?$format=json"
# Replace RootFolder/ExcelFileName.xlsx with the actual path of the Excel file from the root.
# Replace A1 and A10 with the actual first and last cells of the range.

ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions(site_url)
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content) 
    

Source: https://docs.microsoft.com/en-us/sharepoint/dev/general-development/sample-uri-for-excel-services-rest-api
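
The range URL in the snippet above follows a fixed pattern, so it can be assembled from its parts rather than typed by hand. Below is a minimal sketch of that idea using only the standard library. The function name build_excel_rest_range_url and its parameters are hypothetical (they are not part of any library), and the sketch assumes the site exposes the Excel Services REST endpoint at _vti_bin/ExcelRest.aspx as described in the answer. Sheet names that contain an apostrophe are not handled.

```python
from urllib.parse import quote


def build_excel_rest_range_url(site_root, workbook_path, sheet_name,
                               first_cell, last_cell):
    """Assemble an Excel Services REST URL for one cell range (hypothetical helper)."""
    # Percent-encode the workbook path but keep the folder separators.
    encoded_path = quote(workbook_path.strip("/"))
    # The range is written as Sheet!First|Last; "|" must be sent as %7C.
    encoded_range = quote(f"{sheet_name}!{first_cell}|{last_cell}", safe="!")
    base = site_root.rstrip("/")
    return (f"{base}/_vti_bin/ExcelRest.aspx/{encoded_path}"
            f"/Model/Ranges('{encoded_range}')?$format=json")


url = build_excel_rest_range_url(
    "https://sample.sharepoint.com",
    "RootFolder/ExcelFileName.xlsx",
    "employee_list", "A1", "A10",
)
print(url)
```

With the placeholder values from the answer, this prints the same URL that is hard-coded in site_url above. It is still worth opening the result in a browser first, as the answer suggests.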

The version I'm using (Office365-REST-Python-Client==2.3.11) allows simpler access to an Excel file in a SharePoint repository.

# Reuses pd, username, password, UserCredential, File and BytesIO
# from the imports and variables in the original question.

user_credentials = UserCredential(user_name=username,
                                  password=password)

# Absolute URL of the Excel file on SharePoint
file_url = ('https://sample.sharepoint.com'
            '/sites/SAMPLE/{*recursive_folders}'
            '/sample_worksheet.xlsx')

# In-memory binary buffer that will receive the file
excel_file = BytesIO()

# Reference the file on SharePoint by its absolute URL
excel_file_online = File.from_url(abs_url=file_url)

# Attach credentials that have access to the file
excel_file_online = excel_file_online.with_credentials(
    credentials=user_credentials)

# Download the file's binary content into the buffer
excel_file_online.download(file_object=excel_file).execute_query()

We now have a binary copy of the Excel file in a BytesIO object named excel_file. From here, reading it into a pd.DataFrame is as straightforward as reading an Excel file stored on a local drive, e.g.:

pd.read_excel(excel_file) # -> pd.DataFrame

Hence, if you are interested in a specific sheet such as 'employee_list', you can read it as

employee_list = pd.read_excel(excel_file,
                              sheet_name='employee_list')
    # -> pd.DataFrame

or

data = pd.read_excel(excel_file,
                     sheet_name=None) # -> dict mapping sheet name to pd.DataFrame
employee_list = data.get('employee_list')
    # -> pd.DataFrame, or None if the sheet does not exist
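
If pd.read_excel complains that the worksheet is missing or the file cannot be read, it can help to inspect the downloaded bytes before handing them to pandas. Below is a minimal sketch using only the standard library. It assumes the file is an .xlsx workbook (a ZIP archive whose xl/workbook.xml lists the sheets) saved in the default, non-strict Open XML format. The name list_sheet_names is hypothetical and is not part of pandas or the Office365 client.

```python
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

# Namespace used by xl/workbook.xml in default (transitional) .xlsx files.
_MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(xlsx_stream):
    """Return the worksheet names stored in an in-memory .xlsx file."""
    xlsx_stream.seek(0)
    if not zipfile.is_zipfile(xlsx_stream):
        # A login page or an error page was probably downloaded instead.
        raise ValueError("the downloaded content is not an .xlsx workbook")
    xlsx_stream.seek(0)
    with zipfile.ZipFile(xlsx_stream) as archive:
        workbook_xml = archive.read("xl/workbook.xml")
    root = ET.fromstring(workbook_xml)
    names = [sheet.get("name") for sheet in root.iter(f"{{{_MAIN_NS}}}sheet")]
    xlsx_stream.seek(0)  # rewind so the stream can be passed to pd.read_excel
    return names
```

For example, calling list_sheet_names(excel_file) after the download and checking that 'employee_list' is in the result tells you whether a failure comes from the download step or from the sheet name.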

I know you stated you can't use a BytesIO object, but for those who arrive here reading the file in as a BytesIO object, as I was, you can use the sheet_name argument of pd.read_excel:

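    # ctx is an authenticated ClientContext for the site (not shown in this answer).
    url = "https://sharepoint.site.com/sites/MySite/MySheet.xlsx"
    relative_url = "/sites/MySite/MySheet.xlsx"  # server-relative part of url
    sheet_name = 'Sheet X'
    response = File.open_binary(ctx, relative_url)
    bytes_file_obj = io.BytesIO()
    bytes_file_obj.write(response.content)
    bytes_file_obj.seek(0)  # rewind so pandas reads from the start
    df = pd.read_excel(bytes_file_obj, sheet_name=sheet_name)  # select the sheet by name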
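
File.open_binary takes a server-relative URL, while links copied from a browser are absolute. Below is a minimal sketch of deriving one from the other with the standard library. The helper name to_server_relative_url is hypothetical, and decoding percent-escapes such as %20 before passing the path on is an assumption that may need adjusting for your site.

```python
from urllib.parse import unquote, urlsplit


def to_server_relative_url(absolute_url):
    """Return the path part of an absolute SharePoint file URL (hypothetical helper)."""
    # urlsplit drops the scheme and host; any query string is ignored.
    path = unquote(urlsplit(absolute_url).path)
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute URL with a path, got {absolute_url!r}")
    return path


relative_url = to_server_relative_url(
    "https://sharepoint.site.com/sites/MySite/MySheet.xlsx"
)
print(relative_url)  # /sites/MySite/MySheet.xlsx
```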


 