
Read SharePoint Excel file with Python pandas

"I'm trying to use this code from How to read SharePoint Online (Office365) Excel files into Python specifically pandas with Work or School Account? answers but a get the XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n<!DOCT'. I think the problem is in the way im placing my path. Do anybody knows how to get this type of Sharepoint path, like in the example below?" “我正在尝试使用如何将 SharePoint Online (Office365) Excel 文件中的代码读入 Python,特别是带有工作或学校帐户的熊猫?答案但得到 XLRDError:不支持的格式或损坏的文件:预期的 BOF 记录;找到 b '\r\n<!DOCT'。我认为问题在于我放置路径的方式。有人知道如何获得这种类型的 Sharepoint 路径,如下例所示? The ones I get look more like this "https://company.sharepoint.com/sites/site/Shared%20Documents/Forms/AllItems.aspx"我得到的那些看起来更像这样“https://company.sharepoint.com/sites/site/Shared%20Documents/Forms/AllItems.aspx”

#import all the libraries
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File 
import io
import pandas as pd

#target url taken from sharepoint and credentials
url = 'https://company.sharepoint.com/Shared%20Documents/Folder%20Number1/Folder%20Number2/Folder3/Folder%20Number4/Target_Excel_File_v4.xlsx?cid=_Random_letters_and_numbers-21dbf74c'
username = 'Dumby_account@company.com'
password = 'Password!'

ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
  ctx = ClientContext(url, ctx_auth)
  web = ctx.web
  ctx.load(web)
  ctx.execute_query()
  print("Authentication successful")

response = File.open_binary(ctx, url)

#save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start

#read excel file and each sheet into pandas dataframe 
df = pd.read_excel(bytes_file_obj, sheetname = None)

I did it by opening the file in the desktop app and going to File > Info > Copy Path. This path should work.
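If the copied path turns out to be a full URL rather than a server-relative path, it can be split with the standard library. The following is a minimal sketch, not part of the original answer: the helper name `to_server_relative_path` and the sample URL are hypothetical, it assumes the copied value is a direct link to the file (not a sharing link), and whether the SharePoint client wants the path percent-decoded is also an assumption.

```python
from urllib.parse import urlparse, unquote

def to_server_relative_path(file_url):
    """Return the decoded path part of a direct file URL (hypothetical helper)."""
    parsed = urlparse(file_url)
    # Drop the scheme, host, and query string (e.g. "?web=1" or "?cid=..."),
    # then decode percent-escapes such as %20.
    return unquote(parsed.path)

# Hypothetical example of a copied link
copied = "https://company.sharepoint.com/sites/site/Shared%20Documents/Folder%20Number1/Target_Excel_File_v4.xlsx?web=1"
print(to_server_relative_path(copied))
# /sites/site/Shared Documents/Folder Number1/Target_Excel_File_v4.xlsx
```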

It looks like you are using the share link instead of the file path. You need to copy the correct path. Here's how:

  1. Open the SharePoint folder.
  2. Click the three dots next to the file, then click Details.
  3. Scroll down and copy the Path. It should look something like: '/user/folder/Documents/Target_Excel_File_v4.xlsx'

Use the SharePoint site URL to authenticate, then use the copied path to open the file as binary.

#import all the libraries
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File 
import io
import pandas as pd

#target url taken from sharepoint and credentials
url = 'https://company.sharepoint.com/user/folder'
path = '/user/folder/Documents/Target_Excel_File_v4.xlsx'
username = 'Dumby_account@company.com'
password = 'Password!'

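ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
  ctx = ClientContext(url, ctx_auth)
  web = ctx.web
  ctx.load(web)
  ctx.execute_query()
  print("Authentication successful")
else:
  #stop here: without a token, ctx is never defined and the download below would fail
  raise RuntimeError("Authentication failed")

#download the file at the server-relative path
response = File.open_binary(ctx, path)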

#save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start

#read excel file and each sheet into pandas dataframe 
df = pd.read_excel(bytes_file_obj, sheet_name = None)
print(df)
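
The error in the question quotes b'\r\n<!DOCT', which suggests the download was an HTML page rather than a workbook. A small sketch of checking the first bytes before calling pd.read_excel follows; the function name `sniff_download` is hypothetical and not part of the answer above. It checks the ZIP header used by .xlsx files and the OLE2 header used by legacy .xls files.

```python
def sniff_download(data):
    """Classify downloaded bytes by their leading signature (hypothetical helper)."""
    if data.startswith(b"PK\x03\x04"):
        return "xlsx"   # .xlsx files are ZIP archives
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"    # legacy .xls files use the OLE2 container
    # Ignore leading whitespace and case when looking for HTML markers
    head = data.lstrip()[:16].lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"   # a web page came back instead of the file
    return "unknown"

print(sniff_download(b"\r\n<!DOCTYPE html><html>"))  # html
print(sniff_download(b"PK\x03\x04rest-of-zip"))      # xlsx
```

With the code above, this could be called as sniff_download(response.content) before the bytes are written to the BytesIO stream; a result of "html" would point to a wrong path or a sharing link rather than a corrupt file.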


 