ValueError when reading Excel from SharePoint into Python

I am trying to read an Excel file from SharePoint into Python.

Q1: There are two URLs for the file. If I copy the file's link directly, I get:

https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers

If I instead click through the folders on the webpage one after another until I open the Excel file, the URL becomes:

https://company.sharepoint.com/:x:/r/sites/project/_layouts/15/Doc.aspx?sourcedoc=letters-numbers&file=Table.xlsx&action=default&mobileredirect=true

Which one should I use?

Q2: My code is below:

import io

import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

URL = "https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers"
USERNAME = "abc@a.com"
PASSWORD = "abcd"

ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, URL)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2")

It works until pd.read_excel(), where I get a ValueError:

ValueError: Excel file format cannot be determined, you must specify an engine manually.

I don't know where it went wrong or whether there will be further problems with loading. I would highly appreciate it if someone could warn me of the pitfalls or leave an example.

If you take a look at the pandas documentation for read_excel ( https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html ), you'll see that there is an engine parameter.

Try the different options and see which one works, since your error says that an engine has to be specified manually.

If this turns out to be correct, then in the future take error messages literally and check the documentation.

I have tried different URLs (and different ways of obtaining them), and received different binary content each time. The content was either a one-line status code (like 403), a warning, or something that looks like a header. So I believe the problem is the URL format.
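One way to confirm that the downloaded bytes are not a workbook at all is to look at their first few bytes before handing them to pandas. The following is only a minimal sketch: the helper names guess_excel_engine and describe_payload are hypothetical, and the mapping from file signature to engine is an assumption (a ZIP signature is also used by .xlsb and .ods files, not only .xlsx).

```python
# File signatures ("magic bytes") at the start of common workbook formats.
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container: .xlsx (also .xlsb and .ods)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file: legacy .xls


def guess_excel_engine(data: bytes):
    """Guess a pandas engine name from the leading bytes, or return None."""
    if data.startswith(ZIP_MAGIC):
        return "openpyxl"  # assumption: the ZIP container is an .xlsx file
    if data.startswith(OLE2_MAGIC):
        return "xlrd"
    return None


def describe_payload(data: bytes) -> str:
    """Summarise what the downloaded bytes appear to be."""
    engine = guess_excel_engine(data)
    if engine is not None:
        return f"looks like a workbook; try engine={engine!r}"
    # Not a workbook: show the start of the payload, which is often an
    # HTTP status line or an HTML error page when the URL is wrong.
    preview = data[:80].decode("utf-8", errors="replace")
    return f"not a workbook; starts with: {preview!r}"
```

Calling describe_payload(response.content) right after File.open_binary() would show whether the server returned a spreadsheet or an error message, which is what the ValueError from pd.read_excel() is hinting at.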

I found the answer here (github.com/vgrem).

It basically says that for ClientContext you need an absolute URL:

URL = "https://company.sharepoint.com/:x:/r/sites/project"

And for File you need a server-relative path, which overlaps with the URL (both contain /sites/project):

RELATIVE_PATH = "/sites/project/Shared%20Documents/Folder/Table.xlsx"

The RELATIVE_PATH can be found like this:

  1. Go to the folder containing the file in Teams (or on the webpage).

  2. Choose the file, then Open in app (Excel).

  3. In Excel, go to File -> Property, copy the path, and adapt it to the format above.

  4. Replace each space with "%20".
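Steps 3 and 4 above can be sketched with the standard library. This is only an illustration: the helper name to_server_relative_path is hypothetical, and it assumes the path copied from Excel is a full https URL to the file.

```python
from urllib.parse import quote, unquote, urlsplit


def to_server_relative_path(file_location: str) -> str:
    """Drop the scheme and host, then percent-encode the remaining path."""
    path = urlsplit(file_location).path
    # Decode first so an already-encoded "%20" is not encoded a second time.
    return quote(unquote(path), safe="/")


RELATIVE_PATH = to_server_relative_path(
    "https://company.sharepoint.com/sites/project/Shared Documents/Folder/Table.xlsx"
)
print(RELATIVE_PATH)
```

The result starts with /sites/project, matching the format shown above.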

ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, RELATIVE_PATH)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet2')

Note on sheet_name: if sheet_name=None (or a list of sheets) is passed and the original .xlsx has multiple sheets, pd.read_excel() returns a dict of DataFrames keyed by sheet name, so df here would actually be a dict. With the default sheet_name=0, only the first sheet is read.
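Since the return value of pd.read_excel() can be either a single DataFrame or a dict of DataFrames, downstream code may want to treat both cases the same way. A minimal sketch, assuming a hypothetical helper named sheets_as_dict and a fallback sheet label chosen by the caller:

```python
def sheets_as_dict(result, single_name="Sheet2"):
    """Return a {sheet_name: frame} mapping for either kind of result."""
    if isinstance(result, dict):
        # Several sheets were read: copy the mapping as-is.
        return dict(result)
    # A single sheet was read: wrap it under the label supplied by the caller.
    return {single_name: result}
```

With this, code that loops over sheets_as_dict(df).items() behaves the same whether one sheet or several were loaded.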
