How to extract OLE objects from Excel table using Python?

I would like to use Python to extract OLE objects from an Excel table into the Windows clipboard.

This post didn't help further since it is for VBA. And this post is still unanswered.

Assuming the given Excel table (with ChemDraw or ChemSketch OLE objects):

[image: screenshot of the Excel table]

There are some Python modules which can handle Excel files, e.g. openpyxl and xlrd. The module win32clipboard can put data into the clipboard.

My problems:

  1. I don't see how to get the embedded OLE object to the clipboard. Probably openpyxl and xlrd together with win32clipboard are not suited for this?
  2. There is a Python module, oletools, which may be able to do it, but I don't understand how it works: https://pypi.org/project/oletools/

From this page:

oleobj: to extract embedded objects from OLE files.

This seems to be exactly what I am looking for; however, I couldn't find any MCVEs. And unfortunately, the documentation of oleobj basically boils down to: "read the source code and find out yourself". I would be grateful for hints and assistance.
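As a side note on getting at the embedded data at all: an .xlsx file is a ZIP archive, and embedded objects are normally stored under xl/embeddings/ (typically with names like oleObject1.bin). The following is only a minimal sketch using the standard library; the function name extract_embeddings and the output folder are made up for illustration, and it writes the raw embedded files to disk rather than placing anything on the clipboard.

```python
import zipfile
from pathlib import Path


def extract_embeddings(xlsx_path, dest_dir):
    """Copy every file under xl/embeddings/ of an .xlsx into dest_dir.

    Returns the list of written paths (empty if nothing is embedded).
    The function name and folder layout of dest_dir are illustrative only.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            # skip directory entries and everything outside xl/embeddings/
            if info.is_dir() or not info.filename.startswith("xl/embeddings/"):
                continue
            target = dest / Path(info.filename).name
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

A call such as extract_embeddings('tbChemOLE.xlsx', 'embedded') would then leave the embedded files in the folder embedded. The files do not carry the cell position of each object, so this does not tell you which molecule belongs to which row.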

My code so far:

### trying to extract OLE objects from Excel table into clipboard
from openpyxl import load_workbook
import win32clipboard as clpbd

def set_clipboard(data):
    clpbd.OpenClipboard()
    clpbd.EmptyClipboard()
    clpbd.SetClipboardText(data)    # I'm aware, this is only for text, is there anything for OLEs?
    clpbd.CloseClipboard()

def print_clipboard():
    clpbd.OpenClipboard()
    data = clpbd.GetClipboardData()
    clpbd.CloseClipboard()
    print(data)

wb = load_workbook(filename = 'tbChemOLE.xlsx')
ws = wb.active

myName = ws['A3'].value    # result: Naphthalene
myImage = ws['B3'].value   # result: None
myObject = ws['C3'].value  # result: None

set_clipboard(myName)
print_clipboard()          # result: Naphthalene
# set_clipboard(myImage)   # crash, because myImage is None
print_clipboard()
# set_clipboard(myObject)  # crash, because myObject is None
print_clipboard()

wb.close()
### end of code

I built a Python module to do exactly this; check it out here: https://pypi.org/project/AttachmentsExtractor/. The module can also be run on any OS.

After installing the library, use the following code snippet:

from AttachmentsExtractor import extractor

abs_path_to_file = 'Please provide absolute path here'
path_to_destination_directory = 'Please provide path of the directory where the extracted attachments should be stored'
# returns True if one or more attachments are found, otherwise False
extractor.extract(abs_path_to_file, path_to_destination_directory)

In the meantime I found this post, where the OP actually didn't want the OLE objects on the clipboard, but for me that is fine. There is actually no need for openpyxl or xlrd, but win32com.client is required.

I can get all OLE objects; however, they are indexed (probably) in the order in which they were added. So I need to create a dictionary with the row index as key and a tuple of the OLE object index and the name as value.

Code:

### copy OLE object in certain cell to clipboard
import win32com.client as win32
import win32clipboard

excel = win32.gencache.EnsureDispatch('Excel.Application')
ffname = r'C:\Test\tbChemOLE.xlsx'
wb = excel.Workbooks.Open(ffname)
ws = wb.Worksheets.Item(1)
objs = ws.OLEObjects()

def get_all_OLEs():
    oleNo_dict = {}     # maps row -> (OLE object index, name)
    for i in range(1, objs.Count + 1):     # Excel collections are 1-based
        obj = objs.Item(i)
        myRow = obj.TopLeftCell.Row        # row of OLE object
        myName = ws.Cells(myRow, 1).Value  # corresponding name in column A
        oleNo_dict[myRow] = (i, myName)
    return oleNo_dict

def get_OLE(row):
    # copy the OLE object of the given row to the clipboard
    # returns (index, name), or None if nothing could be copied
    entry = oleNo_dict.get(row)
    if entry is None:
        print("No OLE object in row", row)
        return None
    try:
        objs.Item(entry[0]).Copy()     # Item() is 1-based, like the stored index
        win32clipboard.OpenClipboard()
        try:
            win32clipboard.GetClipboardData(0xC004)  # binary access, fails if this format is missing
        finally:
            win32clipboard.CloseClipboard()
    except Exception as e:
        print(e)
        win32clipboard.OpenClipboard()
        win32clipboard.EmptyClipboard()
        win32clipboard.CloseClipboard()
        return None
    return entry

oleNo_dict = get_all_OLEs()

row = 4
myMolecule = get_OLE(row)
if myMolecule:
    print(myMolecule[1], "OLE object is now on the clipboard.")

wb.Close()
excel.Application.Quit()
### end of code

Result:

Anthracene OLE object is now on the clipboard.
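One thing to watch: 0xC004 is a registered clipboard format, and Windows assigns such IDs (in the range 0xC000 to 0xFFFF) at run time, so the number may be different on another machine or session. A rough sketch for listing what is actually on the clipboard after the Copy() call follows. The helper name list_clipboard_formats and its optional clipboard parameter are my own additions for illustration; by default it uses win32clipboard from pywin32.

```python
def list_clipboard_formats(clipboard=None):
    """Return [(format_id, name_or_None), ...] for the current clipboard content.

    clipboard is any object offering the win32clipboard functions used below;
    if omitted, pywin32's win32clipboard is imported (Windows only).
    """
    if clipboard is None:
        import win32clipboard as clipboard
    formats = []
    clipboard.OpenClipboard()
    try:
        fmt = clipboard.EnumClipboardFormats(0)   # 0 starts the enumeration
        while fmt:                                # 0 again means no more formats
            try:
                name = clipboard.GetClipboardFormatName(fmt)
            except Exception:
                name = None   # predefined formats such as CF_TEXT have no registered name
            formats.append((fmt, name))
            fmt = clipboard.EnumClipboardFormats(fmt)
    finally:
        clipboard.CloseClipboard()
    return formats
```

Calling list_clipboard_formats() right after get_OLE(row) and printing the result shows each format ID next to its registered name, which makes it possible to pick the wanted format by name instead of hard-coding a number.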
