
Opening Several PDF Files through Excel VBA and Saving as Text files

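I am trying to open several PDF files in a folder and save them as .txt files.

I have tried the approach from the following thread:

VBA to convert multiple PDFs in a folder to text files

As suggested in the answer to that thread, I tried the following code, but it fails with the error "User-defined type not defined".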

Sub ONLYConvertPDF()

    Dim AcroXApp As Acrobat.AcroApp
    Dim AcroXAVDoc As Acrobat.AcroAVDoc
    Dim AcroXPDDoc As Acrobat.AcroPDDoc

    Dim Filename As String, DFilename As String, jsObj As Object

    Filename = "C:\MyPath\MyFile.pdf"
    DFilename = "C:\MyPath\MyFile.txt"
    Set AcroXApp = CreateObject("AcroExch.App")
    AcroXApp.Show
    Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
    AcroXAVDoc.Open Filename, "Acrobat"
    Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
    Set jsObj = AcroXPDDoc.GetJSObject
    jsObj.SaveAs DFilename, "com.adobe.acrobat.plain-text"

    AcroXAVDoc.Close False
    AcroXApp.Hide
    AcroXApp.Exit

End Sub

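I tried other similar threads involving Acrobat.AcroApp, Acrobat.AcroAVDoc, and Acrobat.AcroPDDoc, but the same error occurred at the same place.

I also tried the "Follow Hyperlink" method for opening PDF documents, suggested in a thread on this forum, but it does not seem to work if the file has to be closed after it has been processed. (I don't know how to close the file.)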

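I have added the following libraries:

[screenshot]

When I try to add a) the PDFPrevHndlr 1.0 Type Library and b) the PDFShellServer 1.0 Type Library (I don't know whether they are required), I get the error "Error in loading DLL".

Do I need to add anything else? I have Adobe Acrobat Reader DC installed.

[screenshot]

I don't know much about handling libraries, DLLs, and so on. Can someone help? Thank you very much in advance.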

Xpdf provides a set of command-line tools. One of them, pdftotext.exe, exports the text of a PDF to a file.

Private Sub PdfToText(ByVal PdfPath As String, ByVal TextPath As String)
Const PathToPdfToText As String = "" '"path\to\exe" 'add path to exe if not in windows path
With CreateObject("wscript.shell")
    .Run Chr(34) & PathToPdfToText & "pdftotext.exe" & Chr(34) & " " & Chr(34) & PdfPath & Chr(34) & " " & Chr(34) & TextPath & Chr(34), 1, 1
End With
End Sub

Use it like this:

PdfToText "path\to\pdfdoc.pdf", "path\to\textfile.txt"
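
Since the question is about every PDF in a folder, the call above can be wrapped in a loop. The following is only a sketch: it assumes pdftotext.exe can be found as configured in PdfToText, and that this routine sits in the same module (PdfToText is declared Private). The routine name ConvertFolderToText and the folder path are hypothetical and only serve as an illustration.

Sub ConvertFolderToText()
    'Hypothetical folder (assumption); change it to the folder that holds your PDFs.
    Const FolderPath As String = "C:\MyPath\"
    Dim PdfName As String
    Dim BaseName As String

    'Dir returns the first matching file name, or "" if there is none.
    PdfName = Dir(FolderPath & "*.pdf")
    Do While PdfName <> ""
        'Strip the extension and write the text file next to the PDF.
        BaseName = Left$(PdfName, InStrRev(PdfName, ".") - 1)
        PdfToText FolderPath & PdfName, FolderPath & BaseName & ".txt"
        'Calling Dir without arguments returns the next match.
        PdfName = Dir
    Loop
End Sub

Because PdfToText passes a true value as the last argument of Run, each conversion finishes before the next file is started.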

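Old suggestion based on the Reader control (may be useful to someone); it lacks a SaveAs method

Getting the Adobe Reader ActiveX control in Office x86

Run the following in an elevated PowerShell: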

& "$env:SystemRoot\SysWOW64\regsvr32"  "C:\Program Files (x86)\Common Files\Adobe\Acrobat\ActiveX\AcroPDFImpl.dll"

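This registers the control.

Once it is registered, you can use the "Adobe PDF Reader Imp" ActiveX control on x86. Credit goes to 努巴.

However, I am not sure whether Reader offers saving as text.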

You need to have Adobe Acrobat Professional installed! Try this code.

Option Explicit
Option Private Module

Sub SavePDFAsOtherFormat(PDFPath As String, FileExtension As String)

    'Saves a PDF file as another format using Adobe Professional.

    'By Christos Samaras
    'https://myengineeringworld.net

    'In order to use the macro you must enable the Acrobat library from the VBA editor:
    'Go to Tools -> References -> Adobe Acrobat xx.0 Type Library, where xx depends
    'on the Acrobat Professional version (e.g. 9.0 or 10.0) installed on your PC.

    'Alternatively, you can find it via Tools -> References -> Browse and check for the path
    'C:\Program Files\Adobe\Acrobat xx.0\Acrobat\acrobat.tlb
    'where xx is your Acrobat version (e.g. 9.0 or 10.0 etc.).

    Dim objAcroApp      As Acrobat.AcroApp
    Dim objAcroAVDoc    As Acrobat.AcroAVDoc
    Dim objAcroPDDoc    As Acrobat.AcroPDDoc
    Dim objJSO          As Object
    Dim boResult        As Boolean
    Dim ExportFormat    As String
    Dim NewFilePath     As String

    'Check if the file exists.
    If Dir(PDFPath) = "" Then
        MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _
                vbCritical, "File Path Error"
        Exit Sub
    End If

    'Check if the input file is a PDF file.
    If LCase(Right(PDFPath, 3)) <> "pdf" Then
        MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error"
        Exit Sub
    End If

    'Initialize Acrobat by creating App object.
    Set objAcroApp = CreateObject("AcroExch.App")

    'Set AVDoc object.
    Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")

    'Open the PDF file.
    boResult = objAcroAVDoc.Open(PDFPath, "")

    'Set the PDDoc object.
    Set objAcroPDDoc = objAcroAVDoc.GetPDDoc

    'Set the JS Object - JavaScript Object.
    Set objJSO = objAcroPDDoc.GetJSObject

    'Check the type of conversion.
    Select Case LCase(FileExtension)
        Case "eps": ExportFormat = "com.adobe.acrobat.eps"
        Case "html", "htm": ExportFormat = "com.adobe.acrobat.html"
        Case "jpeg", "jpg", "jpe": ExportFormat = "com.adobe.acrobat.jpeg"
        Case "jpf", "jpx", "jp2", "j2k", "j2c", "jpc": ExportFormat = "com.adobe.acrobat.jp2k"
        Case "docx": ExportFormat = "com.adobe.acrobat.docx"
        Case "doc": ExportFormat = "com.adobe.acrobat.doc"
        Case "png": ExportFormat = "com.adobe.acrobat.png"
        Case "ps": ExportFormat = "com.adobe.acrobat.ps"
        Case "rtf": ExportFormat = "com.adobe.acrobat.rtf"
        Case "xlsx": ExportFormat = "com.adobe.acrobat.xlsx"
        Case "xls": ExportFormat = "com.adobe.acrobat.spreadsheet"
        Case "txt": ExportFormat = "com.adobe.acrobat.accesstext"
        Case "tiff", "tif": ExportFormat = "com.adobe.acrobat.tiff"
        Case "xml": ExportFormat = "com.adobe.acrobat.xml-1-00"
        Case Else: ExportFormat = "Wrong Input"
    End Select

    'Check if the format is correct and there are no errors.
    If ExportFormat <> "Wrong Input" And Err.Number = 0 Then

        'Format is correct and no errors.

        'Set the path of the new file. Note that Adobe uses xml files instead of xls.
        'That's why the xls extension changes to xml here.
        'The last three characters ("pdf", in any letter case) are replaced by the new extension.
        If LCase(FileExtension) <> "xls" Then
            NewFilePath = Left(PDFPath, Len(PDFPath) - 3) & LCase(FileExtension)
        Else
            NewFilePath = Left(PDFPath, Len(PDFPath) - 3) & "xml"
        End If

        'Save PDF file to the new format.
        boResult = objJSO.SaveAs(NewFilePath, ExportFormat)

        'Close the PDF file without saving the changes.
        boResult = objAcroAVDoc.Close(True)

        'Close the Acrobat application.
        boResult = objAcroApp.Exit

        'Inform the user that the conversion was successful.
        MsgBox "The PDF file:" & vbNewLine & PDFPath & vbNewLine & vbNewLine & _
        "Was saved as: " & vbNewLine & NewFilePath, vbInformation, "Conversion finished successfully"

    Else

        'Something went wrong, so close the PDF file and the application.

        'Close the PDF file without saving the changes.
        boResult = objAcroAVDoc.Close(True)

        'Close the Acrobat application.
        boResult = objAcroApp.Exit

        'Inform the user that something went wrong.
        MsgBox "Something went wrong!" & vbNewLine & "The conversion of the following PDF file FAILED:" & _
        vbNewLine & PDFPath, vbInformation, "Conversion failed"

    End If

    'Release the objects.
    Set objAcroPDDoc = Nothing
    Set objAcroAVDoc = Nothing
    Set objAcroApp = Nothing

End Sub

See the link below for more details.

https://myengineeringworld.net/2013/03/vba-macro-to-convert-pdf-files-into.html
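
The "User-defined type not defined" message from the question is a compile error: VBA cannot resolve a declared type such as Acrobat.AcroApp when the Acrobat type library is not referenced. As a sketch of one way to avoid the reference, the objects can be declared As Object and created at run time (late binding). This still assumes a full Acrobat installation that registers the AcroExch.App and AcroExch.AVDoc classes. The routine name PdfToTextLateBound and the paths are hypothetical.

Sub PdfToTextLateBound()
    'Late binding: no Acrobat reference is needed at compile time.
    Dim objApp As Object, objAVDoc As Object, objJS As Object
    'Hypothetical paths (assumption), as in the question.
    Const PdfPath As String = "C:\MyPath\MyFile.pdf"
    Const TxtPath As String = "C:\MyPath\MyFile.txt"

    Set objApp = CreateObject("AcroExch.App")
    Set objAVDoc = CreateObject("AcroExch.AVDoc")

    'Open returns False if the file could not be opened.
    If objAVDoc.Open(PdfPath, "") Then
        Set objJS = objAVDoc.GetPDDoc.GetJSObject
        'Conversion ID taken from the macro above.
        objJS.SaveAs TxtPath, "com.adobe.acrobat.accesstext"
        objAVDoc.Close True    'True = close without saving changes
    End If

    objApp.Exit
End Sub

If only Reader is installed, CreateObject is expected to fail at run time (error 429, "ActiveX component can't create object"), so this variant changes when the problem surfaces rather than removing the need for Acrobat.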

Alternatively, if you do not have Acrobat installed, try the following solution. Of course, adapt the code to your needs.

Sub ChangeDocsToTxtOrRTFOrHTML()
    'with export to PDF in Word 2007
    Dim fs As Object
    Dim oFolder As Object
    Dim tFolder As Object
    Dim oFile As Object
    Dim strDocName As String
    Dim intPos As Integer
    Dim locFolder As String
    Dim fileType As String
    On Error Resume Next

    locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\Users\your_path_here\")
    Select Case Application.Version
        Case Is < 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
        Case Is >= 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
    End Select

    Application.ScreenUpdating = False
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.CreateFolder(locFolder & "Converted")
    Set tFolder = fs.GetFolder(locFolder & "Converted")

    For Each oFile In oFolder.Files
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        ChangeFileOpenDirectory tFolder
        Select Case fileType
            Case Is = "TXT"
                strDocName = strDocName & ".txt"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
            Case Is = "RTF"
                strDocName = strDocName & ".rtf"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
            Case Is = "HTML"
                strDocName = strDocName & ".html"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
            Case Is = "PDF"
                strDocName = strDocName & ".pdf"
                ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF
        End Select
        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    Application.ScreenUpdating = True

End Sub
