Parse a Word document into an Excel file

I have a Word document containing data that I would like to parse into an Excel file. The source files are hundreds of pages long. I have been working with VBA, but I just started learning the language and have run into a lot of difficulty trying to read a .doc file. I have been able to use the Open and Line Input statements to read from a .txt file, but I get only gibberish when I try the same thing with the .doc file.

I have included links to two screenshots.

The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/

The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/

I have worked out an algorithm for what I want to accomplish; I am just having difficulty coding it. Below is the pseudocode I have developed.

    Variables:
        string  line = blank
                series_title = blank
                folder_title = blank

        int     series_number = 0
                box_number = 0
                folder_number = 0
                year = 0

    do while <end_of_document> has not been reached
        input line
        if the first word in the line is "Series"
            store <series_number>
            store the string after ":" into <series_title>
        end if
        call parse_box(rest of line)
        output <series_number> <series_title> <box_number> <folder_number> <folder_title> <year>
    end do while

    function parse_box(current line)
        if the first word in the line is "Box"
            store <box_number>
        end if
        call parse_folder(rest of line)
    end function

    function parse_folder(current line)
        if the first word in the line is "Folder"
            store <folder_number>
        end if
        call parse_folder_title_and_year(rest of line)
    end function

    function parse_folder_title_and_year(current line)
        string temp_folder_title
        store everything as <temp_folder_title> until end of line
        if the last word in <temp_folder_title> is a year
            store <year>
        end if
        if <temp_folder_title> is empty/blank
            // use <folder_title> from before
        else
            <folder_title> is <temp_folder_title> minus <year>
        end if
    end function

Thanks in advance for all your help and suggestions.

The Open and Input statements generally only work on plain-text files (things you can read in Notepad). If you want to programmatically read from Microsoft Word documents, you'll have to add the Microsoft Word 12.0 Object Library (or the most recent version on your system) to your VBAProject references, and use the Word API to open and read the document.

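    ' Requires a reference to the Microsoft Word Object Library.
    ' DocumentPath is a String holding the full path to the .doc file.
    Dim oWrd As Word.Application
    Set oWrd = New Word.Application

    Dim odoc As Word.Document
    Set odoc = oWrd.Documents.Open(Filename:=DocumentPath, Visible:=False)

    Dim singleLine As Word.Paragraph
    Dim lineText As String

    ' Iterate over the opened document, not ActiveDocument
    For Each singleLine In odoc.Paragraphs
        lineText = singleLine.Range.Text
        'Do what you've gotta do
    Next singleLine

    ' Close the document without saving and shut down the hidden Word instance
    odoc.Close SaveChanges:=wdDoNotSaveChanges
    oWrd.Quit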

Word doesn't have a concept of "lines". You can read text ranges, paragraphs, and sentences. Experiment to find what works best for getting your input text in manageable blocks.
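One detail worth knowing when reading paragraphs: the text of a paragraph's Range includes the trailing paragraph mark (a carriage return, Chr(13)), and text taken from a table cell also ends with an end-of-cell marker (Chr(7)). A minimal sketch of a cleanup helper follows; the name CleanParagraphText is hypothetical and not part of the Word API.

```vba
' Hypothetical helper (not part of the Word API): removes the trailing
' paragraph mark / end-of-cell marker and surrounding spaces from the
' text returned by Paragraph.Range.Text.
Function CleanParagraphText(ByVal rawText As String) As String
    Dim result As String
    result = rawText

    ' Strip trailing control characters one at a time
    Do While Len(result) > 0
        Select Case Right$(result, 1)
            Case vbCr, vbLf, Chr$(7)
                result = Left$(result, Len(result) - 1)
            Case Else
                Exit Do
        End Select
    Loop

    CleanParagraphText = Trim$(result)
End Function
```

Inside the loop you would then call something like lineText = CleanParagraphText(singleLine.Range.Text) before testing whether the line starts with "Series", "Box", or "Folder".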

Here is code that actually works.

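    'Create a new object for the Microsoft Word application
    '(requires a reference to the Microsoft Word Object Library)
    Dim objWord As New Word.Application
    'Declare a Word document variable; Documents.Open supplies the object
    Dim objDoc As Word.Document
    'Open a Word document and assign it to the variable declared above
    'DocFilename is a String holding the full path to the file
    Set objDoc = objWord.Documents.Open(Filename:=DocFilename, Visible:=False)

    Dim strSingleLine As Word.Paragraph
    Dim strLineText As String

    For Each strSingleLine In objDoc.Paragraphs
        strLineText = strSingleLine.Range.Text
        'Do what you've gotta do
    Next strSingleLine

    'Close the document without saving and quit the hidden Word instance
    objDoc.Close SaveChanges:=wdDoNotSaveChanges
    objWord.Quit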
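For the "Do what you've gotta do" step, the title-and-year part of the question's pseudocode could be sketched roughly as below. This is only an illustration: the name SplitTitleAndYear is hypothetical, and it assumes a year is a trailing four-digit word starting with 1 or 2, which may not match the real data in the screenshots.

```vba
' Hypothetical helper: splits "Some folder title 1987" into a title and a year.
' Assumption: a year is the last word of the text and looks like 1### or 2###.
' If no year is found, yr is set to 0 and the whole text is returned as the title.
Sub SplitTitleAndYear(ByVal source As String, ByRef title As String, ByRef yr As Long)
    Dim trimmed As String
    Dim lastSpace As Long
    Dim lastWord As String

    trimmed = Trim$(source)
    lastSpace = InStrRev(trimmed, " ")       ' 0 when there is no space
    lastWord = Mid$(trimmed, lastSpace + 1)  ' whole string when lastSpace = 0

    If lastWord Like "[12]###" Then
        yr = CLng(lastWord)
        If lastSpace > 0 Then
            title = Trim$(Left$(trimmed, lastSpace - 1))
        Else
            title = ""
        End If
    Else
        yr = 0
        title = trimmed
    End If
End Sub
```

Calling SplitTitleAndYear "Correspondence 1987", title, yr leaves title as "Correspondence" and yr as 1987. When the returned title is empty, the caller can keep the previous folder title, as the pseudocode describes, and then write the values into the worksheet with ordinary cell assignments such as Cells(rowIndex, 5).Value = title.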
