
Parse a Word document into an Excel file

I have a Word document containing data that I would like to parse into an Excel file. The source files are hundreds of pages long. I have been working with VBA, but I have only just started learning the language and have run into many difficulties trying to read a .doc file. I can use the Open and Line Input statements to read text from a .txt file, but I get only gibberish when I try the same with a .doc file.

I have included links to two screenshots.

The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/

The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/

I have developed an algorithm for what I want to accomplish; I am just having difficulty coding it. Below is the pseudocode I have written.

    Variables:
         string     line = blank
         series_title = blank
         folder_title = blank

         int  series_number = 0
              box_number = 0
              folder_number = 0
              year = 0
    do while the <end_of_document> has not been reached
        input line
        If the first word in the line is “series” 
            store <series_number>
            store the string after “:” into <series_title>
        end if
        call parse_box(rest of line)
        output <series_number> <series_title> <box_number> <folder_number> <folder_title> <year>
    end do while

    function parse_box(current line)
        If the first word in the line is “box” 
            store <box_number>
        end if
        call parse_folder(rest of line)
    end function

    function parse_folder(current line)
        If first word is “Folder”
            store <folder_number>
        end if
        call parse_folder_title_and_year(rest of line)
    end function

    function parse_folder_title_and_year(current line)
        string temp_folder_title
        store everything as <temp_folder_title> until end of line
        if last word in <temp_folder_title> is a year
            store <year>
        end if
        if <temp_folder_title> is empty/blank
            // use <folder_title> from before
        else
            <folder_title> is <temp_folder_title> minus <year>
        end if
    end function
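A minimal VBA sketch of this pseudocode, for illustration only. Because the input screenshots are not reproduced here, it ASSUMES lines shaped like "Series 2: Correspondence" and "Box 4 Folder 7 Letters from the board 1952", and it treats a trailing four-digit word as the year. The procedure names (ParseLine, WriteRecord, FirstWord, RestOfLine) are hypothetical. Like the pseudocode, it keeps its values between calls, so a line without a box or folder number inherits the previous one.

```vba
' Hypothetical sketch of the pseudocode. It ASSUMES lines shaped like
'   "Series 2: Correspondence"
'   "Box 4 Folder 7 Letters from the board 1952"
' because the real layout is only visible in the screenshots.
Option Explicit

' State carried from line to line, as in the pseudocode
Private seriesNumber As Long
Private seriesTitle As String
Private boxNumber As Long
Private folderNumber As Long
Private folderTitle As String
Private yearValue As Long

' First space-delimited word of s ("" if s is blank)
Private Function FirstWord(ByVal s As String) As String
    Dim pos As Long
    s = Trim$(s)
    pos = InStr(s, " ")
    If pos = 0 Then FirstWord = s Else FirstWord = Left$(s, pos - 1)
End Function

' Everything after the first word of s, trimmed
Private Function RestOfLine(ByVal s As String) As String
    Dim pos As Long
    s = Trim$(s)
    pos = InStr(s, " ")
    If pos = 0 Then RestOfLine = "" Else RestOfLine = Trim$(Mid$(s, pos + 1))
End Function

Public Sub ParseLine(ByVal lineText As String)
    Dim rest As String, tempTitle As String, lastWord As String
    Dim colonPos As Long, lastSpace As Long

    ' Drop Word's paragraph mark (vbCr) and cell marker (Chr(7)); treat tabs as spaces
    rest = Replace(Replace(Replace(lineText, vbCr, ""), Chr(7), ""), vbTab, " ")
    rest = Trim$(rest)

    ' "Series <n>: <title>": the title runs to the end of the line
    If StrComp(FirstWord(rest), "series", vbTextCompare) = 0 Then
        rest = RestOfLine(rest)
        seriesNumber = Val(FirstWord(rest))   ' Val stops at the colon, so "2:" gives 2
        colonPos = InStr(rest, ":")
        If colonPos > 0 Then seriesTitle = Trim$(Mid$(rest, colonPos + 1))
        rest = ""
    End If

    ' parse_box: "Box <n> ..."
    If StrComp(FirstWord(rest), "box", vbTextCompare) = 0 Then
        rest = RestOfLine(rest)
        boxNumber = Val(FirstWord(rest))
        rest = RestOfLine(rest)
    End If

    ' parse_folder: "Folder <n> ..."
    If StrComp(FirstWord(rest), "folder", vbTextCompare) = 0 Then
        rest = RestOfLine(rest)
        folderNumber = Val(FirstWord(rest))
        rest = RestOfLine(rest)
    End If

    ' parse_folder_title_and_year: a trailing four-digit word is taken as the year
    tempTitle = rest
    lastSpace = InStrRev(rest, " ")
    lastWord = Mid$(rest, lastSpace + 1)
    If lastWord Like "####" Then
        yearValue = CLng(lastWord)
        rest = Trim$(Left$(rest, lastSpace))
    End If
    ' A blank line remainder keeps the previous folder title
    If Len(tempTitle) > 0 Then folderTitle = rest
End Sub

' Writes the current state to one row:
' series no., series title, box, folder, folder title, year
Public Sub WriteRecord(ByVal ws As Worksheet, ByVal rowIndex As Long)
    ws.Cells(rowIndex, 1).Resize(1, 6).Value = _
        Array(seriesNumber, seriesTitle, boxNumber, folderNumber, folderTitle, yearValue)
End Sub
```

To mirror the pseudocode's output step, call ParseLine with each paragraph's text and then WriteRecord ws, r, incrementing r for every row. Matching by keyword and word position is fragile if the real document varies in spacing or wording, so check the results on a few pages before running it on the full files.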

Thanks in advance for your help and suggestions.

The Open and Line Input statements generally work only on plain-text files (files you can read in Notepad); a .doc file is stored in a binary format, which is why you see gibberish. If you want to read Microsoft Word documents programmatically, add a reference to the Microsoft Word 12.0 Object Library (or the most recent version on your system) under Tools > References in the VBA editor, and use the Word object model to open and read the document.

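Sub ReadWordParagraphs(ByVal DocumentPath As String)
    'Requires a reference to the Microsoft Word Object Library
    Dim oWrd As Word.Application
    Set oWrd = New Word.Application

    Dim odoc As Word.Document
    Set odoc = oWrd.Documents.Open(Filename:=DocumentPath, Visible:=False)

    Dim singleLine As Word.Paragraph
    Dim lineText As String

    For Each singleLine In odoc.Paragraphs
        lineText = singleLine.Range.Text
        'Do what you've gotta do
    Next singleLine

    odoc.Close SaveChanges:=False
    oWrd.Quit
End Sub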

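The Word object model doesn't expose a document's lines as a collection the way it does paragraphs: you can iterate over a document's Paragraphs, Sentences, Words, or Characters, or work with arbitrary Range objects. Experiment to find which unit gives you your input text in manageable blocks.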
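A small sketch of what this means in practice, assuming an already-open Word.Document (the procedure names CleanParagraphText and ShowTextUnits are hypothetical). Each paragraph's Range.Text ends with the paragraph mark vbCr, and text in a table cell also carries the end-of-cell marker Chr(7). A manual line break typed with Shift+Enter appears as Chr(11), so one paragraph can hold several visual lines; splitting on it is one way to get closer to "lines".

```vba
' Hypothetical helpers; assumes a reference to the Microsoft Word Object Library.
Option Explicit

' Removes the end-of-cell marker Chr(7) and any trailing paragraph marks
Public Function CleanParagraphText(ByVal s As String) As String
    s = Replace(s, Chr(7), "")
    Do While Len(s) > 0 And (Right$(s, 1) = vbCr Or Right$(s, 1) = vbLf)
        s = Left$(s, Len(s) - 1)
    Loop
    CleanParagraphText = s
End Function

' Prints how many paragraphs and sentences Word sees, then every
' manual-line-break piece of every paragraph, to the Immediate window
Public Sub ShowTextUnits(ByVal doc As Word.Document)
    Dim para As Word.Paragraph
    Dim pieces() As String
    Dim i As Long

    Debug.Print "Paragraphs: " & doc.Paragraphs.Count
    Debug.Print "Sentences:  " & doc.Sentences.Count

    For Each para In doc.Paragraphs
        ' Split on Chr(11), Word's manual line break
        pieces = Split(CleanParagraphText(para.Range.Text), Chr(11))
        For i = LBound(pieces) To UBound(pieces)
            Debug.Print pieces(i)
        Next i
    Next para
End Sub
```

Try it on a short sample document first to see whether your data is laid out as one record per paragraph or as several line-broken records inside one paragraph.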

Here is code that actually works.

Sub ReadWordDocument(ByVal DocFilename As String)
    'Create a new (hidden) instance of the Microsoft Word application
    Dim objWord As New Word.Application
    'Declare a Word document object (no New: Documents.Open returns the document)
    Dim objDoc As Word.Document
    'Open the Word document and assign it to the object declared above
    Set objDoc = objWord.Documents.Open(Filename:=DocFilename, Visible:=False)

    Dim para As Word.Paragraph
    Dim strLineText As String

    For Each para In objDoc.Paragraphs
        strLineText = para.Range.Text
        'Do what you've gotta do
    Next para

    'Close the document without saving and shut down the hidden Word instance
    objDoc.Close SaveChanges:=False
    objWord.Quit
End Sub
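A minimal sketch that combines the paragraph loop with Excel output, meant to run from Excel VBA with the Word Object Library reference described above. The procedure name ParagraphsToSheet and its parameters are hypothetical. It opens the document read-only in a hidden Word instance, writes each non-empty paragraph to column A of a worksheet, and closes Word on both success and error.

```vba
' Hypothetical sketch; requires a reference to the Microsoft Word Object Library
' and an Excel host (for the Worksheet type).
Option Explicit

Public Sub ParagraphsToSheet(ByVal docPath As String, ByVal ws As Worksheet)
    Dim wdApp As Word.Application
    Dim wdDoc As Word.Document
    Dim para As Word.Paragraph
    Dim txt As String
    Dim errMsg As String
    Dim r As Long

    On Error GoTo CleanUp
    Set wdApp = New Word.Application
    Set wdDoc = wdApp.Documents.Open(FileName:=docPath, ReadOnly:=True, Visible:=False)

    ws.Columns(1).NumberFormat = "@"   ' store text as-is (no formula/number conversion)
    r = 1
    For Each para In wdDoc.Paragraphs
        ' Range.Text ends with the paragraph mark vbCr; table cells also end with Chr(7)
        txt = Replace(Replace(para.Range.Text, vbCr, ""), Chr(7), "")
        If Len(Trim$(txt)) > 0 Then
            ws.Cells(r, 1).Value = txt
            r = r + 1
        End If
    Next para

CleanUp:
    ' Reached on success and on error, so no hidden Word instance is left running
    If Err.Number <> 0 Then errMsg = "Error " & Err.Number & ": " & Err.Description
    If Not wdDoc Is Nothing Then wdDoc.Close SaveChanges:=False
    If Not wdApp Is Nothing Then wdApp.Quit
    If Len(errMsg) > 0 Then MsgBox errMsg
End Sub
```

A call would look like ParagraphsToSheet "C:\Data\input.doc", ThisWorkbook.Worksheets(1), where the path is a placeholder. For documents hundreds of pages long, writing one cell at a time can be slow; collecting the text in an array and assigning it to a range in a single statement is usually much faster.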

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM