
I have a Word document. How can I get the word count per page?

I could only find a solution that works per line, but I can't find the page breaks, and I'm quite confused. For .docx I also can't find the exact word count.

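function read_doc($filename) {
    // Read the whole binary .doc file into memory.
    $fileHandle = fopen($filename, "rb");
    $line = @fread($fileHandle, filesize($filename));
    fclose($fileHandle);
    // Split the raw data on carriage returns (0x0D).
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $key => $thisline) {
        // Only look at chunks after the first 12.
        if ($key > 11) {
            // Skip empty chunks and chunks containing NUL bytes (binary data).
            $pos = strpos($thisline, chr(0x00));
            if (($pos !== FALSE) || (strlen($thisline) == 0)) {
                continue;
            }
            // Keep only common printable characters and collect the text.
            $text = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $thisline);
            $outtext .= $text . " ";
        }
    }
    return $outtext;
}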

Implementing your own code for this doesn't sound like a good idea. I would recommend using an external library such as PHPWord. It should allow you to convert the file to plain text, from which you can then extract the word count.

A library like that also adds support for a number of file formats, so you are not restricted to Word 97-2003.

Here's a basic piece of VB.NET code that counts words per page. Be aware that it depends on what Word considers to be a word, which is not necessarily what a user considers a word. In my experience you need to analyse how Word behaves and what it interprets as a word, then build your logic to make sure you get the results you need. It's not PHP, but it does the job and can be a starting point for you.

Imports Microsoft.Office.Interop.Word

' Holds the running word count for one page.
Structure WordsPerPage
    Public pagenum As Integer
    Public count As Long
End Structure

Public Sub CountWordsPerPage(doc As Document)
    Dim index As Integer
    Dim pagenum As Integer
    Dim newItem As WordsPerPage
    Dim tmpList As New List(Of WordsPerPage)

    Try
        For Each wrd As Range In doc.Words
            pagenum = wrd.Information(WdInformation.wdActiveEndPageNumber)
            Debug.Print("Word {0} is on page {1}", wrd.Text, pagenum)
            index = tmpList.FindIndex(Function(value As WordsPerPage)
                                          Return value.pagenum = pagenum
                                      End Function)
            If index <> -1 Then
                tmpList(index) = New WordsPerPage With {.pagenum = pagenum, .count = tmpList(index).count + 1}
            Else
                ' Unique (or first)
                newItem.count = 1
                newItem.pagenum = pagenum
                tmpList.Add(newItem)
            End If

        Next

    Catch ex As Exception
        WorkerErrorLog.AddLog(ex, Err.Number & " " & Err.Description)
    Finally
        Dim totalWordCount As Long = 0
        For Each item In tmpList
            totalWordCount = totalWordCount + item.count
            Debug.Print("Page {0} has {1} words", item.pagenum, item.count)
        Next
        Debug.Print("Total word count is {0}", totalWordCount)
    End Try
End Sub
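
The bookkeeping in the loop above is a tally keyed by page number. Here is a minimal sketch of just that step, written in Python rather than VB.NET, assuming the (text, page number) pairs have already been read from Word. The function name and the sample pairs are hypothetical. Word's Words collection also returns punctuation and paragraph marks as separate items, so the sketch skips any item that contains no letter or digit. That is only one possible definition of a "word", not Word's own.

```python
from collections import Counter


def tally_words_per_page(items):
    """Count words per page from (text, page_number) pairs."""
    counts = Counter()
    for text, page in items:
        # Skip punctuation, paragraph marks and whitespace-only items.
        if any(ch.isalnum() for ch in text):
            counts[page] += 1
    # Return a plain dict ordered by page number.
    return dict(sorted(counts.items()))
```

Changing the filter inside the loop is where a different notion of "word" would go.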

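When you unzip a .docx file you get a folder (this does not apply to a legacy .doc, which is a binary format rather than a zip archive). Look for the document.xml file in the word subfolder; it contains the whole document as XML. Split the string at the XML page-break markers, strip the XML tags, and use str_word_count on each part.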
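
A rough sketch of those steps, in Python for illustration (PHP's ZipArchive and str_word_count would play the same roles). It assumes pages are marked by `<w:lastRenderedPageBreak/>`, which Word records at the last save, or by a manual `<w:br w:type="page"/>`. Word often writes both markers for a manual break, so a manual break followed directly by a rendered break is treated as one. The function name `words_per_page` is hypothetical. The counts are approximate: a file that Word has never repaginated may carry no rendered breaks, and field codes or other non-body text are not filtered out.

```python
import re
import zipfile

HARD = r'<w:br\b[^>]*w:type="page"[^>]*/>'   # manual page break
SOFT = r'<w:lastRenderedPageBreak\s*/>'      # break recorded by Word on save
# A manual break followed only by tags and then a rendered break counts once.
PAGE_BREAK = re.compile(HARD + r'(?:\s*<[^>]+>)*?\s*' + SOFT + '|' + HARD + '|' + SOFT)
# Paragraph ends, tabs and line breaks separate words, so they become spaces.
SPACING = re.compile(r'</w:p>|<w:tab\s*/>|<w:br\b[^>]*/>')
ANY_TAG = re.compile(r'<[^>]+>')


def words_per_page(docx_file):
    """Return a list of word counts, one per page, for a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    counts = []
    for page_xml in PAGE_BREAK.split(xml):
        text = SPACING.sub(" ", page_xml)
        # Remaining tags are dropped without a space so words split across runs rejoin.
        text = ANY_TAG.sub("", text)
        counts.append(len(text.split()))
    return counts
```

Calling `words_per_page("report.docx")` (a placeholder file name) returns one count per detected page, in document order.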

What I figured out is that I will need a Windows server, using a COM object. Please check this link: https://github.com/lettertoamit/MS-Word-PER-PAGE-WORDCOUNT/blob/master/index.php


 