
I have a Word doc. How can I get the word count for each page of it?

I could only find a solution that works per line, but I can't find the page breaks, and I'm quite confused. For .docx I also can't get an exact word count.

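// Extracts readable text from the raw bytes of a Word 97-2003 (.doc) file.
// It only recovers text; it does not know where pages begin.
function read_doc($filename) {
    $content = file_get_contents($filename);
    if ($content === false) {
        return "";
    }
    // Word stores paragraph marks as carriage returns (0x0D)
    $lines = explode(chr(0x0D), $content);
    $outtext = "";
    foreach ($lines as $key => $thisline) {
        // Skip the first 12 chunks (heuristic from the original code)
        if ($key <= 11) {
            continue;
        }
        // Chunks containing NUL bytes are binary data, not text
        if (strpos($thisline, chr(0x00)) !== false || strlen($thisline) == 0) {
            continue;
        }
        // Keep only a whitelist of printable characters
        $text = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $thisline);
        $outtext .= $text . " ";
    }
    return $outtext;
}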

Implementing your own code for this doesn't sound like a good idea. I would recommend using an external library such as PHPWord. It should allow you to convert the file to plain text, and then you can extract the word count from it.
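The PHPWord loading step is not shown here. Assuming you already have the plain text, the counting step still has to decide what a "word" is. Below is a minimal sketch with two hypothetical helper functions: PHP's `str_word_count()` counts runs of letters (which may contain `'` and `-`), so numbers are skipped and non-ASCII letters depend on the locale, while a whitespace split counts every token. Plain text carries no page boundaries, so either way this gives a count for the whole text, not per page.

```php
<?php
// Hypothetical helpers: count words in text already extracted from the document.

// PHP's built-in rule: runs of letters, which may contain ' and - (but not start with them).
// Digits are not letters, so "2024" is not counted.
function countWordsBuiltin(string $text): int
{
    return str_word_count($text);
}

// Whitespace rule: every run of non-whitespace characters is one word.
function countWordsWhitespace(string $text): int
{
    $parts = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return count($parts);
}

$sample = "Invoice 2024 total: 15 items";
echo countWordsBuiltin($sample), "\n";    // 3
echo countWordsWhitespace($sample), "\n"; // 5
```

Whichever rule you choose, compare it against Word's own count on a few real documents before relying on it.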

Also, a library like that supports a number of file formats, so you are not restricted to Word 97-2003.

Here's a basic piece of VB.NET code that counts words per page. Be aware that it depends on what Word considers to be a word, which is not necessarily what a user considers a word. In my experience you need to analyse properly how Word behaves and what it interprets, and then build your logic to make sure you get the results you need. It uses Word automation (the Microsoft.Office.Interop.Word object model), so Word must be installed on the machine that runs it. It's not PHP, but it does the job and can be a starting point for you.

Imports System.Collections.Generic
Imports System.Diagnostics
Imports Microsoft.Office.Interop.Word

Module WordPageCounter
    ' Counts words per page using Word's own definition of a "word" (doc.Words).
    ' Note: the Words collection also includes punctuation and paragraph marks.
    Public Sub CountWordsPerPage(doc As Document)
        ' Page number -> number of items from doc.Words that end on that page
        Dim counts As New SortedDictionary(Of Integer, Long)

        Try
            For Each wrd As Range In doc.Words
                ' Page on which the end of this range is laid out
                Dim pagenum As Integer = CInt(wrd.Information(WdInformation.wdActiveEndPageNumber))
                Debug.Print("Word {0} is on page {1}", wrd.Text, pagenum)

                ' TryGetValue leaves current at 0 for a page seen for the first time
                Dim current As Long = 0
                counts.TryGetValue(pagenum, current)
                counts(pagenum) = current + 1
            Next
        Catch ex As Exception
            ' The original logged to an application-specific WorkerErrorLog class
            Debug.Print("Error while counting words: {0}", ex.Message)
        Finally
            Dim totalWordCount As Long = 0
            For Each item As KeyValuePair(Of Integer, Long) In counts
                totalWordCount += item.Value
                Debug.Print("Page {0} has {1} words", item.Key, item.Value)
            Next
            Debug.Print("Total word count is {0}", totalWordCount)
        End Try
    End Sub
End Module

When you unzip a .docx file, you get a folder. Look for the document.xml file in the word subfolder; it contains the whole document as XML. Split the string on the page-break markup, strip the XML tags and run str_word_count on each part. This works for .docx only: a .doc file is a binary format, not a zip archive.
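A minimal sketch of that approach, assuming PHP with the zip extension (`ZipArchive`) for reading the archive; the function names `wordsPerPageFromXml` and `wordsPerPage` are hypothetical. It treats two markers as page boundaries: explicit page breaks (`<w:br w:type="page"/>`) and `<w:lastRenderedPageBreak/>`, which Word writes where a page ended the last time it laid out the document. Pagination is otherwise computed when the document is rendered and is not stored in the file, so the counts are approximate, and a file saved by another tool may contain no `lastRenderedPageBreak` markers at all.

```php
<?php
// Hypothetical helpers sketching "split word/document.xml by page, strip XML, count".

// Split the XML of word/document.xml at page-break markers and count words per chunk.
// Only explicit page breaks and <w:lastRenderedPageBreak/> hints are seen, so the
// result approximates Word's layout rather than reproducing it.
function wordsPerPageFromXml(string $xml): array
{
    $pagePattern = '/<w:br\b[^>]*w:type=["\']page["\'][^>]*\/>|<w:lastRenderedPageBreak\s*\/>/';
    $chunks = preg_split($pagePattern, $xml);

    $counts = [];
    foreach ($chunks as $i => $chunk) {
        // Drop field codes and deleted text from tracked changes (not part of the final text).
        $chunk = preg_replace('/<w:(instrText|delText)\b[^>]*(?<!\/)>.*?<\/w:\1>/s', ' ', $chunk);
        // Paragraph ends, tabs and line breaks become spaces so words do not merge.
        $chunk = preg_replace('/<\/w:p>|<w:tab\s*\/>|<w:br\b[^>]*\/>/', ' ', $chunk);
        // Remove the remaining tags and decode XML entities such as &amp;.
        $text = html_entity_decode(strip_tags($chunk), ENT_QUOTES | ENT_XML1, 'UTF-8');
        $counts[$i + 1] = str_word_count($text); // pages numbered from 1
    }
    return $counts;
}

// Read word/document.xml out of a .docx archive (needs the zip extension).
function wordsPerPage(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("word/document.xml not found in $docxPath");
    }
    return wordsPerPageFromXml($xml);
}
```

For example, `print_r(wordsPerPage('report.docx'));` (the file name is a placeholder) prints an array keyed by page number. Splitting the raw XML string with regular expressions keeps the sketch short; a DOM-based walk over the `w:` elements would be more robust against unusual markup.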

What I figured out is that I will need a Windows server, using a COM object. Please check this link: https://github.com/lettertoamit/MS-Word-PER-PAGE-WORDCOUNT/blob/master/index.php

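Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.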

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM