

Is there a way to get the page count of a Word doc?

Preferably, I would like to do this in the browser with JavaScript. I am already able to unzip the .docx file and read the XML files, but I can't seem to find a way to get a page count. I am hoping the property exists in one of the XML files and I just need to find it.

Edit: I wouldn't say this is a duplicate of "Is there a way to count doc, docx, pdf pages with only js (without Node.js)?". My question is specific to Word .doc/.docx files, and that question was never resolved.

In theory, the following property, read with the Open XML SDK, returns that information from a Word Open XML file:

// document is an open WordprocessingDocument; Pages.Text is a string, so parse it
int pageCount = int.Parse(document.ExtendedFilePropertiesPart.Properties.Pages.Text);

In practice, however, this isn't reliable. It might work, but it might not: it all depends on 1) what Word managed to save in the file before it was closed, and 2) what kind of editing may have been done on the closed file.
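For the browser case in the question, a rough JavaScript counterpart of the property above is to read the `<Pages>` element from `docProps/app.xml`, the part where the extended file properties are usually stored. This is a minimal sketch that assumes you already have the text of that part from your existing unzip step; the function name `readStoredPageCount` is hypothetical. Given the caveats above, it returns `null` instead of guessing when the element is missing, and any number it returns is only what the last writer cached.

```javascript
// Hypothetical helper: read the cached page count from the text of
// docProps/app.xml. Returns null when no <Pages> element is present.
function readStoredPageCount(appXml) {
  // <Pages> is normally unprefixed (default extended-properties namespace),
  // but an optional prefix such as <ep:Pages> is tolerated as well.
  const match = appXml.match(
    /<(?:[\w.-]+:)?Pages>\s*(\d+)\s*<\/(?:[\w.-]+:)?Pages>/
  );
  return match ? Number(match[1]) : null;
}

// Usage with a trimmed-down app.xml:
const sample =
  '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">' +
  '<Pages>3</Pages><Words>512</Words></Properties>';
console.log(readStoredPageCount(sample)); // → 3
```

The result still reflects the last save, not a fresh pagination of the document.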

The only sure way to get a page number or a page count is to open the document in the Word application. Word calculates page numbers and the page count dynamically, during editing. Once a document is closed, this information is static and is not necessarily what it will be when the document is next opened or printed.

See also https://github.com/OfficeDev/Open-XML-SDK/issues/22 for confirmation.

I found a way to do this with docx4js.

Here is a small sample that parses a file taken from an input element:

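import docx4js from 'docx4js';

// `file` is a File object taken from an <input type="file"> element
docx4js.load(file).then(doc => {
  // docProps/app.xml holds the extended file properties, including <Pages>
  const part = doc.parts['docProps/app.xml'];
  if (!part) {
    console.log('docProps/app.xml not found');
    return;
  }
  // _data.getContent() relies on docx4js internals (underscore-prefixed),
  // so it may not survive a library upgrade
  const propsAppRaw = part._data.getContent();
  const propsApp = new TextDecoder('utf-8').decode(propsAppRaw);
  const match = propsApp.match(/<Pages>(\d+)<\/Pages>/);
  if (match) {
    const count = Number(match[1]);
    console.log(count);
  }
});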

When you say "do this in the browser", I assume you have a running web server with LAMP or an equivalent. In PHP, there is a pretty useful option for .docx files: because a .docx file is a ZIP archive, PHP's ZipArchive class can read docProps/app.xml directly. An example PHP function:

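function number_pages_docx($filename)
{
    // A .docx file is a ZIP archive; ZipArchive needs PHP's zip extension
    $docx = new ZipArchive();

    if ($docx->open($filename) === true)
    {
        // docProps/app.xml stores the page count cached at the last save
        if (($index = $docx->locateName('docProps/app.xml')) !== false)
        {
            $data = $docx->getFromIndex($index);
            $docx->close();

            $xml = new SimpleXMLElement($data);
            // Cast the SimpleXMLElement to int; return false if <Pages> is absent
            return isset($xml->Pages) ? (int) $xml->Pages : false;
        }

        $docx->close();
    }

    return false;
}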

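Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.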

 