
Can't read .docx file after uploading (Node.js)

I'm trying to upload a .docx file to an Express server using the express-fileupload package and then read it. The upload part works fine, but when I read the file back it prints unreadable gibberish. Here is the code:

app.post('/upload', (req, res, next) => {
  let file = req.files.file;

  file.mv(`${__dirname}/public/${req.body.filename}`, function(err) {
    if (err) {
      return res.status(500).send(err);
    }

    fs.readFile(`${__dirname}/public/${req.body.filename}`, 'utf8', function (err,data) {
      if (err) {
        return console.log(err);
      }
      console.log(data); // prints broken text/gibberish
    });

    res.json({ /* data to be returned */ });
  });

});

What I want is to be able to read the .docx file and do operations on the text inside it.

.docx files don't contain human-readable text. They are actually ZIP archives containing many different XML files, so reading one as 'utf8' decodes compressed binary data as if it were text, which is why you see gibberish. Even the text content of the XML files inside won't be easy to work with directly.
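
One quick way to confirm this is to read the file as raw bytes (no encoding argument) and look for the ZIP signature at the start. The following is a minimal sketch; `looksLikeZip` and the example path are hypothetical names chosen for illustration, not part of any library.

```javascript
// ZIP local file header signature: the ASCII bytes "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Returns true when the buffer starts with the ZIP signature.
// (Hypothetical helper for illustration only.)
function looksLikeZip(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_SIGNATURE);
}

// Usage sketch (the path is an assumption):
// const fs = require('fs');
// const data = fs.readFileSync(`${__dirname}/public/example.docx`); // no 'utf8' => Buffer
// console.log(looksLikeZip(data));
```

If this returns true for the uploaded file, the upload itself is intact and the problem is only in how the bytes are being interpreted.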

If you want to read or even modify the text inside a .docx file, you need to find a library that can read and write the format.
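
As one possible illustration, the mammoth package (an assumption here; the original answer does not name a library) exposes `extractRawText`, which accepts a Buffer and resolves to an object whose `value` is the document's plain text. The sketch below assumes `npm install mammoth` has been run; `extractDocxText` and `countWords` are hypothetical helper names.

```javascript
// Sketch only: assumes the third-party "mammoth" package is installed.
async function extractDocxText(buffer) {
  // Required lazily so this file still loads where mammoth is not installed.
  const mammoth = require('mammoth');
  const result = await mammoth.extractRawText({ buffer });
  return result.value; // plain text of the document
}

// Hypothetical example of an "operation on the text": count its words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}
```

In the route above, this could be called as `const text = await extractDocxText(req.files.file.data);` inside an `async` handler. With express-fileupload's default in-memory mode, `req.files.file.data` is a Buffer, so the file does not need to be read back from disk at all.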
