
Importing/Copying and Pasting Word Document to HTML

We need to import, or copy and paste, Word documents and convert them to HTML-ready data.

Here are my thoughts:

  • collect the text with file_get_contents
  • apply the function nl2br

However, this approach does not preserve bold or other text formatting.

Also, Word inserts several Microsoft-specific characters (smart quotes, for example) that we don't want in the output.

What is a good strategy for turning Word imports into clean HTML?
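For reference, here is a minimal sketch of the plain-text approach described above, with the Word-specific punctuation mapped to plain ASCII first. The function names normalizeWordCharacters and plainTextToHtml and the file name pasted.txt are made up for illustration, and the sketch assumes the input is already UTF-8 text (not a binary .doc file). It still loses bold and other formatting.

```php
<?php
// Hypothetical helper: map common Word "smart" punctuation to plain ASCII.
// Assumes the input is already UTF-8 text.
function normalizeWordCharacters(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ]);
}

// Hypothetical helper: normalize, escape HTML, then turn newlines into <br />.
function plainTextToHtml(string $text): string
{
    $text = normalizeWordCharacters($text);
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return nl2br($text);
}

// Usage sketch (pasted.txt is a placeholder file name):
// $html = plainTextToHtml(file_get_contents('pasted.txt'));
```

Escaping with htmlspecialchars before nl2br keeps any stray angle brackets in the pasted text from being treated as markup.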

I wouldn't try to tackle all of this on your own. word2cleanhtml.com looks like it will suit your needs and may have an API offering soon.

However, it appears that you can use Word itself from the command line to convert your document for you. This will, of course, require that MS Word is installed on your PHP server.

shell_exec('"C:/Program Files/Microsoft Office/Office12/WINWORD.EXE" /msaveashtml "C:/path/to/your.doc"');

The above code uses the macro defined in this answer to a similar question. You will need to copy the saveashtml macro from that answer and add it to Word.
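Because the path to WINWORD.EXE contains spaces, each path needs to be quoted when the command is built. Here is a rough sketch of one way to do that with escapeshellarg; the helper name buildWordCommand is made up for illustration, and the paths are the same example paths as above. I have not verified this against every PHP and Windows combination, so treat it as a starting point.

```php
<?php
// Hypothetical helper: build the Word command line with each path quoted.
function buildWordCommand(string $wordExe, string $docPath): string
{
    // escapeshellarg() wraps each argument in quotes so embedded spaces survive.
    // The /m switch tells Word to run the named macro (here: saveashtml) on startup.
    return escapeshellarg($wordExe) . ' /msaveashtml ' . escapeshellarg($docPath);
}

$command = buildWordCommand(
    'C:/Program Files/Microsoft Office/Office12/WINWORD.EXE',
    'C:/path/to/your.doc'
);

// Uncomment on a Windows server that has Word and the saveashtml macro installed:
// $output = shell_exec($command);
```

Building the command in a separate function makes it easy to log or inspect the exact string before handing it to shell_exec.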


 