
How to load text of MS Word document in C# (.NET)?

How do I load an MS Word document (.doc or .docx) into memory (a variable) without doing this?

wordApp.Documents.Open

I don't want to open MS Word; I just want the text inside.

You gave me an answer for DOCX, but what about DOC? I want a free, high-performance solution, not one that opens 12,000 instances of Word to process all of them. :( Aspose is a commercial product, and $900 is way too much for what I do.

You can use wordconv.exe, which is part of the Office Compatibility Pack, to convert from doc to docx.

http://www.microsoft.com/downloads/details.aspx?familyid=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en

Just call the command like so: "C:\Program Files\Microsoft Office\Office12\wordconv.exe" -oice -nme InputFile OutputFile

I'm not sure whether Word needs to be installed for it to run, but it does work. I use it locally as a Windows shell command to convert old Office files to the 2007 format whenever I want.
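For anyone who wants to call the converter from C# rather than from a shell, here is a minimal sketch using System.Diagnostics.Process. The executable path and the -oice -nme flags are taken from the command above. The class and method names (WordConv, BuildArguments, ConvertToDocx) are made up for illustration, and the meaning of the exit code is not documented in this thread, so treat it as something to check rather than rely on.

```csharp
using System;
using System.Diagnostics;

static class WordConv
{
    // Path quoted in the answer above; adjust it for your installation.
    const string DefaultExe =
        @"C:\Program Files\Microsoft Office\Office12\wordconv.exe";

    // Builds the argument string: -oice -nme "input" "output".
    // Both paths are quoted so that spaces in file names survive.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return "-oice -nme \"" + inputPath + "\" \"" + outputPath + "\"";
    }

    // Runs the converter, waits for it to finish and returns its exit code.
    public static int ConvertToDocx(string inputPath, string outputPath,
                                    string exePath = DefaultExe)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(inputPath, outputPath),
            UseShellExecute = false, // start the exe directly, no shell
            CreateNoWindow = true    // do not show a console window
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Usage would look like WordConv.ConvertToDocx(@"C:\docs\old.doc", @"C:\docs\old.docx"). For thousands of files you would call this in a loop, which still starts one short-lived converter process per file but never a Word instance.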

If you are dealing with docx, you can do this without any interop with Word. A .docx file is actually a ZIP archive containing XML files, so you can read the XML directly. Please refer to the links below:

http://conceptdev.blogspot.com/2007/03/open-docx-using-c-to-extract-text-for.html

Office (2007) Open XML File Formats
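The ZIP-plus-XML idea can be sketched in a few lines with the framework's own ZipArchive and LINQ to XML (.NET 4.5 or later). This is not the code from the linked articles, just one possible take on it. It assumes the main document part lives at word/document.xml, which is where Word normally puts it, and the class name DocxText is made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxText
{
    // WordprocessingML main namespace used by the w:p and w:t elements.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the text of a .docx from a stream, one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            // Assumption: the main part is at the conventional location.
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (var xml = entry.Open())
            {
                var doc = XDocument.Load(xml, LoadOptions.PreserveWhitespace);

                // Each w:p is a paragraph; its text is split across w:t runs.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    // Convenience overload for a file on disk.
    public static string Extract(string path)
    {
        using (var file = File.OpenRead(path))
            return Extract(file);
    }
}
```

Calling DocxText.Extract("input.docx") returns the body text only. Headers, footers, footnotes and comments live in other parts of the archive and are ignored here, and paragraphs nested inside another paragraph (text boxes, for example) would show up twice.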

For docx-formatted Word documents, I found this interesting article on The Code Project:

Using DocxToText to Extract Text from DOCX Files

In the article, the author discusses stripping out just the words themselves.

For your doc (non-docx) Word documents, rather than using the Office APIs and (in the background) spawning an instance of Word, you could try shelling out to one of the many Doc2Docx converters on the market and then applying the process above to both kinds of file.

I recently did some research on this topic. It turns out that to manipulate Word files programmatically without opening Word itself, you need some very expensive tools.

There's an article over at Code Project on manipulating Word that you might find useful. The author built a C# COM wrapper for dealing with calls to Word. It looks like it actually pops open the Word application, though.

This post over at the Neowin forums looks promising too. It includes quite a few P/Invoke calls for the purpose of text extraction.

Maybe if you could find a way to keep the window hidden, it would be acceptable.

Aspose has a component to read, modify and write Word documents. Here is the product link: Aspose.Words for .NET and Java

Aspose.Words enables .NET and Java applications to read, modify and write Word® documents without utilizing Microsoft Word®. Aspose.Words supports a wide array of features, including document creation, content and formatting manipulation, powerful mail merge abilities, and comprehensive support of the DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats. Aspose.Words is truly the most affordable, fastest and feature-rich Word component on the market.

With docxtemplater, you can easily get the full text of a Word document (works with docx only).

Here's the code (Node.JS):

var DocxTemplater = require('docxtemplater');
var doc = new DocxTemplater().loadFromFile("input.docx");
var result = doc.getFullText();

This is just three lines of code and doesn't depend on any Word instance (it is all plain JS).

I don't mean to be antagonistic, but why?

I've extracted data from Word documents on Linux servers using Word2X or AbiWord, and depending on the number and variety of documents, there will always be errors with the extraction. It gets worse the more bullets, page breaks, document sections and other "special" features there are.

I understand there are options now to automate OpenOffice to process documents, but my advice is: if you can, just use Word to process Word documents.


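Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.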
