
C# Searching PDFs

Original post: 2017-11-18 20:13:44 8 1 c#/ pdf/ search

I'm using iTextSharp to get the content out of a PDF. I want to allow the user to search the PDFs, much as they would with any search engine. The search should return the most relevant results. I have written a library that performs the TF-IDF algorithm on the documents to return relevant results. While this works, I feel like I may be reinventing the wheel.

The user should be able to search well over 50,000 PDFs, so there are a lot of them. I don't want to store the full content of each PDF in my database, as I feel that would be very expensive. To mitigate this, I've written my library so that it accepts a frequency distribution when calculating TF-IDF. This allows me to read each PDF once, when it's added to the system, instead of every time a search is performed.
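For illustration, here is a minimal sketch of ranking by TF-IDF from stored term counts alone, so the PDF text never has to be kept or re-read at search time. It is not the asker's actual library: the names `TfIdfSketch` and `Rank`, the shape of the index, and the smoothed IDF variant are all assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: ranks documents by TF-IDF using only stored term counts.
public static class TfIdfSketch
{
    // index: document name -> (term -> count), built once when each PDF is added.
    public static List<(string Doc, double Score)> Rank(
        IReadOnlyDictionary<string, Dictionary<string, int>> index,
        IEnumerable<string> queryTerms)
    {
        int docCount = index.Count;
        var scores = index.Keys.ToDictionary(d => d, _ => 0.0);

        foreach (string term in queryTerms.Select(t => t.ToLowerInvariant()).Distinct())
        {
            // Document frequency: how many documents contain the term at all.
            int df = index.Values.Count(counts => counts.ContainsKey(term));
            if (df == 0) continue;

            // Smoothed IDF (one common variant, assumed here); always positive.
            double idf = Math.Log((1.0 + docCount) / (1.0 + df)) + 1.0;

            foreach (var entry in index)
            {
                if (!entry.Value.TryGetValue(term, out int count)) continue;

                // Term frequency normalised by the document's total term count.
                int total = entry.Value.Values.Sum();
                scores[entry.Key] += (double)count / total * idf;
            }
        }

        // Keep only documents that matched at least one query term, best first.
        return scores.Where(s => s.Value > 0)
                     .OrderByDescending(s => s.Value)
                     .Select(s => (Doc: s.Key, Score: s.Value))
                     .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up counts standing in for what would be extracted from each PDF.
        var index = new Dictionary<string, Dictionary<string, int>>
        {
            ["a.pdf"] = new() { ["invoice"] = 3, ["total"] = 1 },
            ["b.pdf"] = new() { ["report"] = 2, ["total"] = 2 },
        };

        foreach (var (doc, score) in TfIdfSketch.Rank(index, new[] { "total" }))
            Console.WriteLine($"{doc}: {score:F3}");
    }
}
```

With 50,000 documents, scanning every document per query term as this sketch does would likely be too slow; an inverted index (term to list of documents) avoids that scan, and it is the structure search libraries are built around.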

Do libraries exist that already do this sort of thing?

1 Solution

Lucene.NET will do what you need.

There are also commercial ones, like our 'SearchUnit'.
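As a rough sketch of how the Lucene.NET suggestion might look in practice, the example below indexes already-extracted PDF text and runs a ranked query. It assumes Lucene.NET 4.8 with the `Lucene.Net`, `Lucene.Net.Analysis.Common` and `Lucene.Net.QueryParser` NuGet packages. The class name `LuceneSketch`, the method `IndexAndSearch`, the field names `path` and `content`, and the sample text are made up for illustration; in a real system the text would come from iTextSharp and the index would live on disk rather than in a `RAMDirectory`.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical sketch: index extracted PDF text, then return paths in rank order.
public static class LuceneSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    public static List<string> IndexAndSearch(
        IEnumerable<(string Path, string Text)> pdfs, string queryText)
    {
        // In-memory index for the demo only; a disk-backed directory suits real use.
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(MatchVersion);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            foreach (var (path, text) in pdfs)
            {
                writer.AddDocument(new Document
                {
                    // Stored, so a hit can be mapped back to its file.
                    new StringField("path", path, Field.Store.YES),
                    // Indexed for search but not stored, so the full text is not kept.
                    new TextField("content", text, Field.Store.NO),
                });
            }
            writer.Commit();
        }

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        Query query = new QueryParser(MatchVersion, "content", analyzer).Parse(queryText);

        // ScoreDocs come back ordered by relevance score, best first.
        var results = new List<string>();
        foreach (ScoreDoc hit in searcher.Search(query, 10).ScoreDocs)
            results.Add(searcher.Doc(hit.Doc).Get("path"));
        return results;
    }

    public static void Main()
    {
        var pdfs = new[]
        {
            (Path: "a.pdf", Text: "quarterly invoice with invoice total"),
            (Path: "b.pdf", Text: "annual report summary"),
        };

        foreach (string path in IndexAndSearch(pdfs, "invoice"))
            Console.WriteLine(path);
    }
}
```

Using `Field.Store.NO` for the content means only the inverted index is persisted, which speaks to the concern about storing the full text of every PDF. Each document is analysed once at indexing time, much like the precomputed frequency distribution described in the question.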


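Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.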
