
Java Lucene integration with .Net

I've got Nutch and Lucene set up to crawl and index some sites, and I'd like to use a .NET website instead of the JSP site that comes with Nutch.

Can anyone recommend some solutions?

I've seen solutions where an app ran on the index server and the .NET site connected to it via remoting.

Speed is obviously a consideration, so can this still perform well?

Edit: could NHibernate.Search work for this?

Edit: We ended up going with Solr index servers, queried from our ASP.NET site through the SolrNet library.

In case it wasn't totally clear from the other answers, Lucene.NET and Lucene (Java) use the same index format, so you should be able to continue using your existing (Java-based) mechanisms for indexing, and then use Lucene.NET inside your .NET web application to query the index.

From the Lucene.NET incubator site:

In addition to the APIs and classes port to C#, the algorithm of Java Lucene is ported to C# Lucene. This means an index created with Java Lucene is back-and-forth compatible with the C# Lucene; both at reading, writing and updating. In fact a Lucene index can be concurrently searched and updated using Java Lucene and C# Lucene processes

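Instead of using Lucene directly, you could use Solr to index the Nutch crawl (see here), and then connect to Solr easily with one of the two available libraries (SolrSharp or SolrNet).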

I got here by searching for a comparison between SolrNet and SolrSharp, and thought I'd leave my impressions.

It seems SolrSharp is a dead project (it hasn't been updated for a long time), so the only real option is SolrNet.

I hope this helps someone. I would have left a comment on the accepted answer, but I don't have enough reputation yet :)

I'm also working on this.

http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html

It seems you can submit your query to Nutch and get the results back as RSS.

edit:

Got this working today in a Windows Forms app as a proof of concept. It has two textboxes (searchurl for the server URL and query for the search terms), a search button, and one DataGridView.

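// Requires: using System; using System.Data; using System.Xml;

private void Form1_Load(object sender, EventArgs e)
{
    // Default to the OpenSearch endpoint of a local Nutch web app.
    searchurl.Text = "http://localhost:8080/opensearch?query=";
}

private void search_Click(object sender, EventArgs e)
{
    // Escape the query so spaces and special characters survive in the URL.
    string uri = searchurl.Text + Uri.EscapeDataString(query.Text);
    Console.WriteLine(uri);

    // The results come back as RSS, so load the response as XML.
    XmlDocument myXMLDocument = new XmlDocument();
    myXMLDocument.Load(uri);

    // Let the DataSet infer its tables from the XML; the RSS <item> elements end up in the "item" table.
    DataSet ds = new DataSet();
    ds.ReadXml(new XmlNodeReader(myXMLDocument));

    // Bind the grid to the "item" table, one row per search result.
    SearchResultsGridView1.DataSource = ds;
    SearchResultsGridView1.DataMember = "item";
}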
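If a DataSet is more than you need, the same RSS response could probably be read into plain objects with LINQ to XML. This is only a minimal sketch, assuming the response is ordinary RSS 2.0 with title, link and description under each item. The SearchHit and OpenSearchParser names and the sample XML are made up for illustration and are not part of Nutch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container for one search result (not a Nutch type).
public sealed class SearchHit
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string Description { get; set; }
}

public static class OpenSearchParser
{
    // Builds the request URL, escaping the user's query text.
    public static string BuildUrl(string baseUrl, string query)
    {
        return baseUrl + Uri.EscapeDataString(query);
    }

    // Reads every <item> element of an RSS 2.0 document into a SearchHit.
    // Assumes the standard RSS child elements; missing ones become "".
    public static List<SearchHit> Parse(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
            .Select(item => new SearchHit
            {
                Title = (string)item.Element("title") ?? "",
                Link = (string)item.Element("link") ?? "",
                Description = (string)item.Element("description") ?? ""
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample response; a real one would be downloaded from
        // the URL returned by OpenSearchParser.BuildUrl.
        string sample =
            "<rss version=\"2.0\"><channel><title>sample</title>" +
            "<item><title>Page A</title><link>http://example.com/a</link>" +
            "<description>first hit</description></item>" +
            "<item><title>Page B</title><link>http://example.com/b</link>" +
            "<description>second hit</description></item>" +
            "</channel></rss>";

        foreach (SearchHit hit in OpenSearchParser.Parse(sample))
        {
            Console.WriteLine(hit.Title + " -> " + hit.Link);
        }
    }
}
```

The resulting list can be bound to a grid directly (SearchResultsGridView1.DataSource = hits), which avoids depending on how the DataSet infers table names from the XML.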

Instead of using Solr, I wrote a Java-based indexer that runs as a cron job, and a Java-based web service for querying. I didn't index pages so much as the different types of data that the .NET site uses to build its pages. So there are actually four different indexes (say: users, posts, messages, photos), each with a different document structure, that can all be queried in about the same way.

By defining an XSD for the web service responses, I was able to generate classes in both .NET and Java to hold a representation of the documents. The web service basically runs the query on the right index and fills out the response XML from the hits. The .NET client parses that back into objects. There's also a JSON interface for any client-side JavaScript.
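On the .NET side, parsing such a response back into objects might look roughly like this. It is only a sketch: the element and attribute names (searchResponse, index, hit, id, title) and the classes are hypothetical stand-ins for whatever the real XSD defines, and in practice the classes would be generated from the XSD (for example with xsd.exe) rather than written by hand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical response shape; the real one would follow the service's XSD.
[XmlRoot("searchResponse")]
public class SearchResponse
{
    // Name of the index that was queried, e.g. "users" or "posts".
    [XmlAttribute("index")]
    public string Index { get; set; }

    // Each repeated <hit> element becomes one entry in this list.
    [XmlElement("hit")]
    public List<Hit> Hits { get; set; } = new List<Hit>();
}

public class Hit
{
    [XmlElement("id")]
    public string Id { get; set; }

    [XmlElement("title")]
    public string Title { get; set; }
}

public static class ResponseReader
{
    // Deserializes the web service's XML response into typed objects.
    public static SearchResponse Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(SearchResponse));
        using (var reader = new StringReader(xml))
        {
            return (SearchResponse)serializer.Deserialize(reader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample response for illustration only.
        string xml =
            "<searchResponse index=\"users\">" +
            "<hit><id>42</id><title>alice</title></hit>" +
            "<hit><id>43</id><title>bob</title></hit>" +
            "</searchResponse>";

        SearchResponse response = ResponseReader.Read(xml);
        foreach (Hit hit in response.Hits)
        {
            Console.WriteLine(response.Index + ": " + hit.Id + " " + hit.Title);
        }
    }
}
```

Because every index shares the same envelope in this sketch, one reader can serve all four indexes, with the index attribute telling the client which kind of document it received.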

Why not switch from Java Lucene to the .NET version? Sure, it's an investment, but it's mostly a class substitution exercise. The last thing you need is more layers that add no value other than being glue. Less glue and more stuff is what you should aim for...
