Solr DIH is not extracting data from XML

I'm trying to index some wiki pages using Solr 7.0, but in the last step the DataImportHandler (DIH) apparently isn't extracting the data. I don't know what is happening, because no error is thrown.

When I call http://localhost:8983/solr/mycore/dataimport?command=full-import two different behaviors are noticeable.

The response to my first request is:

{
    "responseHeader":{
        "status":0,
        "QTime":75
    },
    "initArgs":[
        "defaults",[
            "config","data-config.xml"
         ]
     ],
    "command":"full-import",
    "status":"idle",
    "importResponse":"",
    "statusMessages":{}
 }

The response when I just press Enter again is:

{
    "responseHeader":{
        "status":0,
        "QTime":26
    },
    "initArgs":[
        "defaults",[
            "config","data-config.xml"
        ]
    ],
    "command":"full-import",
    "status":"idle",
    "importResponse":"",
    "statusMessages":{
        "Total Requests made to DataSource":"0",
        "Total Rows Fetched":"2",
        "Total Documents Processed":"0",
        "Total Documents Skipped":"0",
        "Full Dump Started":"2017-10-28 07:05:31",
        "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
        "Committed":"2017-10-28 07:05:31",
        "Time taken":"0:0:0.449"
    }
}

As you can see in the second response, the DIH fetched 2 rows, which is exactly the number of documents in my test file wiki.xml. The problem is that the DIH isn't extracting them, as the status message shows: "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents."

My Solr configuration is in a git gist. I'm using Windows 10, Solr 7.0 and Lucene 7.0.

What I've tried so far...

  • One of the fields I'm trying to extract is the "user", but it is irregular: when the user has an account, the <contributor> XML tag has two subtags, <username> (the user's nickname) and <id> (the user ID); when the user doesn't have an account, <contributor> has only one subtag, <ip>. So I tried importing the data without the "user" field.
  • I tried importing only the id and title. To do that, I commented out the other fields in data-config.xml.

Neither of those tests worked.

Your problem is very simple: your entity tag is self-closing (<entity ... />), so all the field tags that follow it are ignored.

You need to change the self-closing <entity ... /> into an opening <entity ...> and add a closing </entity> tag after the fields.
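
For illustration, a corrected data-config.xml could look roughly like the sketch below. The gist itself is not reproduced here, so the attribute values (the data source type, the forEach and xpath expressions, and the url pointing at wiki.xml) are assumptions based on a typical MediaWiki dump import and should be adapted to the actual file. The part that matters is that the field elements sit between the opening and closing entity tags.

```xml
<dataConfig>
  <!-- Assumed: the XML file is read from the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Opening tag ends with ">" and not "/>", so the fields below belong to it.
         The url, forEach and xpath values are assumptions; adjust them to your setup. -->
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="wiki.xml">
      <field column="id"    xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
    </entity>
  </document>
</dataConfig>
```

With the self-closing form, the entity has no field mappings of its own, which would be consistent with rows being fetched while 0 documents are added.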

However, your solrconfig.xml still contains a mistake: it uses ClassicIndexSchemaFactory together with AddSchemaFieldsUpdateProcessorFactory, which will cause an exception. Either replace the classic schema factory with the managed one, or remove the add-fields update processor factory.
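
As a rough sketch of the first option, the schema factory declaration in solrconfig.xml could be switched to the managed one as shown below. The resource name managed-schema is Solr's usual default and is an assumption about this setup; the rest of the file is not shown because the original configuration is not reproduced here.

```xml
<!-- Option 1: use the managed schema, which update processors are allowed to modify -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <!-- Assumed default file name for the managed schema -->
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Option 2 (alternative): keep
       <schemaFactory class="ClassicIndexSchemaFactory"/>
     and delete the AddSchemaFieldsUpdateProcessorFactory processor
     from the update request processor chain instead. -->
```

The second option fits better if you want to keep a hand-edited schema.xml and declare the id and title fields there yourself.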
