I'm working on a Solr dataimport from an Oracle database. The database has a set of tables dedicated to storing references to changes in other tables. For example, I might have a table named PERSON, and when records are added to this table, their IDs are added to the CHANGED_PERSON table. I'd like to use this CHANGED_PERSON table when defining my deltaQuery so that Solr only indexes the changed records in subsequent imports. As part of this process, I need to remove the records I've read from the CHANGED_PERSON table after Solr finishes its import (either delta or full), so that I don't process them again later.
What's the best way to run this kind of "cleanup" SQL query after a dataimport?
I've tried combining both of the queries like this (simplified for brevity):
<dataConfig>
<dataSource ... />
<document>
<entity name="person"
query="
SELECT ID, FIRST_NAME, LAST_NAME
FROM PERSON
WHERE '${dataimporter.request.clean}' != 'false'
OR PERSON_ID IN (
SELECT ID FROM CHANGED_PERSON
);
DELETE * (
SELECT * FROM CHANGED_PERSON
);
" />
</document>
</dataConfig>
But this results in a "SQL command not properly ended" error. Does Solr provide a way to do this kind of cleanup?
Once you're using delta import in Solr, Solr won't process your records twice, since it keeps track of when each import ran.
Reference documentation: "When delta-import command is executed, it reads the start time stored in conf/dataimport.properties."
Link: https://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example
From your question, I suspect you're running a full import every time you want to run a delta import (a full import also cleans up the Solr index, among other things). That is not the proper way to do delta imports.
What I would recommend is:
1) Perform delta imports (not full imports).
2) Once every X days or X months, if you need to, perform a clean import. It's better to do this in another core, so that your service keeps running and you only have to swap the cores afterwards.
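For reference, a delta-import setup driven by the change table might look like the sketch below. It assumes CHANGED_PERSON.ID holds the PERSON.ID of each changed row (the question's WHERE clause compares against PERSON_ID, so adjust the column names to your schema). During a delta-import, DIH runs deltaQuery to collect the changed keys, then runs deltaImportQuery once per key, exposing each key as ${dataimporter.delta.ID}; a full-import only uses query.

```xml
<dataConfig>
  <!-- Connection attributes elided, as in the question -->
  <dataSource ... />
  <document>
    <!-- pk names the column DIH uses to identify rows during delta-import -->
    <entity name="person"
            pk="ID"
            query="SELECT ID, FIRST_NAME, LAST_NAME FROM PERSON"
            deltaQuery="SELECT ID FROM CHANGED_PERSON"
            deltaImportQuery="
              SELECT ID, FIRST_NAME, LAST_NAME
              FROM PERSON
              WHERE ID = '${dataimporter.delta.ID}'" />
  </document>
</dataConfig>
```

Oracle returns unquoted column names in upper case, which is why the placeholder is written as delta.ID. Because this deltaQuery does not filter on ${dataimporter.last_index_time}, CHANGED_PERSON still has to be emptied after each run, otherwise the same keys are picked up again.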
I found a way to accomplish this cleanup task, but I'm not super happy with it. I can define a separate entity whose query runs a DELETE:
<dataConfig>
<dataSource ... />
<document>
<entity name="person"
query="
SELECT ID, FIRST_NAME, LAST_NAME
FROM PERSON
WHERE '${dataimporter.request.clean}' != 'false'
OR PERSON_ID IN (
SELECT ID FROM CHANGED_PERSON
)" />
<entity name="deleteChangedPersonRecords"
query="DELETE FROM CHANGED_PERSON" />
</document>
</dataConfig>
This seems to work, but it's a bit of a hack, and it relies on the assumption that Solr executes its entity queries in the same order that they are specified in the file. If anyone has a better solution, please feel free to add your answer to this question.
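DIH also has a hook meant for post-import work that avoids depending on entity order: the document element accepts an onImportEnd attribute naming a class that implements org.apache.solr.handler.dataimport.EventListener, whose onEvent(Context) method DIH calls when the import finishes. A sketch, where com.example.CleanupChangedPersonListener is a hypothetical class you would write yourself (its onEvent would run DELETE FROM CHANGED_PERSON over JDBC) and put on the core's classpath:

```xml
<dataConfig>
  <!-- Connection attributes elided, as in the question -->
  <dataSource ... />
  <!-- Hypothetical listener class; DIH instantiates it and calls
       onEvent(Context) once the import has finished -->
  <document onImportEnd="com.example.CleanupChangedPersonListener">
    <entity name="person"
            query="
              SELECT ID, FIRST_NAME, LAST_NAME
              FROM PERSON
              WHERE '${dataimporter.request.clean}' != 'false'
                 OR PERSON_ID IN (
                   SELECT ID FROM CHANGED_PERSON
                 )" />
  </document>
</dataConfig>
```

With either approach, a blanket DELETE FROM CHANGED_PERSON also removes IDs inserted while the import was running, so those changes would never be indexed. Deleting only the IDs that were actually read avoids that gap.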