[英]Fastest way to record all DocIds and FileNames from dtSearch in SQL database
我正在使用dtSearch與SQL數據庫組合,並希望維護一個包含所有DocIds及其相關FileNames的表。 從那里,我將添加一個帶有我的外鍵的列,以允許我組合文本和數據庫搜索。
我有代碼簡單地返回索引中的所有記錄並將它們逐個添加到數據庫中。 然而,這需要永遠,並沒有解決在添加到索引時如何簡單地追加新記錄的問題。 但為了防止它有所幫助:
MyDatabaseContext db = new StateScapeEntities();
IndexJob ij = new dtSearch.Engine.IndexJob();
ij.IndexPath = @"d:\myindex";
IndexInfo indexInfo = dtSearch.Engine.IndexJob.GetIndexInfo(@"d:\myindex");
bool jobDone = ij.Execute();
SearchResults sr = new SearchResults();
uint n = indexInfo.DocCount;
for (int i = 1; i <= n; i++)
{
sr.AddDoc(ij.IndexPath, i, null);
}
for (int i = 1; i <= n; i++)
{
sr.GetNthDoc(i - 1);
//IndexDocument is defined elsewhere
IndexDocument id = new IndexDocument();
id.DocId = sr.CurrentItem.DocId;
id.FilePath = sr.CurrentItem.Filename;
if (id.FilePath != null)
{
db.IndexDocuments.Add(id);
db.SaveChanges();
}
}
要將DocId保留在索引中,必須在IndexJob中使用標志dtsIndexKeepExistingDocIds
當DocID更改時,您還可以查看dtSearch文本檢索引擎程序員參考
將文檔添加到索引時,會為其分配DocId,並且DocIds始終按順序編號。
重新編制索引文檔時,將取消舊的DocId並分配新的DocId。
壓縮索引時,除非在IndexJob中設置了dtsIndexKeepExistingDocIds標志,否則索引中的所有DocId都將重新編號以刪除已取消的DocId。
當索引合並到另一個索引時,目標索引中的DocIds永遠不會更改。 合並到目標索引的文檔將全部分配新的,按順序編號的DocId,除非(a)在IndexJob中設置了dtsIndexKeepExistingDocIds標志,並且(b)索引具有非重疊的doc id范圍。
要提高速度,您可以搜索單詞“ xfirstword ”並獲取索引中的所有文檔。
您還可以查看常見問題解答如何檢索索引中的所有文檔
因此,我使用了user2172986的部分響應,但將其與一些額外的代碼結合起來以獲得我的問題的解決方案。 我確實必須在索引更新例程中設置dtsKeepExistingDocIds標志。 從那里,我只想將新創建的DocId添加到我的SQL數據庫中。 為此,我使用了以下代碼:
string indexPath = @"d:\myindex";
using (IndexJob ij = new dtSearch.Engine.IndexJob())
{
//make sure the updated index doesn't change DocIds
ij.IndexingFlags = IndexingFlags.dtsIndexKeepExistingDocIds;
ij.IndexPath = indexPath;
ij.ActionAdd = true;
ij.FoldersToIndex.Add( indexPath + "<+>");
ij.IncludeFilters.Add( "*");
bool jobDone = ij.Execute();
}
//create a DataTable to hold results
DataTable newIndexDoc = MakeTempIndexDocTable(); //this is a custom method not included in this example; just creates a DataTable with the appropriate columns
//connect to the DB;
MyDataBase db = new MyDataBase(); //again, custom code not included - link to EntityFramework entity
//get the last DocId in the DB?
int lastDbDocId = db.IndexDocuments.OrderByDescending(i => i.DocId).FirstOrDefault().DocId;
//get the last DocId in the Index
IndexInfo indexInfo = dtSearch.Engine.IndexJob.GetIndexInfo(indexPath);
uint latestIndexDocId = indexInfo.LastDocId;
//create a searchFilter
dtSearch.Engine.SearchFilter sf = new SearchFilter();
int indexId = sf.AddIndex(indexPath);
//only select new records (from one greater than the last DocId in the DB to the last DocId in the index itself
sf.SelectItems(indexId, lastDbDocId + 1, int.Parse(latestIndexDocId.ToString()), true);
using (SearchJob sj = new dtSearch.Engine.SearchJob())
{
sj.SetFilter(sf);
//return every document in the specified range (using xfirstword)
sj.Request = "xfirstword";
// Specify the path to the index to search here
sj.IndexesToSearch.Add(indexPath);
//additional flags and limits redacted for clarity
sj.Execute();
// Store the error message in the status
//redacted for clarity
SearchResults results = sj.Results;
int startIdx = 0;
int endIdx = results.Count;
if (startIdx==endIdx)
return;
for (int i = startIdx; i < endIdx; i++)
{
results.GetNthDoc(i);
IndexDocument id = new IndexDocument();
id.DocId = results.CurrentItem.DocId;
id.FileName= results.CurrentItem.Filename;
if (id.FileName!= null)
{
DataRow row = newIndexDoc.NewRow();
row["DocId"] = id.DocId;
row["FileName"] = id.FileName;
newIndexDoc.Rows.Add(row);
}
}
newIndexDoc.AcceptChanges();
//SqlBulkCopy
using (SqlConnection connection =
new SqlConnection(db.Database.Connection.ConnectionString))
{
connection.Open();
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.DestinationTableName =
"dbo.IndexDocument";
try
{
// Write from the source to the destination.
bulkCopy.WriteToServer(newIndexDoc);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
newIndexDoc.Clear();
db.UpdateIndexDocument();
}
以下是我在SearchResults界面中使用AddDoc方法的新解決方案:
首先從IndexInfo獲取StartingDocID和LastDocID並像這樣循環:
function GetFilename(paDocID: Integer): String;
var
lCOMSearchResults: ISearchResults;
lSearchResults_Count: Integer;
begin
if Assigned(prCOMServer) then
begin
lCOMSearchResults := prCOMServer.NewSearchResults as ISearchResults;
lCOMSearchResults.AddDoc(GetIndexPath(prIndexContent), paDocID, 0);
lSearchResults_Count := lCOMSearchResults.Count;
if lSearchResults_Count = 1 then
begin
lCOMSearchResults.GetNthDoc(0);
Result := lCOMSearchResults.DocDetailItem['_Filename'];
end;
end;
end
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.