[英]Strange error adding to Whoosh index
Can anyone help me with this strange error I'm getting when adding a new document to a Whoosh index? 将新文档添加到Whoosh索引时,有人会遇到这个奇怪的错误吗?
Here's the code: 这是代码:
def add_to_index(self, doc):
ix = index.open_dir(self.index_dir)
writer = AsyncWriter(ix) # use async writer to prevent write lock errors
writer.add_document(**self.get_doc_args(doc))
writer.commit()
def get_doc_args(self, doc):
return {
'id': u""+str(doc['id']),
'org': doc['org__id'],
'created': doc['created_date'],
'date': doc['received_date'],
'from_addr': doc['from_addr'],
'subject': doc['subject'],
'body': doc['messagebody__cleaned_message']
}
I get the following error: 我收到以下错误:
TypeError('ord() expected a character, but string of length 0 found',)
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/usr/local/lib/python2.6/dist-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 131, in index_message
MessageSearcher().add_to_index(message)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 29, in add_to_index
writer.commit()
File "/usr/local/lib/python2.6/dist-packages/whoosh/writing.py", line 423, in commit
self.writer.commit(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 501, in commit
new_segments = mergetype(self, self.segments)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 78, in MERGE_SMALL
reader = SegmentReader(writer.storage, writer.schema, seg)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filereading.py", line 63, in __init__
self.termsindex = TermIndexReader(tf)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 590, in __init__
super(TermIndexReader, self).__init__(dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 502, in __init__
OrderedHashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 379, in __init__
HashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 187, in __init__
self.hashtype = dbfile.read_byte()
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/structfile.py", line 219, in read_byte
return ord(self.file.read(1))
Strangely, the exact same code using a standard writer (ie not AsyncWriter) works just fine. 奇怪的是,使用标准编写器(即不是AsyncWriter)使用完全相同的代码也可以。 What am I missing here?
我在这里想念什么? Note that in production I have to use AsyncWriter in order to avoid LockErrors.
请注意,在生产中我必须使用AsyncWriter以避免LockErrors。
This error is caused by some kind of index corruption. 此错误是由某种索引损坏引起的。 In my case the machine crashed by another reason while index was being rebuild.
在我的情况下,重建索引时机器由于另一个原因而崩溃。
You can easily solve it by deleting whoosh_index folder contents completely and rebuilidng index. 您可以通过完全删除whoosh_index文件夹内容并重新建立索引来轻松解决该问题。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.