
Strange error adding to Whoosh index

Can anyone help me with this strange error I'm getting when adding a new document to a Whoosh index?

Here's the code:

from whoosh import index
from whoosh.writing import AsyncWriter

# Excerpt: methods of MessageSearcher
def add_to_index(self, doc):
    ix = index.open_dir(self.index_dir)
    writer = AsyncWriter(ix) # use async writer to prevent write lock errors
    writer.add_document(**self.get_doc_args(doc))
    writer.commit()

def get_doc_args(self, doc):
    # Map the message fields onto the index schema's field names
    return {
        'id':        u""+str(doc['id']),
        'org':       doc['org__id'],
        'created':   doc['created_date'],
        'date':      doc['received_date'],
        'from_addr': doc['from_addr'],
        'subject':   doc['subject'],
        'body':      doc['messagebody__cleaned_message']
    }

I get the following error:

TypeError('ord() expected a character, but string of length 0 found',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 36, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/usr/local/lib/python2.6/dist-packages/celery/app/task/__init__.py", line 232, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/celery/app/__init__.py", line 172, in run
    return fun(*args, **kwargs)
  File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 131, in index_message
    MessageSearcher().add_to_index(message)
  File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 29, in add_to_index
    writer.commit()
  File "/usr/local/lib/python2.6/dist-packages/whoosh/writing.py", line 423, in commit
    self.writer.commit(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 501, in commit
    new_segments = mergetype(self, self.segments)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 78, in MERGE_SMALL
    reader = SegmentReader(writer.storage, writer.schema, seg)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filereading.py", line 63, in __init__
    self.termsindex = TermIndexReader(tf)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 590, in __init__
    super(TermIndexReader, self).__init__(dbfile)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 502, in __init__
    OrderedHashReader.__init__(self, dbfile)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 379, in __init__
    HashReader.__init__(self, dbfile)
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 187, in __init__
    self.hashtype = dbfile.read_byte()
  File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/structfile.py", line 219, in read_byte
    return ord(self.file.read(1))

Strangely, the exact same code using a standard writer (i.e. not AsyncWriter) works just fine. What am I missing here? Note that in production I have to use AsyncWriter in order to avoid LockErrors.

This error is caused by some kind of index corruption. In my case the machine crashed for an unrelated reason while the index was being rebuilt.
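To see why a damaged index file would surface as this particular TypeError, here is a minimal sketch of the last frame in the traceback. It assumes the segment file being read was empty or truncated, which is my reading of the traceback rather than something confirmed above. The helper names (read_byte, try_read_byte) are made up for illustration and are not part of Whoosh; the sketch uses in-memory buffers instead of real index files.

```python
import io

def read_byte(f):
    # Same expression as the failing line in structfile.py: ord(self.file.read(1)).
    # read(1) returns an empty string at end of file, and ord() rejects it.
    return ord(f.read(1))

def try_read_byte(f):
    """Hypothetical safer variant: return None instead of raising at end of file."""
    data = f.read(1)
    if not data:
        return None
    return ord(data)

healthy = io.BytesIO(b"A")     # stands in for an intact file
truncated = io.BytesIO(b"")    # stands in for a zero-length (damaged) file

first = read_byte(healthy)     # works: one byte is available

try:
    read_byte(truncated)
    error = None
except TypeError as exc:
    error = exc                # the same kind of error as in the traceback
```

So the error message says nothing about Whoosh itself; it only means that a read hit the end of a file where at least one more byte was expected.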

You can easily solve it by completely deleting the contents of the whoosh_index folder and rebuilding the index.
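The cleanup step could be sketched roughly like this. It is only an illustration: clear_index_dir and reset_index are hypothetical helper names, the file names in the demo are invented, and the rebuild step is passed in as a callable because it depends on your schema and data. In a real project that callable would typically call whoosh.index.create_in(index_dir, schema) and re-add every document.

```python
import os
import shutil
import tempfile

def clear_index_dir(index_dir):
    """Delete everything inside index_dir but keep the directory itself."""
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

def reset_index(index_dir, rebuild):
    """Wipe the index directory, then hand it to a caller-supplied rebuild step.

    `rebuild` is an assumption of this sketch: a function that takes the
    directory path, recreates the index there, and re-indexes all documents.
    """
    clear_index_dir(index_dir)
    return rebuild(index_dir)

# Demo on a throwaway directory with placeholder files (not real Whoosh files).
demo_dir = tempfile.mkdtemp()
for name in ("placeholder.toc", "placeholder.seg"):
    with open(os.path.join(demo_dir, name), "wb") as fh:
        fh.write(b"")

# The stand-in rebuild step just reports what is left in the directory.
rebuilt = reset_index(demo_dir, lambda d: os.listdir(d))
```

Make sure no worker is writing to the index while the directory is being cleared, otherwise the rebuild can end up in the same broken state.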

Ended up finding a solution; it's called Solr :-)
