
Find a person that is referenced in a document

Let's say I have:

  • a database with 13,000 person entries, including first name, name, birthday, street, zip code and city

  • a document that includes the personal data of one specific person. Since it was processed by OCR, it may contain spelling errors.

Here is the text:

  Harry Potter, born 25.03.1995, resident at Jahnstreet 43, London is a series of seven fantasy novels written by British author J. K. Rowling. The series chronicles the adventures of a young wizard, Harry Potter, the titular character, and his friends Ronald Weasley and Hermione Granger, all of whom are students at Hogwarts School of Witchcraft and Wizardry. The main story arc concerns Harry's quest to defeat the Dark wizard Lord Voldemort, who aims to become immortal, conquer the wizarding world, subjugate non-magical people, and destroy all those who stand in his way, especially Harry Potter. Since the release of the first novel, Harry Potter and the Philosopher's Stone, on 30 June 1997, the books have gained immense popularity, critical acclaim and commercial success worldwide.[2] The series has also had some share of criticism, including concern about the increasingly dark tone as the series progressed. As of May 2015, the books have sold more than 450 million copies worldwide, making the series the best-selling book series in history, and have been translated into 73 languages.[3][4] The last four books consecutively set records as the fastest-selling books in history, with the final installment selling roughly 11 million copies in the United States within the first 24 hours of its release. A series of many genres, including fantasy, coming of age and the British school story (with elements of mystery, thriller, adventure and romance), it has many cultural meanings and references.[5] According to Rowling, the main theme is death.[6] There are also many other themes in the series, such as prejudice and corruption.[7]


Now I want to find the "person" in the database that is referenced in the document.


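I have several ideas for how to do this, but I don't know which one gives the best results. Which approach would you choose? Any recommendations? Thanks.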

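  1. I split the text into an array, then loop over every birthday in the database and search for it with JavaScript's text.search('25.03.1995'). On a hit, I check the next field, e.g. text.search('Harry'). If several fields match, I have found the right record.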

    • Pro: Easy to implement, no database commands needed, pure JavaScript.
    • Con: If the OCR made an error and read, e.g., "Harly" instead of "Harry", I can't identify the person. The same happens if the date is in a different format.
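  2. First, I index the text with the help of the database. Then I take an approach similar to the first one and loop over every column in the database, but this time using the database's CONTAINS.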

    • Pro: Faster, better results?
    • Con: I need a good full-text search database.
  3. I split the text and use SQL LIKE to search for each word in the database columns.

    • Pro: I don't have to index the file. Is LIKE better than CONTAINS?
    • Con: Not as fast as a text index?
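The first approach can be sketched in plain JavaScript roughly as follows. The record field names (firstName, name, birthday, street, zip, city) and the second sample record are hypothetical, based on the columns listed above. One detail worth knowing: String.prototype.search turns a string argument into a regular expression, so the dots in '25.03.1995' match any character; includes does a literal substring check, which is what this sketch uses. To soften the date-format problem, it also tries a few alternative spellings of each birthday (the chosen formats are an assumption).

```javascript
// Hypothetical record shape: { firstName, name, birthday: "DD.MM.YYYY", street, zip, city }

// Build a few common spellings of a DD.MM.YYYY birthday so that a
// differently formatted date in the OCR text can still be found (assumed formats).
function birthdayVariants(birthday) {
  const [dd, mm, yyyy] = birthday.split(".");
  return [
    `${dd}.${mm}.${yyyy}`,
    `${dd}/${mm}/${yyyy}`,
    `${yyyy}-${mm}-${dd}`,
    `${dd}-${mm}-${yyyy}`,
  ];
}

// Return the records whose birthday appears in the text and at least
// `minFields` of the other non-empty fields appear as well
// (case-insensitive, literal substring match).
function findPerson(text, records, minFields = 2) {
  const haystack = text.toLowerCase();
  const hits = [];
  for (const record of records) {
    const dateFound = birthdayVariants(record.birthday).some((d) => haystack.includes(d));
    if (!dateFound) continue; // cheap filter first: most records fail here
    const otherFields = [record.firstName, record.name, record.street, record.zip, record.city];
    const matched = otherFields.filter((f) => f && haystack.includes(String(f).toLowerCase())).length;
    if (matched >= minFields) hits.push({ record, matched });
  }
  return hits;
}

// Hypothetical sample data: the first record comes from the question's text.
const records = [
  { firstName: "Harry", name: "Potter", birthday: "25.03.1995", street: "Jahnstreet 43", zip: "", city: "London" },
  { firstName: "Jane", name: "Doe", birthday: "01.01.1990", street: "Main Street 1", zip: "12345", city: "Berlin" },
];
const text = "Harry Potter, born 25.03.1995, resident at Jahnstreet 43, London is a series ...";
console.log(findPerson(text, records).map((h) => h.record.name)); // [ 'Potter' ]
```

Checking the birthday first keeps the loop cheap, since most of the 13,000 records are rejected by a handful of substring tests. It still fails on misspellings such as "Harly", which is exactly the weakness listed above.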

Thanks for your help.

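I think that because of OCR errors you will sometimes have to rank several possible matches, and 13,000 entries don't take much memory. So it may be easier to just use the first approach and do it all in JS. Either way, you will have to parse the CSV.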
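The suggestion to keep everything in memory, score every record and sort the candidates could look roughly like this. It assumes a hypothetical semicolon-separated CSV with a header row and no quoted fields; a real export with quotes or embedded separators needs a proper CSV parser. The score is simply the number of non-empty fields found literally in the text, which is one possible ranking among many.

```javascript
// Naive CSV parser: assumes a header row, ';' as separator and no quoted fields.
function parseCsv(csv, separator = ";") {
  const [headerLine, ...lines] = csv.trim().split(/\r?\n/);
  const headers = headerLine.split(separator).map((h) => h.trim());
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split(separator);
      const record = {};
      headers.forEach((h, i) => {
        record[h] = (cells[i] ?? "").trim();
      });
      return record;
    });
}

// Score = number of non-empty fields that occur in the text (case-insensitive).
function score(record, text) {
  const haystack = text.toLowerCase();
  return Object.values(record).filter((v) => v && haystack.includes(v.toLowerCase())).length;
}

// Rank all records and keep the `limit` best candidates with a score > 0.
function rankCandidates(records, text, limit = 5) {
  return records
    .map((record) => ({ record, score: score(record, text) }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Hypothetical CSV content with the columns from the question.
const csv = `firstName;name;birthday;street;zip;city
Harry;Potter;25.03.1995;Jahnstreet 43;;London
Jane;Doe;01.01.1990;Main Street 1;12345;London`;

const ranked = rankCandidates(parseCsv(csv), "Harry Potter, born 25.03.1995, resident at Jahnstreet 43, London ...");
ranked.forEach((c) => console.log(c.record.firstName, c.record.name, c.score));
// Harry Potter 5
// Jane Doe 1
```

Returning a sorted list rather than a single record leaves room for a human (or a stricter rule) to decide between close candidates when the OCR is noisy.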

It depends on how bad the OCR is, I think. If it's bad, a full-text index might help.
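To illustrate what an index buys you, here is a toy in-memory inverted index: every token of every record points to the records that contain it, so the OCR text needs one lookup per word instead of a scan over all 13,000 rows. This is a simplified stand-in with hypothetical helper names and a naive tokenizer, not how a database's full-text engine works internally, and on its own it does not tolerate misspellings.

```javascript
// Split a string into lowercase tokens of letters/digits/dots, then strip
// leading/trailing dots so "London." becomes "london" but "25.03.1995" stays whole.
function tokenize(s) {
  return (s.toLowerCase().match(/[\p{L}\p{N}.]+/gu) ?? [])
    .map((t) => t.replace(/^\.+|\.+$/g, ""))
    .filter((t) => t.length > 0);
}

// Build a map: token -> Set of record indices containing that token.
function buildIndex(records) {
  const index = new Map();
  records.forEach((record, id) => {
    for (const value of Object.values(record)) {
      for (const token of tokenize(String(value))) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(id);
      }
    }
  });
  return index;
}

// Count, per record, how many distinct text tokens hit it; sort by hit count.
function search(index, text) {
  const hits = new Map();
  for (const token of new Set(tokenize(text))) {
    for (const id of index.get(token) ?? []) {
      hits.set(id, (hits.get(id) ?? 0) + 1);
    }
  }
  return [...hits.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample records.
const records = [
  { firstName: "Harry", name: "Potter", birthday: "25.03.1995", street: "Jahnstreet 43", city: "London" },
  { firstName: "Jane", name: "Doe", birthday: "01.01.1990", street: "Main Street 1", city: "London" },
];
const index = buildIndex(records);
console.log(search(index, "Harry Potter, born 25.03.1995, resident at Jahnstreet 43, London"));
// [ [ 0, 6 ], [ 1, 1 ] ]
```

The result pairs are [record index, number of matching tokens]; the index is built once and reused for every document.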

You could also try something like a string distance from the natural module on npm.
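String distance addresses the "Harly" vs. "Harry" case. The natural package provides string-distance helpers such as LevenshteinDistance and JaroWinklerDistance; to keep this sketch dependency-free, it implements the classic Levenshtein edit distance directly. A single-word field counts as matched when some word of the text is within a small edit distance of it. The tolerance of one edit per five characters (minimum one) is an arbitrary assumption to tune on real OCR output, and multi-word fields such as streets would need to be split into words first.

```javascript
// Classic dynamic-programming Levenshtein distance
// (insertion, deletion and substitution each cost 1), using two rows of memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// True if any word in `words` is within the assumed tolerance of the
// (single-word) field value.
function fuzzyContains(words, value) {
  const target = value.toLowerCase();
  const maxEdits = Math.max(1, Math.floor(target.length / 5)); // assumed tolerance
  return words.some((w) => levenshtein(w, target) <= maxEdits);
}

// Hypothetical OCR output where "Harry" was misread as "Harly".
const ocrText = "Harly Potter, born 25.03.1995, resident at Jahnstreet 43, London";
const words = ocrText.toLowerCase().split(/[\s,]+/).filter(Boolean);
console.log(levenshtein("harly", "harry")); // 1
console.log(fuzzyContains(words, "Harry")); // true
console.log(fuzzyContains(words, "Hermione")); // false
```

Comparing every word against every field of 13,000 records is quadratic-ish work, so in practice it makes sense to run the exact birthday filter first and apply the fuzzy comparison only to the few surviving candidates.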
