Replacing sentences with equivalent words

I would like to replace the words in a sentence with equivalent words. For example:

Reference database:

put <-> set
returns <-> switch
between <-> among
paragraphs <-> null
between paragraphs <-> among paragraphs

Sentence:

put returns between paragraphs

Replaced sentence:

set switch among paragraphs

Finding and replacing single words is easy. The problem is that the records contain both "paragraphs" and "between paragraphs". How can I search for and replace the longer phrases before the shorter ones? Performance is very important, because the reference database might have over 1 million rows.

Currently using: Entity Framework, C#, MVC, SQL Server 2014

Any help would be fantastic. Thanks.

Update:

Sentence:

string str = "The number of cases in the Ebola outbreak passes 10,000, with 4,922 deaths, the World Health Organization's latest report says.";

The database has 1,000,000 records. I could fetch every row from the database and test each one against the text in a foreach loop, but that is a poor approach. I need a way to go from the sentence to the database: how can I select only the rows whose phrases occur in the sentence?

One idea is to split the text on white-space and look up each word in the database. But that misses the longer phrases that contain white-space. For "The number of cases", it would search for "the", "number", "of" and "cases" separately, and the returned records would not be useful.

You could organise your replacement database in a trie-like structure. All single-word expressions are on the first level. Multi-word expressions are stored as descendants of the preceding words in the expression. In your example:

root
    -> put: set
    -> returns: switch
    -> between: among
        -> paragraphs: among paragraphs
    -> paragraphs: sections

The root is a dictionary of words. Each node has a sub-dictionary, which will be null in most cases, and a replacement value. The replacement may be null for intermediate words. For example, in in -> this -> case, the node for this has no replacement, because in -> this isn't a valid replacement in itself.

Split your sentence and iterate through the words. If you find a possible start of a replacement, follow the trie and determine the longest possible replacement at this position. Replace that and continue the iteration from the word after the replaced phrase.

 between you and me -> among you and me
 between other paragraphs -> among other sections
 between paragraphs -> among paragraphs
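
The approach above could be sketched roughly as follows. This is a minimal illustration in Python, not code from the original answer; the same shape should carry over to C# with a Dictionary<string, Node>. The names Node, build_trie, replace_words, PAIRS and TRIE are hypothetical, the pairs are taken from the trie drawn above, and the sketch assumes plain white-space splitting and case-sensitive matching.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One trie node: child words plus an optional replacement."""
    children: dict = field(default_factory=dict)
    replacement: Optional[str] = None  # None for intermediate words


def build_trie(pairs):
    """Build the trie once from (phrase, replacement) pairs."""
    root = Node()
    for phrase, replacement in pairs:
        node = root
        for word in phrase.split():
            node = node.children.setdefault(word, Node())
        node.replacement = replacement
    return root


def replace_words(sentence, root):
    """Replace the longest matching phrase at each word position."""
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        node = root
        best_end, best = None, None
        j = i
        # Follow the trie as far as the words allow, remembering the
        # last node on the path that carried a replacement.
        while j < len(words) and words[j] in node.children:
            node = node.children[words[j]]
            j += 1
            if node.replacement is not None:
                best_end, best = j, node.replacement
        if best is None:
            out.append(words[i])  # no replacement starts here
            i += 1
        else:
            out.append(best)
            i = best_end  # continue after the replaced phrase
    return " ".join(out)


# Hypothetical in-memory copy of the reference database.
PAIRS = [
    ("put", "set"),
    ("returns", "switch"),
    ("between", "among"),
    ("between paragraphs", "among paragraphs"),
    ("paragraphs", "sections"),
]
TRIE = build_trie(PAIRS)
print(replace_words("put returns between paragraphs", TRIE))
```

Each word position costs at most as many dictionary lookups as the longest phrase has words, so the per-sentence work does not grow with the number of rows in the reference database once the trie is built.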

When you split the words, keep the spaces and punctuation between them, and follow the trie nodes only if there is no punctuation between the words, so that a sentence like

 There must be something in between; paragraphs 1 and 2 seem to indicate that.

is treated correctly.

This method should be efficient if you have many sentences that need to be replaced with replacements from the same database. The database trie must be built only once. If you have only a few sentences to replace or if your database changes frequently, this is not a good approach.
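
One possible way to handle the punctuation rule is sketched below in Python. It is an assumption-laden illustration rather than the answerer's code: the helper names build_trie, tokenize and replace_words are hypothetical, words are taken to be runs of \w characters, and a phrase is only allowed to continue across a separator that is pure white-space. The trie here is a nested dict in which the key None holds a node's replacement.

```python
import re


def build_trie(pairs):
    """Nested-dict trie; the key None holds a node's replacement."""
    root = {}
    for phrase, replacement in pairs:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[None] = replacement
    return root


def tokenize(text):
    """Split text into words and the separators between them.

    Returns (lead, words, gaps): lead is any text before the first
    word, and gaps[k] is the separator that follows words[k].
    """
    parts = re.split(r"(\w+)", text)
    return parts[0], parts[1::2], parts[2::2]


def replace_words(text, root):
    """Longest-match replacement that never crosses punctuation."""
    lead, words, gaps = tokenize(text)
    out = [lead]
    i = 0
    while i < len(words):
        node = root
        best_end, best = None, None
        j = i
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if None in node:
                best_end, best = j, node[None]
            # Stop extending the phrase unless the separator after
            # this word is white-space only.
            if not gaps[j - 1].isspace():
                break
        if best is None:
            out.append(words[i])
            out.append(gaps[i])
            i += 1
        else:
            out.append(best)
            out.append(gaps[best_end - 1])  # keep trailing separator
            i = best_end
    return "".join(out)


# Hypothetical in-memory copy of the reference database.
PAIRS = [
    ("between", "among"),
    ("between paragraphs", "among paragraphs"),
    ("paragraphs", "sections"),
]
TRIE = build_trie(PAIRS)
print(replace_words("It sits between; paragraphs 1 and 2 agree.", TRIE))
```

In this sketch the single-word entries still apply on either side of the semicolon, but "between; paragraphs" is not matched as the two-word phrase. The original separators, including the final full stop, are copied to the output unchanged.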

You can use the REPLACE function in SQL Server:

SELECT REPLACE('abcdefghicde','cde','xxx')
GO

This replaces every occurrence of cde with xxx.

In your application, the SQL might be:

UPDATE tablename SET col = REPLACE(col, ' put ', ' set ') -- note the space on each side of the word
