Extracting words from text field in SQL
I'm currently building a little CMS for a smaller site. Now I want to extract all words from the text_content field and store them in my word table for later analysis.
page( id int,
title varchar(45),
# ... a bunch of meta fields ...
html_content text,
text_content text);
word( page_id int, # Foreign key
word varchar(100)); # I presume there are no words longer than 100 chars
Currently I'm using the following code, which (understandably) runs very slowly for larger chunks of text.
// Sidenote: $_POST is sanitized above scope of this code.
$_POST['text_content'] = str_replace("\t", "",
htmlspecialchars_decode(strip_tags($_POST['html_content'])));
// text is in swedish, so we add support for swedish vowels
$words = str_word_count($_POST['text_content'], 1, "åäöÅÄÖ");
// Delete all previous records of words
$this->db->delete("word", array('page_id' => $_POST['id']));
// Add current ones
foreach($words as $word)
{
if (trim($word) == "")
continue;
$this->db->query("INSERT INTO word(page_id, word) VALUES(?, ?)",
array($_POST['id'], strtolower(trim($word))));
}
Now, I'm not happy with this solution. I was thinking of creating a trigger in the database which would do pretty much the same thing as the PHP version. Is it possible to create a trigger in MySQL which would perform said actions, and if so, how? Or is there a better way? Am I taking a crazy approach to this?
You could make this PHP code significantly faster by building up a single INSERT query and executing it, rather than running a separate query for every word. Otherwise, I don't think your code looks that bad.
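As a rough sketch of the single-query idea, the helper below collects every word into one multi-row INSERT with bound placeholders. The function name build_word_insert is hypothetical and not part of the original post. The sketch also assumes the question's $this->db->query() wrapper accepts a flat array of bindings, as it appears to in the question's code.

```php
<?php
// Hypothetical helper (not from the original post): builds ONE multi-row
// INSERT for the word table instead of issuing one query per word.
// Returns [sql, params], or null when there is nothing to insert.
function build_word_insert(int $pageId, array $words): ?array
{
    $rows = [];
    $params = [];
    foreach ($words as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip empty tokens, as the original loop does
        }
        $rows[] = '(?, ?)';               // one placeholder pair per word
        $params[] = $pageId;
        $params[] = strtolower($word);    // same normalization as the question
    }
    if ($rows === []) {
        return null;
    }
    $sql = 'INSERT INTO word(page_id, word) VALUES ' . implode(', ', $rows);
    return [$sql, $params];
}

// Example: this would replace the foreach loop from the question.
$built = build_word_insert(7, ['Hej', ' Hej ', '', 'ord']);
if ($built !== null) {
    [$sql, $params] = $built;
    // Assumption: the question's DB wrapper takes bindings as a flat array.
    // $this->db->query($sql, $params);
}
```

For very long texts it may be worth splitting $words with array_chunk() and sending a few hundred rows per statement, since a single huge statement can run into server limits such as max_allowed_packet.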
Triggers that perform large calculations will slow down your application.
I think you are better off scheduling a task to run periodically and perform the extraction for you.
Have you tried PHP's "htmlentities" function to strip those tags?