How Do You Normalize Text to be Searched in a MySQL Database?

I'm not completely sure I'm using the right words to describe this, because I'm having trouble finding information about it online. I believe what I'm trying to do is called word normalization. I'm setting up a MySQL database with some text data that I want to run full-text searches on, and I want to normalize the words in that text. As I understand it, this means cutting off the ends of the stored words (and the searched words) so that related words show up in searches (i.e., jump, jumping, jumps, and jumped would all show up when any one of them is searched). What infrastructure is available to do this in a MySQL database?

If you're looking to stay 100% in MySQL, you can use its full-text search functionality: https://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html

You don't need to preprocess the text field, as MySQL's built-in functionality handles stop words, word fragments (through the * wildcard in Boolean mode), and relevance weights for matches.
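To see what that built-in handling looks like on a given server, the relevant settings can be inspected directly. This is a minimal sketch, assuming an InnoDB table on MySQL 5.6 or later. Querying the INFORMATION_SCHEMA table may require the PROCESS privilege.

```sql
-- Default stop words that InnoDB full-text indexes ignore.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

-- Words shorter than this are not indexed by InnoDB.
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- The equivalent minimum word length for MyISAM full-text indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';
```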

Creating a full text index:

CREATE FULLTEXT INDEX fulltextindex ON yourtable(searchfield);

Running a full text search:

SELECT primary_key, searchfield FROM yourtable WHERE MATCH(searchfield) AGAINST ('+someword*' IN BOOLEAN MODE);
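As a rough end-to-end sketch of the two steps above: the table name articles, its columns, and the sample rows are made up for illustration and are not from the question. The trailing * in Boolean mode matches any indexed word that starts with the given prefix, which is what lets one query cover jump, jumps, jumping, and jumped.

```sql
-- Hypothetical table; InnoDB supports FULLTEXT indexes from MySQL 5.6 onward.
CREATE TABLE articles (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL,
    FULLTEXT INDEX ft_body (body)
) ENGINE=InnoDB;

-- Illustrative sample rows.
INSERT INTO articles (body) VALUES
    ('The fox jumps over the fence'),
    ('She was jumping on the trampoline'),
    ('He jumped at the chance'),
    ('Nothing relevant in this row');

-- '+' makes the term required; '*' treats it as a prefix.
SELECT id, body
FROM articles
WHERE MATCH(body) AGAINST ('+jump*' IN BOOLEAN MODE);
```

Keep in mind that this is prefix matching rather than true stemming: it would not relate irregular forms such as run and ran, and a short prefix can also pull in unrelated words that happen to start the same way.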

For simple cases, I find that Boolean mode with its basic pattern matching works well.

There are several different matching modes to play with and ways to construct patterns to match, but a comprehensive review is outside the scope of an SO answer. You'll need to play around with this functionality a bit to get comfortable and find something that meets your exact needs.
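As a starting point for that experimentation, here is a sketch of the main modes side by side. It assumes a hypothetical table articles with a FULLTEXT index on a body column. Those names are placeholders, not part of the question.

```sql
-- Assumes a hypothetical table `articles` with FULLTEXT(body).

-- Natural language mode (the default): no operators, results can be
-- ranked by the relevance score that MATCH ... AGAINST returns.
SELECT id, body,
       MATCH(body) AGAINST ('jumping fence' IN NATURAL LANGUAGE MODE) AS score
FROM articles
WHERE MATCH(body) AGAINST ('jumping fence' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;

-- Query expansion: searches twice, the second time adding words taken
-- from the most relevant rows of the first pass.
SELECT id, body
FROM articles
WHERE MATCH(body) AGAINST ('jumping' WITH QUERY EXPANSION);

-- Boolean mode with operators: require one prefix, exclude one word.
SELECT id, body
FROM articles
WHERE MATCH(body) AGAINST ('+jump* -trampoline' IN BOOLEAN MODE);
```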

All that being said: MySQL is not the "BEST" suited for this, but depending on your needs, it can often provide adequate functionality. For example, I probably wouldn't add another layer like Solr to my stack to create a simple type-ahead based on text search unless the type-ahead needed some crazy next-level intelligence. I'd just use MySQL full-text search.

I'm not really sure what you are asking, but to make jump, jumping, jumps, and jumped all show up, you can do something like:

SELECT * FROM tableName WHERE columnName LIKE 'jump%'

Apologies if this is not what you mean.
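One caveat, sketched with the same placeholder names: the pattern 'jump%' only matches values that begin with jump. If the column holds whole sentences, the word can appear anywhere, so a leading wildcard is needed, and that form generally cannot use an ordinary B-tree index.

```sql
-- Matches only when the column value itself starts with "jump".
SELECT * FROM tableName WHERE columnName LIKE 'jump%';

-- Matches "jump" anywhere in the text, but typically forces a full scan
-- and also matches unrelated strings that merely contain those letters.
SELECT * FROM tableName WHERE columnName LIKE '%jump%';
```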
