What data type should I use in a MySQL database to store two text files of code that I intend to compare for similarity later?

It's a MySQL database running on my Windows machine.

Also, can you recommend an API that can compare code for me?

As per the MySQL documentation:

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

...

Values in CHAR and VARCHAR columns are sorted and compared according to the character set collation assigned to the column.

So, VARCHAR is stored inline with the table, whilst BLOB and TEXT types are stored off the table, with the database holding the location of the data. Depending on how long your text is, you can choose TINYTEXT, TEXT, MEDIUMTEXT, or LONGTEXT; the only difference between them is the maximum amount of data each can hold.

  • TINYTEXT 255 bytes
  • TEXT 65,535 bytes
  • MEDIUMTEXT 16,777,215 bytes
  • LONGTEXT 4,294,967,295 bytes
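
As a rough illustration of how these limits apply to a source file, here is a minimal Python sketch that picks the smallest TEXT type able to hold a given string. The helper name smallest_text_type and the default UTF-8 encoding are assumptions made for this example, not part of MySQL; the limits are in bytes, so multi-byte characters count more than once.

```python
# Maximum byte lengths of MySQL's TEXT family (2**8 - 1, 2**16 - 1, ...).
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(content: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT type whose byte limit fits the content.

    Hypothetical helper: the encoding should match the column's character set.
    """
    size = len(content.encode(encoding))  # limits are in bytes, not characters
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError("content exceeds the LONGTEXT limit")


if __name__ == "__main__":
    print(smallest_text_type("print('hello')\n"))
```

For typical code files, TEXT (up to 65,535 bytes) or MEDIUMTEXT is usually enough.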

To compare the two strings stored in TEXT (or any other string column), you might want to use STRCMP(expr1,expr2):

STRCMP() returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise.
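
Note that this return convention only tells you whether the strings are equal or how they sort; it says nothing about how similar they are. A small Python sketch of the same sign convention makes that visible. The function strcmp_like is a hypothetical stand-in, and it compares by Unicode code point rather than by a MySQL collation, so results for mixed-case or accented text can differ from what the server returns.

```python
def strcmp_like(a: str, b: str) -> int:
    """Return -1, 0, or 1, mirroring the sign convention of STRCMP().

    Assumption: plain code-point ordering, not a MySQL collation.
    """
    return (a > b) - (a < b)


if __name__ == "__main__":
    file_a = "int x = 1;\nint y = 2;\n"
    file_b = "int x = 1;\nint y = 3;\n"
    # The two files differ by one character, yet the result is just -1.
    print(strcmp_like(file_a, file_b))
```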

If you specify the desired output of the comparison, I might edit the answer.

EDIT

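To compare two strings and calculate a similarity percentage, you might want to use similar_text(). Note that this is a PHP function, not a MySQL one, so the comparison runs in application code after fetching both values. As the official PHP documentation states: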

This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.
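
To make the description above concrete, here is a minimal Python sketch of the same idea: find the longest common substring, then recurse on the pieces to its left and right, and turn the matched character count into a percentage. It is an illustrative port written for this answer, not PHP's actual source; the names similar_chars and similarity_percent are hypothetical, and returning 0.0 for two empty strings is an assumption.

```python
def similar_chars(a: str, b: str) -> int:
    """Count matching characters via the longest common substring, recursively."""
    if not a or not b:
        return 0
    best_len = best_i = best_j = 0
    # Brute-force search for the first longest common substring.
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    # Recurse on the text to the left and to the right of the match.
    left = similar_chars(a[:best_i], b[:best_j])
    right = similar_chars(a[best_i + best_len:], b[best_j + best_len:])
    return best_len + left + right


def similarity_percent(a: str, b: str) -> float:
    """Similarity as a percentage of the combined length of both strings."""
    if not a and not b:
        return 0.0  # assumption: treat two empty strings as 0% similar
    return similar_chars(a, b) * 2 * 100.0 / (len(a) + len(b))


if __name__ == "__main__":
    print(similarity_percent("World", "Word"))
```

Because of the cubic worst case, this approach can get slow on large files; comparing line by line rather than character by character is one way to keep it manageable.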
