VB6 / SQL “Long Text” not comparing correctly (MS Access 2013)

Question

I have been working for a long time on a project using MS Access 2013. One issue I am having is that I have very long "Comments" in tables that I need to break down and insert into new tables. Each comment is linked to a "RouteID", and the relationship between the two can be many-to-many. The main issue is that there are duplicate comments in the table I am moving data FROM. There is no need to keep the duplicate "Comments"; the only difference between the rows is the "RouteID". Basically, I have an OLD comments table and a NEW comments table.

My issue is that the code is not correctly checking whether comments from my OLD table are already in the NEW table, so it is creating duplicates.

SOME COMMENTS ARE found to be duplicates, others are not. The comments that are NOT detected as duplicates vary in size and symbols, from short to very long.

Here is some code I have written. I have attempted multiple versions of the SQL and VBA/VB6 code, but the result is still the same: duplicate comments show up in my new table. Please feel free to critique this, regardless of whether it has to do with my issue or not.

I am aware that some queries can be far too long to work, so I have made a SQL query that compares the tables directly; however, that also fails and duplicate comments remain. I have checked my code and I do not believe the logic is incorrect.

Please help! No one in my circle of friends / professors seems to know what to do. I have an idea to take the comments, HASH them, put the hashes into a similar table, and use that to check.

If Not (rsOLD.EOF And rsOLD.BOF) Then
    rsOLD.MoveFirst
    Do Until (rsOLD.EOF = True)
        TComment = rsOLD(CommentColumn)
        TResponse = rsOLD(ResponseColumn)
        If Not IsNull(TComment) Then
            TComment = Replace(TComment, "'", "''")
            SQL = "SELECT Comment, ID FROM Comments WHERE Comment = (SELECT '" & CommentColumn & _
                  "' FROM CommentsOld WHERE (CommentsOld.ID = " & rsOLD!ID & "));"
            'SQL = "SELECT Comment FROM Comments" & _
            '      " INNER JOIN CommentsOld" & _
            '      " ON Comments.Comment = CommentsOld." & CommentColumn & _
            '      " WHERE CommentsOld.ID = " & rsOLD!ID & ";"
            Set rsCHECK = CurrentDb.OpenRecordset(SQL, dbOpenDynaset)
            If (rsCHECK.EOF And rsCHECK.BOF) Then 'IF COMMENT DOES NOT EXIST, NOTHING FOUND

I have attempted to work with a Boolean function that loops through a recordset, but the big-O cost of the loops is far too large to complete in a reasonable amount of time given the size of the records in each table.

Answer 1

One possible cause would be that your code is doing

SQL = "SELECT Comment, ID FROM Comments WHERE Comment = (SELECT '" & CommentColumn & _
        "' FROM CommentsOld WHERE (CommentsOld.ID = " & rsOLD!ID & "));"

so instead of returning the contents of the column whose name is in the variable CommentColumn you are returning the column name as a literal string. That is, if CommentColumn contains "Column1" then your SQL code is not doing

... (Select Column1 FROM CommentsOld ...

it is doing

... (Select 'Column1' FROM CommentsOld ...

Perhaps you should try

SQL = "SELECT Comment, ID FROM Comments WHERE Comment = (SELECT [" & CommentColumn & _
        "] FROM CommentsOld WHERE (CommentsOld.ID = " & rsOLD!ID & "));"

Edit re: comment

Since there are some significant restrictions on Memo (Long Text) fields compared with Text (Short Text) fields with respect to joins, DISTINCT queries (as discussed in another answer), etc., your hashing idea is starting to look increasingly appealing. There are links to some VBA/VB6 implementations of various hashing algorithms in an answer here.

Generating a hash for every comment could potentially be rather time-consuming, so you'll probably only want to do it once. If you could add a [..._hash] column for each comment column (e.g., add a Short Text column named [CP1_hash] for the Long Text column [CP1]) and store the hashes in there, that would be ideal. Once the hashing is done, you could compare comment hashes instead of the comments themselves. In addition, the hash columns could be joined, fully indexed, and manipulated in other useful ways.
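As a rough illustration of the "hash once and store it" idea, here is a minimal sketch. It assumes the [CP1] and [CP1_hash] columns from the example above already exist in CommentsOld. The helper name SHA1Hex and the sub name FillCommentHashes are made up for this sketch, and the hash helper relies on the .NET Framework COM-visible classes being available on the machine (typically .NET 3.5); if they are not, any of the pure VBA hash implementations could be substituted.

```vba
' Hypothetical helper: returns the SHA-1 hash of a string as 40 hex characters.
' Assumption: the .NET Framework COM interop classes are installed.
Public Function SHA1Hex(ByVal s As String) As String
    Dim enc As Object, prov As Object
    Dim bytes() As Byte, hash() As Byte
    Dim i As Long

    Set enc = CreateObject("System.Text.UTF8Encoding")
    Set prov = CreateObject("System.Security.Cryptography.SHA1CryptoServiceProvider")

    bytes = enc.GetBytes_4(s)
    hash = prov.ComputeHash_2((bytes))

    ' Convert each byte to two hex digits
    For i = LBound(hash) To UBound(hash)
        SHA1Hex = SHA1Hex & Right$("0" & Hex$(hash(i)), 2)
    Next i
End Function

' Hypothetical routine: fills [CP1_hash] for rows that do not have one yet,
' so the expensive hashing work happens only once per comment.
Public Sub FillCommentHashes()
    Dim rs As DAO.Recordset

    Set rs = CurrentDb.OpenRecordset( _
        "SELECT [CP1], [CP1_hash] FROM CommentsOld " & _
        "WHERE [CP1] IS NOT NULL AND [CP1_hash] IS NULL", dbOpenDynaset)

    Do Until rs.EOF
        rs.Edit
        rs![CP1_hash] = SHA1Hex(rs![CP1])
        rs.Update
        rs.MoveNext
    Loop

    rs.Close
    Set rs = Nothing
End Sub
```

Because the query only selects rows where [CP1_hash] is still Null, the routine can be re-run after new rows are added without re-hashing the old ones.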

(Yes, there would be a remote chance of a hash collision, but with a reasonably wide hash I think it would be extremely unlikely for the number of comments you are likely to be processing.)

One thing you would certainly not want to do is use the hashing function itself in a WHERE clause or a JOIN condition. That would cause a table scan and force the re-calculation of all the hash values for every row, and that could really slow things down.
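To show what comparing stored hashes might look like, here is a sketch of a single set-based insert. It assumes the hash values have already been stored in CommentsOld.[CP1_hash], and that the new Comments table has a Short Text column called [Comment_hash]; that column name and the sub name InsertNewCommentsByHash are assumptions for illustration only. No hashing function is called inside the query.

```vba
' Hypothetical routine: copies each distinct comment from CommentsOld into
' Comments, comparing only the precomputed hash columns.
Public Sub InsertNewCommentsByHash()
    Dim sql As String

    ' Inner IN subquery: keep one row (lowest ID) per distinct hash in CommentsOld.
    ' NOT EXISTS subquery: skip comments whose hash is already in Comments.
    sql = "INSERT INTO Comments (Comment, Comment_hash) " & _
          "SELECT o.[CP1], o.[CP1_hash] FROM CommentsOld AS o " & _
          "WHERE o.ID IN (SELECT Min(x.ID) FROM CommentsOld AS x " & _
          "WHERE x.[CP1_hash] IS NOT NULL GROUP BY x.[CP1_hash]) " & _
          "AND NOT EXISTS (SELECT n.ID FROM Comments AS n " & _
          "WHERE n.Comment_hash = o.[CP1_hash]);"

    CurrentDb.Execute sql, dbFailOnError
End Sub
```

Indexing both hash columns should help the lookups, and this replaces the row-by-row recordset check with one query.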

Answer 2

Start with an empty table, NewTable. Then run this query:

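INSERT INTO NewTable (Comment)
SELECT DISTINCT Comment
FROM OldTable;

The DISTINCT keyword excludes duplicates, so what ends up in NewTable should be unique. One caveat: Access generally compares only the first 255 characters of a Long Text (Memo) field in a DISTINCT query, so check the result if your comments can differ only after that point. You can then go through OldTable and do what you like with each of the RouteIDs.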

solution1
1 ACCEPTED 2014-01-23 18:34:37

solution2
0 2014-01-23 17:33:15
