
How to compare string data to table data in SQL Server - I need to know if a value in a string doesn't exist in a column

I have two tables: one is an import table, and the other is the lookup table behind an FK constraint on the table the imported data will eventually be loaded into. In the import table a user can provide a list of semicolon-separated values that correspond to values in the 2nd table.

So we're looking at something like this:

TABLE 1
ID | Column1
1  | A; B; C; D

TABLE 2
ID  | Column2
1   | A
2   | B
3   | D
4   | E

The requirement is:

Rows in TABLE 1 with a value not in TABLE 2 (C in our example) should be marked as invalid for manual cleanup by the user. Rows where all values are valid are handled by another script that already works.

In production we'll be dealing with 6 columns that need to be checked and imports of AT LEAST 100k rows at a time. As a result I'd like to do all the work in the DB, not in another app.

BTW, it's SQL2008.

I'm stuck. Does anyone have any ideas? Thanks!

Seems to me you could pass ID & Column1 values from Table1 to a Table-Valued function (or a temp table in-line) which would parse the ;-delimited list, returning individual values per record.

Here are a couple options:

The result ( ID, value ) from the function could be used to compare (unmatched query) against values in Table 2.

SELECT DISTINCT tmp.ID
FROM tmp
LEFT JOIN Table2 ON Table2.Column2 = tmp.value
WHERE Table2.Column2 IS NULL

The ID results of the comparison would then be used to flag records in Table 1.
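As a rough illustration of the split-then-compare idea above, here is a minimal sketch that works on SQL Server 2008, which has no built-in string-split function. It swaps in an XML-based split rather than a table-valued function. The names dbo.ImportTable and dbo.LookupTable are hypothetical stand-ins for TABLE 1 and TABLE 2, and the sketch assumes the values never contain the XML special characters &, < or >.

```sql
-- Hypothetical names: dbo.ImportTable(ID, Column1) stands in for TABLE 1,
-- dbo.LookupTable(Column2) stands in for TABLE 2.
-- Assumption: Column1 never contains &, < or >, which would break the XML cast.
SELECT DISTINCT s.ID
FROM (
    SELECT t.ID,
           -- Trim the space that follows each semicolon in 'A; B; C; D'
           LTRIM(RTRIM(x.n.value('.', 'nvarchar(4000)'))) AS value
    FROM (
        -- Turn 'A; B; C' into <v>A</v><v> B</v><v> C</v>
        SELECT ID,
               CAST('<v>' + REPLACE(Column1, ';', '</v><v>') + '</v>' AS xml) AS parts
        FROM dbo.ImportTable
        WHERE Column1 IS NOT NULL
    ) AS t
    CROSS APPLY t.parts.nodes('/v') AS x(n)   -- one row per list element
) AS s
LEFT JOIN dbo.LookupTable AS l
    ON l.Column2 = s.value
WHERE s.value <> ''          -- ignore empty pieces, e.g. from a trailing ';'
  AND l.Column2 IS NULL;     -- keep only elements with no match
```

The result is the set of import IDs that contain at least one unknown value, which could then drive an UPDATE that sets whatever "invalid" flag the import table uses.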

Here is an easy and straightforward way to get the IDs of the invalid rows, although it will not perform well because of the string manipulation.

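select T1.ID
from [TABLE 1] T1
    left join [TABLE 2] T2
        on ('; ' + T1.COLUMN1 + '; ') like ('%; ' + T2.COLUMN2 + '; %')
where T1.COLUMN1 is not null
-- COLUMN1 is in the GROUP BY so that HAVING may reference it
group by T1.ID, T1.COLUMN1
-- count(T2.COLUMN2) counts matches only; count(*) would count an unmatched row as 1
having count(T2.COLUMN2) < len(T1.COLUMN1) - len(replace(T1.COLUMN1, ';', '')) + 1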

There are two assumptions:

  1. The semicolon-separated list does not contain duplicates
  2. TABLE 2 does not contain duplicates in COLUMN2.

The second assumption can easily be fixed by using ( select distinct COLUMN2 from [TABLE 2] ) rather than [TABLE 2] .
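To make the counting trick in the HAVING clause concrete, the snippet below works through the sample row from the question. It is only an illustration of the arithmetic: the number of list elements is the number of semicolons plus one, and the number of semicolons is the length lost when they are removed.

```sql
-- 'A; B; C; D' is 10 characters long; removing the 3 semicolons leaves 7.
SELECT LEN('A; B; C; D')                                  AS full_length,     -- 10
       LEN(REPLACE('A; B; C; D', ';', ''))                AS without_semis,   -- 7
       LEN('A; B; C; D')
         - LEN(REPLACE('A; B; C; D', ';', '')) + 1        AS element_count;   -- 4
```

With the sample TABLE 2, only A, B and D match, so the row has 3 matches against 4 expected elements and is reported as invalid.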

Perhaps inserting those composite values into 'TABLE 1' seemed like the most convenient solution at one time. However, unless your users are entering the values directly into the table with SQL Server Management Studio or something similar, I assume there must be a software layer between the UI and the database. If so, you're going to save yourself a lot of headaches, both now and in the long run, by investing a little time in altering your code to split the semicolon-delimited inputs into discrete values before inserting them into the database. This will result in 'TABLE 1' looking something like this:

TABLE 1
ID  | Column1
1   | A
1   | B
1   | C
1   | D

It's then trivial to write the SQL to find those IDs which are invalid.
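For example, with one value per row the check reduces to a plain anti-join. This is a minimal sketch: dbo.ImportValues and dbo.LookupTable are hypothetical names for the normalized TABLE 1 and for TABLE 2.

```sql
-- Hypothetical names: dbo.ImportValues(ID, Column1) holds one value per row,
-- dbo.LookupTable(Column2) stands in for TABLE 2.
SELECT DISTINCT iv.ID
FROM dbo.ImportValues AS iv
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.LookupTable AS l
    WHERE l.Column2 = iv.Column1   -- plain equality, so an index on Column2 can be used
);
```

Because the comparison is a simple equality rather than a LIKE over concatenated strings, it should scale much better to imports of 100k rows or more.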

If possible, try putting the values in separate rows when importing (instead of storing them as a ;-separated list).

This might help.
