SQL substring of field contained within another substring for JOIN statement

I am using two datasets: the NYC MTA turnstile data and the subway station locations. One contains the turnstile data collected at a particular subway station, while the other contains the longitude and latitude of that station. There is no common key between the tables. I had hoped to use the subway station name, but many different stations share the same name within a table, and the naming conventions differ slightly between the tables. To overcome this, I would like to join the tables on substrings, using a combination of the station name and the lines present at the station.

For example:

In the train station locations table, one row contains:

+------------------------+-----------------+
|Name                    |Line             |
+------------------------+-----------------+
|Lexington Ave - 59th St | 4-5-6-6 Express |
+------------------------+-----------------+

In the turnstile data table, a row may look like this:

+---------+-----------------+
| Station | LineName        |
+---------+-----------------+
| 59 ST   | NQR456W         |
+---------+-----------------+

The best workaround I could think of is to search with the LIKE keyword or the LOCATE function and return the single rows whose station and line contain the same substrings, i.e. LIKE("%59%") AND NQR456. I'm hoping to ignore substrings like ST and AVE and characters like '-'.

Once I have these rows, I would like to add a new column holding a proper key, a shared unique ID for each station, that I can use in a JOIN statement.

Thank you in advance for all of your help.

I've tried the query below, but it does not work as intended because it only searches for one substring within another:

SELECT tsl.station, td.station, td.linename, tsl.line
FROM train_station_locations tsl, turnstile_data td
WHERE CONCAT('%',LOWER(tsl.station),'%')
 LIKE CONCAT('%', REPLACE(REPLACE(td.station," st","")," ",""),'%') 
 AND  CONCAT('%',LOWER(td.linename),'%') LIKE 
 REPLACE(CONCAT('%',LOWER(tsl.line),'%'),"-","");

I've referred to the following questions:

https://stackoverflow.com/a/40140482/9367155

SQL: Join tables on substrings

It must be frustrating to deal with data that doesn't have a PK...

Based on the data you shared above, it seems it would work to strip the non-numeric characters from both fields and look for a match: 59 = 59.

MySQL 8 supports REGEXP_REPLACE: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace

Prior to MySQL 8, you can create a custom function: MySQL strip non-numeric characters to compare
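
The digit-stripping comparison suggested above can be tried out end to end with a small sketch. This is only an illustration under several assumptions: Python's built-in sqlite3 module stands in for MySQL, the column names follow the sample tables in the question, and digits_only and lines_overlap are hypothetical helper functions registered for this example (they are not MySQL built-ins). In MySQL 8, REGEXP_REPLACE(col, '[^0-9]', '') could take the place of digits_only. The line check is an extra guess at how the Line and LineName columns might be compared, and the second turnstile row is made up to show a non-match.

```python
import re
import sqlite3


def digits_only(text):
    """Keep only the digits of a station name, e.g. '59 ST' -> '59'."""
    return re.sub(r"[^0-9]", "", text or "")


def lines_overlap(location_line, turnstile_lines):
    """Return 1 if every line in '4-5-6-6 Express' appears in 'NQR456W'."""
    # Assumption: location lines are '-'-separated and the first character
    # of each token identifies the line ('6 Express' -> '6').
    tokens = [tok.strip() for tok in (location_line or "").split("-")]
    wanted = {tok[0].upper() for tok in tokens if tok}
    return int(bool(wanted) and wanted <= set((turnstile_lines or "").upper()))


conn = sqlite3.connect(":memory:")
# Register the Python helpers so they can be called from SQL.
conn.create_function("digits_only", 1, digits_only)
conn.create_function("lines_overlap", 2, lines_overlap)

conn.executescript("""
    CREATE TABLE train_station_locations (name TEXT, line TEXT);
    CREATE TABLE turnstile_data (station TEXT, linename TEXT);
    INSERT INTO train_station_locations VALUES
        ('Lexington Ave - 59th St', '4-5-6-6 Express');
    INSERT INTO turnstile_data VALUES ('59 ST', 'NQR456W');
    -- Hypothetical second row: same digits, different lines.
    INSERT INTO turnstile_data VALUES ('59 ST', 'NR');
""")

rows = conn.execute("""
    SELECT tsl.name, td.station, td.linename
    FROM train_station_locations AS tsl
    JOIN turnstile_data AS td
      ON digits_only(tsl.name) = digits_only(td.station)
     AND digits_only(tsl.name) <> ''
     AND lines_overlap(tsl.line, td.linename) = 1
""").fetchall()

print(rows)
```

With the sample rows, only the 'NQR456W' turnstile row pairs with the Lexington Ave location. Station names that contain more than one number would collapse into a single digit string under this approach, so they would likely need separate handling before the matched pairs are used to assign a shared station ID.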
