
Extract countries data from SQL using regex

I have an SQL script that inserts countries, and I want to keep only some of the columns instead of all the columns in the SQL query.

I have cleaned the data using regex but am facing one small problem. The regex doesn't match when several rows of values start on the same line. I am using regex101.com to do the replacement.

Try it here.


My Regex

(\(([0-9]),(['a-zA-Z ]*),(['a-zA-Z ]*)).*

Sample data

(1,'Afghanistan','AFG','004','AF','93','Kabul','AFN','Afghan afghani','؋','.af','افغانستان','Asia','Southern Asia','[{\"zoneName\":\"Asia/Kabul\",\"gmtOffset\":16200,\"gmtOffsetName\":\"UTC+04:30\",\"abbreviation\":\"AFT\",\"tzName\":\"Afghanistan Time\"}]','{\"kr\":\"아프가니스탄\",\"br\":\"Afeganistão\",\"pt\":\"Afeganistão\",\"nl\":\"Afghanistan\",\"hr\":\"Afganistan\",\"fa\":\"افغانستان\",\"de\":\"Afghanistan\",\"es\":\"Afganistán\",\"fr\":\"Afghanistan\",\"ja\":\"アフガニスタン\",\"it\":\"Afghanistan\",\"cn\":\"阿富汗\",\"tr\":\"Afganistan\"}',33.00000000,65.00000000,'🇦🇫','U+1F1E6 U+1F1EB','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q889'),
(2,'Aland Islands','ALA','248','AX','+358-18','Mariehamn','EUR','Euro','€','.ax','Åland','Europe','Northern Europe','[{\"zoneName\":\"Europe/Mariehamn\",\"gmtOffset\":7200,\"gmtOffsetName\":\"UTC+02:00\",\"abbreviation\":\"EET\",\"tzName\":\"Eastern European Time\"}]','{\"kr\":\"올란드 제도\",\"br\":\"Ilhas de Aland\",\"pt\":\"Ilhas de Aland\",\"nl\":\"Ålandeilanden\",\"hr\":\"Ålandski otoci\",\"fa\":\"جزایر الند\",\"de\":\"Åland\",\"es\":\"Alandia\",\"fr\":\"Åland\",\"ja\":\"オーランド諸島\",\"it\":\"Isole Aland\",\"cn\":\"奥兰群岛\",\"tr\":\"Åland Adalari\"}',60.11666700,19.90000000,'🇦🇽','U+1F1E6 U+1F1FD','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),
(3,'Albania','ALB','008','AL','355','Tirana','ALL','Albanian lek','Lek','.al','Shqipëria','Europe','Southern Europe','[{\"zoneName\":\"Europe/Tirane\",\"gmtOffset\":3600,\"gmtOffsetName\":\"UTC+01:00\",\"abbreviation\":\"CET\",\"tzName\":\"Central European Time\"}]','{\"kr\":\"알바니아\",\"br\":\"Albânia\",\"pt\":\"Albânia\",\"nl\":\"Albanië\",\"hr\":\"Albanija\",\"fa\":\"آلبانی\",\"de\":\"Albanien\",\"es\":\"Albania\",\"fr\":\"Albanie\",\"ja\":\"アルバニア\",\"it\":\"Albania\",\"cn\":\"阿尔巴尼亚\",\"tr\":\"Arnavutluk\"}',41.00000000,20.00000000,'🇦🇱','U+1F1E6 U+1F1F1','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q222'),(4,'Algeria','DZA','012','DZ','213','Algiers','DZD','Algerian dinar','دج','.dz','الجزائر','Africa','Northern Africa','[{\"zoneName\":\"Africa/Algiers\",\"gmtOffset\":3600,\"gmtOffsetName\":\"UTC+01:00\",\"abbreviation\":\"CET\",\"tzName\":\"Central European Time\"}]','{\"kr\":\"알제리\",\"br\":\"Argélia\",\"pt\":\"Argélia\",\"nl\":\"Algerije\",\"hr\":\"Alžir\",\"fa\":\"الجزایر\",\"de\":\"Algerien\",\"es\":\"Argelia\",\"fr\":\"Algérie\",\"ja\":\"アルジェリア\",\"it\":\"Algeria\",\"cn\":\"阿尔及利亚\",\"tr\":\"Cezayir\"}',28.00000000,3.00000000,'🇩🇿','U+1F1E9 U+1F1FF','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q262'),(5,'American Samoa','ASM','016','AS','+1-684','Pago Pago','USD','US Dollar','$','.as','American Samoa','Oceania','Polynesia','[{\"zoneName\":\"Pacific/Pago_Pago\",\"gmtOffset\":-39600,\"gmtOffsetName\":\"UTC-11:00\",\"abbreviation\":\"SST\",\"tzName\":\"Samoa Standard Time\"}]','{\"kr\":\"아메리칸사모아\",\"br\":\"Samoa Americana\",\"pt\":\"Samoa Americana\",\"nl\":\"Amerikaans Samoa\",\"hr\":\"Američka Samoa\",\"fa\":\"ساموآی آمریکا\",\"de\":\"Amerikanisch-Samoa\",\"es\":\"Samoa Americana\",\"fr\":\"Samoa américaines\",\"ja\":\"アメリカ領サモア\",\"it\":\"Samoa Americane\",\"cn\":\"美属萨摩亚\",\"tr\":\"Amerikan Samoasi\"}',-14.33333333,-170.00000000,'🇦🇸','U+1F1E6 U+1F1F8','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),

Replacement Data

$1, 1, NOW(), NOW()),

Expected Output

('Afghanistan','AFG', 1, NOW(), NOW()),
('Aland Islands','ALA', 1, NOW(), NOW()),
('Albania','ALB', 1, NOW(), NOW()),

The pattern I notice in your regex attempt splits each row into three groups:

  • Group 1 (discarded): an opening parenthesis, the leading digits and a comma
  • Group 2 (retained): everything up to, but not including, a comma followed by a quote and a digit
  • Group 3 (discarded): the rest of the row, which ends with ), for every row except the last one, which ends with );

Translated into a regex pattern, this description becomes:

(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))

Group 1 Regex Explanation (\(\d+,):

  • \( : an opening parenthesis
  • \d+ : one or more digits
  • , : a comma

Group 2 Regex Explanation (.*?(?=,'\d)):

  • .*? : any sequence of characters (lazy, so as few as possible)
  • (?=,'\d) : a lookahead requiring that a comma, a quote and a digit come next (they are not included in the match)
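To see why the lazy quantifier matters here, the following is a minimal sketch in Python that compares the lazy and greedy forms of group 2. The shortened row and the variable names are assumptions made for illustration; they are not part of the original question.

```python
import re

# Shortened sample row (assumption): only the first few columns are kept.
row = "(1,'Afghanistan','AFG','004','AF','93','Kabul'),"

# Lazy .*? stops at the first position where the lookahead ,'<digit> succeeds.
lazy = re.search(r"\(\d+,(.*?(?=,'\d))", row).group(1)

# Greedy .* runs on to the last position where the lookahead succeeds.
greedy = re.search(r"\(\d+,(.*(?=,'\d))", row).group(1)

print(lazy)    # 'Afghanistan','AFG'
print(greedy)  # 'Afghanistan','AFG','004','AF'
```

The lazy form keeps only the name and the three-letter code because the next column, '004', is the first quoted value that starts with a digit.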

Group 3 Regex Explanation (.*?(?:\)([,;]))):

  • .*? : any sequence of characters (lazy, so as few as possible)
  • (?:\)([,;])) : a closing parenthesis followed by either a comma or a semicolon; the inner ([,;]) is capture group 4, which holds that trailing character
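As a quick check of the group numbering, here is a small Python sketch that prints what each group captures for a single row. The row is a shortened version of the sample data, and the names PATTERN, row and groups are illustrative assumptions.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))")

# Shortened sample row (assumption): most columns are omitted.
row = "(2,'Aland Islands','ALA','248','AX',60.1,19.9,1,NULL),"

groups = PATTERN.match(row).groups()
for number, text in enumerate(groups, start=1):
    print(number, text)
# 1 (2,
# 2 'Aland Islands','ALA'
# 3 ,'248','AX',60.1,19.9,1,NULL),
# 4 ,
```

Group 4 sits inside group 3, so the trailing comma or semicolon is available separately even though the rest of group 3 is thrown away.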

Then it's enough to replace each regex match with ($2, 1, NOW(), NOW())$4\n
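Outside regex101, the same substitution could be scripted. The sketch below uses Python's re module, where $2 and $4 are written \g<2> and \g<4>. The helper name trim_rows and the shortened sample rows are assumptions for illustration only.

```python
import re

# The answer's pattern, unchanged. Group 2 is kept; group 4 is the trailing , or ;
PATTERN = re.compile(r"(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))")

# Python spells the answer's $2 and $4 as \g<2> and \g<4>.
REPLACEMENT = r"(\g<2>, 1, NOW(), NOW())\g<4>" + "\n"


def trim_rows(sql_values: str) -> str:
    """Hypothetical helper: rewrite every value tuple found in sql_values."""
    return PATTERN.sub(REPLACEMENT, sql_values)


# Shortened sample rows (assumption), with two tuples sharing the first line.
sample = (
    "(3,'Albania','ALB','008','AL',41.0,1,'Q222'),"
    "(4,'Algeria','DZA','012','DZ',28.0,1,'Q262'),\n"
    "(5,'American Samoa','ASM','016','AS',-14.3,1,NULL);"
)

result = trim_rows(sample)
print(result)
```

Because the pattern does not end in a greedy .*, tuples that share a line are matched one at a time. Rows that already sat on separate lines keep their original line break in addition to the \n from the replacement, so blank lines can appear between them; dropping the trailing \n from the replacement avoids that. The pattern also relies on the third column being a quoted value that starts with a digit, as it does in the sample data.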

Check the demo here.
