Extract countries data from SQL using regex
I have an SQL query for inserting countries, and I want to keep only some of its columns instead of all of them.
I have cleaned the data using regex but am facing one small problem: the regex doesn't match a row when it starts on the same line as the previous row. I am using regex101.com to do the replacement.
My Regex
(\(([0-9]),(['a-zA-Z ]*),(['a-zA-Z ]*)).*
Sample data
(1,'Afghanistan','AFG','004','AF','93','Kabul','AFN','Afghan afghani','؋','.af','افغانستان','Asia','Southern Asia','[{\"zoneName\":\"Asia/Kabul\",\"gmtOffset\":16200,\"gmtOffsetName\":\"UTC+04:30\",\"abbreviation\":\"AFT\",\"tzName\":\"Afghanistan Time\"}]','{\"kr\":\"아프가니스탄\",\"br\":\"Afeganistão\",\"pt\":\"Afeganistão\",\"nl\":\"Afghanistan\",\"hr\":\"Afganistan\",\"fa\":\"افغانستان\",\"de\":\"Afghanistan\",\"es\":\"Afganistán\",\"fr\":\"Afghanistan\",\"ja\":\"アフガニスタン\",\"it\":\"Afghanistan\",\"cn\":\"阿富汗\",\"tr\":\"Afganistan\"}',33.00000000,65.00000000,'🇦🇫','U+1F1E6 U+1F1EB','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q889'),
(2,'Aland Islands','ALA','248','AX','+358-18','Mariehamn','EUR','Euro','€','.ax','Åland','Europe','Northern Europe','[{\"zoneName\":\"Europe/Mariehamn\",\"gmtOffset\":7200,\"gmtOffsetName\":\"UTC+02:00\",\"abbreviation\":\"EET\",\"tzName\":\"Eastern European Time\"}]','{\"kr\":\"올란드 제도\",\"br\":\"Ilhas de Aland\",\"pt\":\"Ilhas de Aland\",\"nl\":\"Ålandeilanden\",\"hr\":\"Ålandski otoci\",\"fa\":\"جزایر الند\",\"de\":\"Åland\",\"es\":\"Alandia\",\"fr\":\"Åland\",\"ja\":\"オーランド諸島\",\"it\":\"Isole Aland\",\"cn\":\"奥兰群岛\",\"tr\":\"Åland Adalari\"}',60.11666700,19.90000000,'🇦🇽','U+1F1E6 U+1F1FD','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),
(3,'Albania','ALB','008','AL','355','Tirana','ALL','Albanian lek','Lek','.al','Shqipëria','Europe','Southern Europe','[{\"zoneName\":\"Europe/Tirane\",\"gmtOffset\":3600,\"gmtOffsetName\":\"UTC+01:00\",\"abbreviation\":\"CET\",\"tzName\":\"Central European Time\"}]','{\"kr\":\"알바니아\",\"br\":\"Albânia\",\"pt\":\"Albânia\",\"nl\":\"Albanië\",\"hr\":\"Albanija\",\"fa\":\"آلبانی\",\"de\":\"Albanien\",\"es\":\"Albania\",\"fr\":\"Albanie\",\"ja\":\"アルバニア\",\"it\":\"Albania\",\"cn\":\"阿尔巴尼亚\",\"tr\":\"Arnavutluk\"}',41.00000000,20.00000000,'🇦🇱','U+1F1E6 U+1F1F1','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q222'),(4,'Algeria','DZA','012','DZ','213','Algiers','DZD','Algerian dinar','دج','.dz','الجزائر','Africa','Northern Africa','[{\"zoneName\":\"Africa/Algiers\",\"gmtOffset\":3600,\"gmtOffsetName\":\"UTC+01:00\",\"abbreviation\":\"CET\",\"tzName\":\"Central European Time\"}]','{\"kr\":\"알제리\",\"br\":\"Argélia\",\"pt\":\"Argélia\",\"nl\":\"Algerije\",\"hr\":\"Alžir\",\"fa\":\"الجزایر\",\"de\":\"Algerien\",\"es\":\"Argelia\",\"fr\":\"Algérie\",\"ja\":\"アルジェリア\",\"it\":\"Algeria\",\"cn\":\"阿尔及利亚\",\"tr\":\"Cezayir\"}',28.00000000,3.00000000,'🇩🇿','U+1F1E9 U+1F1FF','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q262'),(5,'American Samoa','ASM','016','AS','+1-684','Pago Pago','USD','US Dollar','$','.as','American Samoa','Oceania','Polynesia','[{\"zoneName\":\"Pacific/Pago_Pago\",\"gmtOffset\":-39600,\"gmtOffsetName\":\"UTC-11:00\",\"abbreviation\":\"SST\",\"tzName\":\"Samoa Standard Time\"}]','{\"kr\":\"아메리칸사모아\",\"br\":\"Samoa Americana\",\"pt\":\"Samoa Americana\",\"nl\":\"Amerikaans Samoa\",\"hr\":\"Američka Samoa\",\"fa\":\"ساموآی آمریکا\",\"de\":\"Amerikanisch-Samoa\",\"es\":\"Samoa Americana\",\"fr\":\"Samoa américaines\",\"ja\":\"アメリカ領サモア\",\"it\":\"Samoa Americane\",\"cn\":\"美属萨摩亚\",\"tr\":\"Amerikan Samoasi\"}',-14.33333333,-170.00000000,'🇦🇸','U+1F1E6 U+1F1F8','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),
Replacement Data
$1, 1, NOW(), NOW()),
Expected Output
('Afghanistan','AFG', 1, NOW(), NOW()),
('Aland Islands','ALA', 1, NOW(), NOW()),
('Albania','ALB', 1, NOW(), NOW()),
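The problem can also be reproduced outside regex101. The Python sketch below is only an illustration: the `line` value is a shortened, hypothetical version of the third sample line, and the row with id 10 is made up. It suggests two likely causes: the trailing `.*` is greedy and runs to the end of the line, and `[0-9]` accepts only a single-digit id.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\(([0-9]),(['a-zA-Z ]*),(['a-zA-Z ]*)).*")

# Shortened, hypothetical line holding two rows, like the third sample line.
line = "(3,'Albania','ALB','008',1,'Q222'),(4,'Algeria','DZA','012',1,'Q262'),"

matches = [m.group(0) for m in ORIGINAL.finditer(line)]

# The greedy .* consumes the rest of the line, so row 4 is swallowed
# by the match that starts at row 3.
print(len(matches))        # 1
print(matches[0] == line)  # True

# [0-9] matches exactly one digit, so a two-digit id never matches.
# (Hypothetical row, not taken from the sample data.)
two_digit = ORIGINAL.search("(10,'Example','EXA','000'),")
print(two_digit)           # None
```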
The pattern I notice in your data is that each row ends with
),
except the last one, which ends with
);
The translation of this description into a regex pattern is the following:
(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))
Group 1 Regex Explanation (\(\d+,):
\( : an open parenthesis
\d+ : one or more digits
, : a comma
Group 2 Regex Explanation (.*?(?=,'\d)):
.*? : any combination of characters (lazy, as few as possible)
(?=,'\d) : a lookahead requiring that a comma, a quote and a digit come next (excluded from the match)
Group 3 Regex Explanation (.*?(?:\)([,;]))):
.*? : any combination of characters (lazy, as few as possible)
(?:\)([,;])) : up to and including a closing parenthesis followed by either a comma or a semicolon; the comma or semicolon is captured as Group 4
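To see what each group captures, the pattern can be run against a single row. This is a minimal Python sketch; the `row` value is a shortened, hypothetical version of the first sample row, and the variable names are chosen only for illustration.

```python
import re

PATTERN = re.compile(r"(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))")

# Shortened, hypothetical version of the first sample row.
row = "(1,'Afghanistan','AFG','004','AF',1,'Q889'),"

match = PATTERN.match(row)
row_id, wanted, rest, terminator = match.groups()

print(row_id)      # (1,
print(wanted)      # 'Afghanistan','AFG'
print(rest)        # ,'004','AF',1,'Q889'),
print(terminator)  # ,
```

Group 2 stops where it does only because the third column starts with a quote followed by a digit; a row whose third column began with a letter would need a different lookahead.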
Then it's enough to replace each regex match with
($2, 1, NOW(), NOW())$4\n
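The same rewrite can be sketched in Python, assuming the data is available as a string. Note that this version rebuilds each row from `finditer` matches instead of using a substitution string, which avoids the extra blank lines that the trailing `\n` would produce for rows already on their own line. The `values` string holds shortened, hypothetical rows modelled on the sample data, and `rewrite_rows` is a name made up for this example.

```python
import re

PATTERN = re.compile(r"(\(\d+,)(.*?(?=,'\d))(.*?(?:\)([,;])))")

# Shortened, hypothetical rows modelled on the sample data.
# Rows 3 and 4 share a line, and the last row ends with ");".
values = (
    "(1,'Afghanistan','AFG','004','AF',1,'Q889'),\n"
    "(2,'Aland Islands','ALA','248','AX',1,NULL),\n"
    "(3,'Albania','ALB','008','AL',1,'Q222'),(4,'Algeria','DZA','012','DZ',1,'Q262');"
)


def rewrite_rows(sql_values: str) -> str:
    """Keep the name and code columns of each row and append the fixed columns."""
    rows = [
        # Group 2 holds the two wanted columns, group 4 the row terminator.
        f"({m.group(2)}, 1, NOW(), NOW()){m.group(4)}"
        for m in PATTERN.finditer(sql_values)
    ]
    return "\n".join(rows)


print(rewrite_rows(values))
# ('Afghanistan','AFG', 1, NOW(), NOW()),
# ('Aland Islands','ALA', 1, NOW(), NOW()),
# ('Albania','ALB', 1, NOW(), NOW()),
# ('Algeria','DZA', 1, NOW(), NOW());
```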