
Notepad++: replace text between two strings spanning multiple lines while retaining part of the text in between

I have extracted the DDL of many Hive tables using the SHOW CREATE TABLE command.

The output looks like this:

CREATE EXTERNAL TABLE MYSCHEMA.MyTABLE(
  `col1` string, 
  `col2` string)
PARTITIONED BY ( 
  `data_as_of_date` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'input.regex'='^(.*?)~}\\|(.*?)~}\\|(.*?)$') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/mnt/data/schema/layer/domain/MYTABLE'
TBLPROPERTIES (
  'DO_NOT_UPDATE_STATS'='true', 
  'STATS_GENERATED_VIA_STATS_TASK'='true', 
  'last_modified_by'='user', 
  'last_modified_time'='1603077305', 
  'numRows'='23483974', 
  'parquet.compression'='SNAPPY', 
  'transient_lastDdlTime'='1608243340');

I want to replace this block of text...

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'input.regex'='^(.*?)~}\\|(.*?)~}\\|(.*?)$') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/mnt/data/schema/layer/domain/MYTABLE'
TBLPROPERTIES (
  'DO_NOT_UPDATE_STATS'='true', 
  'STATS_GENERATED_VIA_STATS_TASK'='true', 
  'last_modified_by'='user', 
  'last_modified_time'='1603077305', 
  'numRows'='23483974', 
  'parquet.compression'='SNAPPY', 
  'transient_lastDdlTime'='1608243340');

...with this...

STORED AS PARQUET 
LOCATION '/mnt/data/schema/layer/domain/MYTABLE'
TBLPROPERTIES('parquet.compression'='SNAPPY');

...using Notepad++.

Note that the LOCATION value must stay the same as in the original, while the rest is replaced as shown above. In other words, the replacement spans multiple lines and I also need to keep part of the matched text. Can someone suggest a regex I can use in Notepad++ (v7.8.2)?

The final result should look like this:

CREATE EXTERNAL TABLE MYSCHEMA.MyTABLE(
  `col1` string, 
  `col2` string)
PARTITIONED BY ( 
  `data_as_of_date` string)
STORED AS PARQUET 
LOCATION '/mnt/data/schema/layer/domain/MYTABLE'
TBLPROPERTIES('parquet.compression'='SNAPPY');

There are many tables, and each table has a different LOCATION value. I do not want the LOCATION to be replaced.

It is also fine if this takes two steps: first replace everything above LOCATION, then replace the TBLPROPERTIES (if it cannot be done with a single regex).

  • Ctrl+H
  • Find what: ROW FORMAT SERDE[\s\S]+?(LOCATION\s+.+\R)[\s\S]*?TBLPROPERTIES[^)]+?\);
  • Replace with: STORED AS PARQUET \n$1TBLPROPERTIES\('parquet.compression'='SNAPPY'\);
  • CHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • UNCHECK . matches newline
  • Replace all全部替换
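To try the single-regex approach outside Notepad++, here is a minimal sketch using Python 3's re module. It is an approximation rather than the answer's exact tool: Python's re does not support \R, so (?:\r\n|\n|\r) stands in for it, $1 becomes \g<1>, and the parentheses in the replacement need no escaping. The helper names convert and table and the two shortened sample tables are hypothetical.

```python
import re

# (?:\r\n|\n|\r) approximates Notepad++'s \R (Boost's \R also matches a few
# other Unicode line separators); Python's re has no \R escape.
PATTERN = re.compile(
    r"ROW FORMAT SERDE[\s\S]+?(LOCATION\s+.+(?:\r\n|\n|\r))[\s\S]*?TBLPROPERTIES[^)]+?\);"
)
# $1 -> \g<1>; parentheses are literal in a Python replacement string.
REPLACEMENT = r"STORED AS PARQUET \n\g<1>TBLPROPERTIES('parquet.compression'='SNAPPY');"


def convert(ddl: str) -> str:
    """Rewrite every ROW FORMAT SERDE ... TBLPROPERTIES (...); block, like Replace all."""
    return PATTERN.sub(REPLACEMENT, ddl)


def table(name: str, location: str) -> str:
    """Build a shortened, hypothetical SHOW CREATE TABLE output."""
    return (
        f"CREATE EXTERNAL TABLE MYSCHEMA.{name}(\n"
        "  `col1` string)\n"
        "ROW FORMAT SERDE\n"
        "  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT\n"
        "  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'\n"
        "LOCATION\n"
        f"  '{location}'\n"
        "TBLPROPERTIES (\n"
        "  'numRows'='23483974',\n"
        "  'parquet.compression'='SNAPPY');\n"
    )


# Two tables with different LOCATION values; each keeps its own path.
DDL = table("T1", "/mnt/data/a") + table("T2", "/mnt/data/b")
print(convert(DDL))
```

re.sub replaces every non-overlapping match, which mirrors Replace all across many tables. One caveat of the lazy [\s\S]+?: if a table had no LOCATION clause, the match would run on into the next table and consume it up to that table's LOCATION.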

Explanation:

ROW FORMAT SERDE        # literally
[\s\S]+?                # 1 or more of any character, including newline, not greedy
(                       # start group 1
LOCATION                # literally
\s+                     # 1 or more whitespace characters, including linebreaks
.+                      # 1 or more of any character except newline
\R                      # any kind of linebreak
)                       # end group 1
[\s\S]*?                # 0 or more of any character, including newline, not greedy
TBLPROPERTIES           # literally
[^)]+?                  # 1 or more of any character that is not a closing parenthesis, not greedy
\);                     # closing parenthesis and semicolon
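The same pattern can be written piece by piece with Python's re.VERBOSE flag, so each token carries the comment from the explanation. This is a sketch under two assumptions: the spaces in ROW FORMAT SERDE must be escaped because verbose mode ignores unescaped whitespace, and (?:\r\n|\n|\r) replaces \R, which Python's re lacks. The SAMPLE text is a shortened, hypothetical excerpt of the question's DDL.

```python
import re

PATTERN = re.compile(
    r"""
    ROW\ FORMAT\ SERDE      # literally (spaces escaped because of VERBOSE)
    [\s\S]+?                # 1 or more of any character, including newline, lazy
    (                       # start group 1
      LOCATION              # literally
      \s+                   # 1 or more whitespace characters, including linebreaks
      .+                    # 1 or more of any character except newline
      (?:\r\n|\n|\r)        # a linebreak (stand-in for \R)
    )                       # end group 1
    [\s\S]*?                # 0 or more of any character, including newline, lazy
    TBLPROPERTIES           # literally
    [^)]+?                  # 1 or more characters other than a closing paren, lazy
    \);                     # closing parenthesis and semicolon
    """,
    re.VERBOSE,
)

SAMPLE = (
    "ROW FORMAT SERDE\n"
    "  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
    "LOCATION\n"
    "  '/mnt/data/schema/layer/domain/MYTABLE'\n"
    "TBLPROPERTIES (\n"
    "  'numRows'='23483974',\n"
    "  'transient_lastDdlTime'='1608243340');"
)

match = PATTERN.search(SAMPLE)
# Group 1 holds the LOCATION keyword, the path line and its trailing linebreak.
print(repr(match.group(1)))
```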

Replacement:

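STORED AS PARQUET                                  # literally
\n                                                 # a linefeed
$1                                                 # content of group 1 (the LOCATION line and its linebreak)
TBLPROPERTIES\('parquet.compression'='SNAPPY'\);   # literally, with the parentheses escaped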
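The replacement string can be checked on its own with Match.expand in Python 3, which applies the same backslash substitution as re.sub. This is a hypothetical translation, not Notepad++ itself: the parentheses are escaped in Notepad++ because its Boost-based replace syntax gives bare ( and ) a special meaning (conditional replacements), whereas Python treats them literally and would keep a stray backslash in front of \( or \). The SAMPLE text is a shortened excerpt.

```python
import re

# Pattern from the answer; (?:\r\n|\n|\r) approximates \R, which Python's re lacks.
PATTERN = re.compile(
    r"ROW FORMAT SERDE[\s\S]+?(LOCATION\s+.+(?:\r\n|\n|\r))[\s\S]*?TBLPROPERTIES[^)]+?\);"
)

# Notepad++: STORED AS PARQUET \n$1TBLPROPERTIES\('parquet.compression'='SNAPPY'\);
# Python:    $1 becomes \g<1>, and the parentheses are written without backslashes.
TEMPLATE = r"STORED AS PARQUET \n\g<1>TBLPROPERTIES('parquet.compression'='SNAPPY');"

SAMPLE = (
    "ROW FORMAT SERDE\n"
    "  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
    "LOCATION\n"
    "  '/mnt/data/schema/layer/domain/MYTABLE'\n"
    "TBLPROPERTIES (\n"
    "  'numRows'='23483974',\n"
    "  'transient_lastDdlTime'='1608243340');"
)

match = PATTERN.search(SAMPLE)
expanded = match.expand(TEMPLATE)  # \n and \g<1> are substituted here
print(expanded)
```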


I was able to do the same with two separate regex find-and-replace passes. I didn't know it would be this simple with two rounds of find and replace. Both passes need Regular expression and . matches newline checked, because .*? has to cross line breaks:

  1. Replace: ROW FORMAT SERDE.*?LOCATION with: STORED AS PARQUET\r\nLOCATION

  2. Replace: TBLPROPERTIES.*?\) with: TBLPROPERTIES \(\r\n 'parquet.compression'='SNAPPY'\)
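The two-pass version can be sketched the same way in Python 3. This assumes both searches ran with . matches newline checked, which corresponds to Python's re.DOTALL; the \( and \) of the Notepad++ replacement become plain parentheses. The function name two_pass and the SAMPLE text are hypothetical.

```python
import re

# Pass 1: remove everything from ROW FORMAT SERDE up to LOCATION (LOCATION is re-inserted).
STEP1 = re.compile(r"ROW FORMAT SERDE.*?LOCATION", re.DOTALL)
# Pass 2: replace TBLPROPERTIES up to its first closing parenthesis;
# the trailing ';' is not matched, so it stays in place.
STEP2 = re.compile(r"TBLPROPERTIES.*?\)", re.DOTALL)


def two_pass(ddl: str) -> str:
    ddl = STEP1.sub(r"STORED AS PARQUET\r\nLOCATION", ddl)
    return STEP2.sub(r"TBLPROPERTIES (\r\n 'parquet.compression'='SNAPPY')", ddl)


SAMPLE = (
    "PARTITIONED BY (\n"
    "  `data_as_of_date` string)\n"
    "ROW FORMAT SERDE\n"
    "  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
    "LOCATION\n"
    "  '/mnt/data/schema/layer/domain/MYTABLE'\n"
    "TBLPROPERTIES (\n"
    "  'numRows'='23483974',\n"
    "  'transient_lastDdlTime'='1608243340');"
)

result = two_pass(SAMPLE)
print(result)
```

Because each pass is a short lazy match anchored on its own keyword, the two regexes stay simpler than the single combined one; the cost is that the file is scanned twice.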

I was having a tough time doing this with a single regex. Anyone?
