
Regular expression to extract a string

I need to extract a string from a big string. Is it possible to use a regular expression to extract the fields from data like this:

4567       Test Assembling the Plant(4566)   [2]         WAST         Testing1<CR><LF>
ERTW         Test the second assembly           [3]        JEST         Test4<CR><LF>
V345           This is another test (FAR X) [9]      KERT         Test192<CR><LF>
--         This is test Number 10       [6] <CR><lf>
                                                              Test100<CR><LF>
           Number of the testing assembly  (1234)                     Test the plant assembly <CR><LF>  

V234              Testing the WIRE ASSEMBLY                               Test this assembly (12345-7876544)  [9]  <CR><LF>
C34567        This is another test assembly   (123456) [6]        trew43     This is test assembly<CR><LF>
RT234      Testing the assembly1100                            PQR         Testing assembly<CR><LF>
PL234         Test                                               RET<CR><LF>

Can I use a regular expression to extract the above data and insert it into a database table like this?

Table1

Col1       Col2                                            COL3             Col4

4567       Test Assembling the Plant(4566)  [2]            WAST              Testing1
ERTW       Test the second assembly           [3]          JEST              Test4
V345       This is another test (FAR X) [9]                KERT              Test192
--         This is test Number 10       [6]
       Number of the testing assembly  (1234)                            Test the plant assembly 
V234       Testing the WIRE ASSEMBLY                                         Test this assembly   (12345-7876544)  [9]
C34567     This is another test assembly   (123456) [6]    trew43            This is test assembly 
RT234      Testing the assembly1100                        PQR               Testing assembly
PL234      Test                                            RET

Is it possible to extract the above using a regular expression, or based on column numbers?

Any help will be greatly appreciated.

It sounds like the problem is the multiple spaces inside each data field. Within a field they appear to be either single spaces between words or multiple spaces before an opening ( or [. So I collapse those runs to a single space and then break the fields apart on runs of three or more spaces. I use " || " as the field separator for clarity:

cat file1 file2 | perl -pe 's/\s+\(/ \(/g;s/\s+\[/ \[/g' | perl -pe 's/\s{3,}/ \|\| /g' | perl -pe 's/<CR>.*//'

Each line of output would look like this. The ordering simply follows the order of the files passed to cat.

  • 4567 || Test Assembling the Plant(4566) [2] || WAST || Testing1
  • ERTW || Test the second assembly [3] || JEST || Test4
  • V345 || This is another test (FAR X) [9] || KERT || Test192
  • -- || This is test Number 10 [6]
  • || Test100
  • || Number of the testing assembly (1234) || Test the plant assembly
  • V234 || Testing the WIRE ASSEMBLY || Test this assembly (12345-7876544) [9]
  • C34567 || This is another test assembly (123456) [6] || trew43 || This is test assembly
  • RT234 || Testing the assembly1100 || PQR || Testing assembly
  • PL234 || Test || RET
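
For readers who would rather do this inside a program than in a shell pipeline, the same steps can be sketched in Python. This is only an illustration under assumptions: the function name `split_fields` is made up here, the `<CR><LF>` text is treated as a literal marker as in the sample data, and the marker is stripped first rather than last, so trailing spaces do not end up in the last field.

```python
import re


def split_fields(line: str) -> list[str]:
    """Split one record into fields, following the same idea as the Perl pipeline.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Drop the literal "<CR>" marker and everything after it.
    line = re.sub(r"<CR>.*", "", line)
    # Collapse any run of whitespace before "(" or "[" to a single space,
    # so "(1234)" and "[2]" stay attached to the preceding field.
    line = re.sub(r"\s+([(\[])", r" \1", line)
    # Treat three or more whitespace characters as the field separator.
    # Only trailing whitespace is removed, so a leading gap still yields
    # an empty first field, like the leading " || " in the output above.
    return re.split(r"\s{3,}", line.rstrip())


if __name__ == "__main__":
    record = ("4567       Test Assembling the Plant(4566)   [2]"
              "         WAST         Testing1<CR><LF>")
    print(" || ".join(split_fields(record)))
    # 4567 || Test Assembling the Plant(4566) [2] || WAST || Testing1
```

The same caveat applies as for the Perl version: two fields separated by fewer than three spaces are merged, and rows with missing fields give shorter lists, so mapping fields to Col1..Col4 still needs a decision per row.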

Yes, you can use a regex to extract strings from large input. Spaces are not a problem for a regex.

\\s -> any whitespace character
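
As a small illustration of how `\s` behaves with different quantifiers, here is a Python sketch on one of the sample rows. The doubled backslash above is presumably how the pattern is written inside an ordinary string literal; in Python a raw string lets it be written as `r"\s"`. The variable names are only for this example.

```python
import re

sample = ("RT234      Testing the assembly1100"
          "                            PQR         Testing assembly")

# \s matches a single whitespace character (space, tab, newline, ...).
single = re.findall(r"\s", "a b\tc")

# \s+ splits on every run of whitespace, so multi-word fields fall apart.
words = re.split(r"\s+", sample)

# \s{3,} splits only on runs of three or more whitespace characters,
# which keeps multi-word fields together.
fields = re.split(r"\s{3,}", sample)

print(single)  # [' ', '\t']
print(words)   # ['RT234', 'Testing', 'the', 'assembly1100', 'PQR', 'Testing', 'assembly']
print(fields)  # ['RT234', 'Testing the assembly1100', 'PQR', 'Testing assembly']
```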


 