How do I extract data from strings in a file on Linux in this case?

Question

I have a file with lines that look like this:

   IBACS6XX P24 ( .PADM(TEST_3), .QC(P1_87P_Z_3) );
   OBAXXCSXX08 P77 ( .A(P1_158P_N1_PROBE_SEL), .PADM(N1_SELECT) );
   inv0_p U99 ( .A(P1_P1_2P_P1_P1_19P_Z_0), .Q(n00) );
   IBACS6XX P25 ( .PADM(TBUSREQN), .QC(tbusreqn) );
   IBACS6XX P26 ( .PADM(NX_N2N), .QC(P1_177P_Z_0) );
   OBAXXCSXX08 P27 ( .A(P1_158P_N2G6PC), .PADM(N2G6PCC) );
   OBAXXCSXX08 P28 ( .A(P1_158P_N1G6PC), .PADM(N1G6PCC) );
   IOACS3P6CSXE04 P46 ( .A(P1_158P_DOUT_7), .EN(FE_OFN21_P1_158P_DATA_OUTN), 
      .PADM(DATA_7), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_7) );
   IOACS3P6CSXE04 P47 ( .A(P1_158P_DOUT_6), .EN(FE_OFN21_P1_158P_DATA_OUTN), 
      .PADM(DATA_6), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_6) );

Now to the question: I wish to extract 3 items of data from each line and put them into a new file, separated by a space character.

(1) The first item, e.g. IBACS6XX in the first line.

(2) The second item, which starts with P followed by 2 digits and is usually 3 characters long, e.g. P24 in the first line. After the second item we always get an opening bracket.

(3) The item between .PADM( and the closing bracket ), e.g. TEST_3 in the first line.

How do I do this in Linux? Do you have a better way?

The problems are:

(1) Some lines are broken into two lines, so the .PADM( may end up on the second line instead, as can be seen in the last 2 examples.

(2) The .PADM( does not always appear at the same place in the line, as can be seen in the second example.

(3) Not all lines are of interest, only those that start with IBA, OBA or IOA, as can be seen above. If a line does not start with these characters then it can be ignored. This is a portion of a netlist file.

All lines are "closed" with the ';' symbol; otherwise they continue on the next line of the text file.

I assume that awk and sed are to be used in some combination, but I am not sure how.

EDIT:

It works perfectly. Now a small further step is to pick these out of the netlist as well:

 ggppxbp P74 (  );
 ggppxbp P74VDD (  );
 ggppxbg P75 (  );
 ggppxbg P75VSS (  );

I just want to discard the last bracket and semicolon. These cells always start with ggppxb, and the last letter tells whether it is a 5 V or GND connection, so only the last letter will change.

I think that I can put ggppxbp into the if statement after the || symbol. But how do I discard the bracket and the semicolon and include the remaining two items in the output file?

Answer 1

Try this awk program. It assumes that there is at most one continuation line, but it can be changed to handle more if needed, I guess by replacing the first if with a while.

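{
    # Join one continuation line when the statement is not yet closed with ";"
    if (! /;/ ) {
        L=$0
        getline
        $0=L $0
    }
    # Only cells whose type starts with IBA, OBA or IOA are of interest
    if ($1 ~ /^IBA/ || $1 ~ /^OBA/ || $1 ~ /^IOA/) {
        A=$1
        B=$2
        gsub(".*PADM\\(","")   # drop everything up to and including ".PADM("
        gsub("\\).*","")       # drop the closing bracket and the rest
        print A,B,$0
    }
}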
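The remark about handling more than one continuation line could be sketched as follows. This is only a sketch under assumptions: the helper name extract_pads and the file names in the example call are made up for illustration, and it assumes every statement of interest is eventually closed with ';'. The if is replaced by a while loop that keeps reading lines into a variable until a ';' shows up or the file ends.

```shell
# Hypothetical helper: print "type instance pad" for each IBA/OBA/IOA cell
# found in the netlist file given as the first argument.
extract_pads() {
    awk '
    {
        # Keep appending lines until the statement is closed with ";".
        # getline returns 0 at end of file, which also stops the loop.
        while (! /;/ && (getline line) > 0)
            $0 = $0 " " line
        if ($1 ~ /^(IBA|OBA|IOA)/) {
            A = $1
            B = $2
            gsub(".*PADM\\(", "")   # drop everything up to ".PADM("
            gsub("\\).*", "")       # drop the closing bracket and the rest
            print A, B, $0
        }
    }' "$1"
}

# Example call (both file names are placeholders):
# extract_pads netlist.v > pads.txt
```

Reading into a separate variable with getline line leaves $0 untouched until the two pieces are joined, so the ';' test in the loop condition always looks at the statement collected so far.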

To handle the additional items, try:

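{
    # Join one continuation line when the statement is not yet closed with ";"
    if (! /;/ ) {
        L=$0
        getline
        $0=L $0
    }
    if ($1 ~ /^ggppxb/) {
        # These cells have an empty port list: keep type and instance only
        print $1,$2
    } else if ($1 ~ /^IBA/ || $1 ~ /^OBA/ || $1 ~ /^IOA/) {
        A=$1
        B=$2
        gsub(".*PADM\\(","")
        gsub("\\).*","")
        print A,B,$0
    }
}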
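As an alternative sketch (a different technique from the getline approach above, not the answerer's method), awk can be told to treat ';' as the record separator, so that a statement becomes one record no matter how many lines it spans. The helper name list_cells and the file names are hypothetical, and the sketch assumes that ';' only ever appears at the end of a statement, never inside a comment or a string.

```shell
# Hypothetical helper: one output line per pad cell in the given netlist file.
list_cells() {
    awk '
    BEGIN { RS = ";" }   # one record per statement, however many lines it spans
    $1 ~ /^ggppxb/ {
        # These cells have an empty port list: keep type and instance only
        print $1, $2
        next
    }
    $1 ~ /^(IBA|OBA|IOA)/ && match($0, /\.PADM\([^)]*\)/) {
        # Strip the leading ".PADM(" (6 characters) and the trailing ")"
        print $1, $2, substr($0, RSTART + 6, RLENGTH - 7)
    }' "$1"
}

# Example call (both file names are placeholders):
# list_cells netlist.v > pads.txt
```

With the default field separator, newlines inside a record count as whitespace, so $1 and $2 are still the cell type and the instance name even when the statement is wrapped.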

If you want to learn more about awk, read the wonderful book Gawk: Effective AWK Programming.

Answer 2

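sed -n '
# Only cells whose type starts with OBA, IBA or IOA are of interest
/^[[:blank:]]*OBA[A-Z0-9]\{5\}/ b treat
/^[[:blank:]]*IBA[A-Z0-9]\{5\}/ b treat
/^[[:blank:]]*IOA[A-Z0-9]\{5\}/ b treat
b

:treat
# Keep appending lines until the statement is closed with ";"
/;[[:blank:]]*$/ b full
N
b treat
:full
# Turn the joined lines into a single line, then print the three items
s/\n/ /g
s/^[[:blank:]]*\([A-Z0-9]\{1,\}\)[[:blank:]]\{1,\}\(P[0-9]\{2\}\).*[.]PADM(\([^)]*\)).*/\1 \2 \3/p
' YourFile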

This is generic for OBA, IBA and IOA cells.

Answer 1: score 3, accepted, 2013-12-05 10:41:09

Answer 2: score 1, 2013-12-05 13:02:18
