substr in an awk statement used for XML parsing

Question

Link to the original question: bash script extract XML data into column format and now for a modification and explanation

Something in this line of code is not correct. I believe the problem is in the substr portion, which I don't fully understand and would like to learn to understand better. Yes, I have looked at the documentation, and it is not fully clicking. A couple of examples as well as an answer would be really helpful.

awk -F'[<>]' 'BEGIN{a["STKPR"]="Prod";a["STKSVBLKU"]="Prod";a["STKSVBLOCK"]="Prod";a["STKSVBLK2"]="Test";} /Name/{name=$3; type=a[substr(name,length(name))]; if (length(type)==0) type="Test";} /SessionHost/+/Host/{print type, name, $3;}'|sort -u

This bit here:

type=a[substr(name,length(name))]; if (length(type)==0) type="Test";

Here is the XML format. Each block describes one host and contains its hostname and IP.

<?xml version="1.0"?>
<Connection>
  <ConnectionType>Putty</ConnectionType>
  <CreatedBy>Someone</CreatedBy>
  <CreationDateTime>2014-10-27T11:53:32.0157492-04:00</CreationDateTime>
  <CredentialConnectionID>9F3C3BCF-068A-4927-B996-CA52154CAE3B</CredentialConnectionID>
  <Description>Red Hat Enterprise Linux 5 (64-bit)</Description>
  <Events>
    <OpenCommentPrompt>true</OpenCommentPrompt>
    <WarnIfAlreadyOpened>true</WarnIfAlreadyOpened>
  </Events>
  <Group>PATH/TO/GROUP/NAME</Group>
  <ID>f2007f03-3b33-47d3-8335-ffd84ccc0e6b</ID>
  <MetaInformation />
  <Name>STKSPRDAPP01111</Name>
  <OpenEmbedded>true</OpenEmbedded>
  <PinEmbeddedMode>False</PinEmbeddedMode>
  <Putty>
    <AlwaysAskForPassword>true</AlwaysAskForPassword>
    <Domain>DOMAIN</Domain>
    <FontSize>12</FontSize>
    <Host>10.0.0.111</Host>
    <Port>22</Port>
    <PortFowardingArray />
    <TelnetEncoding>IBM437</TelnetEncoding>
  </Putty>
  <Stamp>85407098-127d-4d3c-b7fa-8f174cb1e3bd</Stamp>
  <SubMode>2</SubMode>
  <TemplateName>SSH-PerUserCreds</TemplateName>
</Connection>

What I want to do is similar to the referenced link above. But here I want to match:

BEGIN{a["STKPR"]="Prod";a["STKSVBLKU"]="Prod";a["STKSVBLOCK"]="Prod";a["STKSVBLK2"]="Test";}

and treat all of the rest as Test. It is best to read the previous post to make this one more understandable. Thank you.

Answer 1

Because your keys here are of different lengths, the substr approach is less than optimal. Try:

awk -F'[<>]' '/Name/{n=$3;t="Test"; if(n ~ /^STKPR/) t="Prod"; if (n ~/^STKSVBLKU/) t="Prod"; if (n ~/^STKSVBLOCK/) t="Prod"} /SessionHost/+/Host/{print t, n, $3;}' sample.xml |sort -u
Test STKSPRDAPP01111 10.0.0.111

How It Works

In this case, the type, denoted by t, is set according to a series of if statements. From the above code, they are:

t="Test"
if (n ~ /^STKPR/) t="Prod"
if (n ~ /^STKSVBLKU/) t="Prod" 
if (n ~ /^STKSVBLOCK/) t="Prod"

By setting t="Test", Test becomes the default: the type will be Test unless another statement matches. Each of the following statements looks at the string that begins the host name and, if there is a match, sets type t to a new value. (When a regular expression begins with ^, what follows must match at the beginning of the string.)
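To see what the ^ anchor changes, here is a small sketch that compares an anchored and an unanchored match. The helper check_anchor and the host names STKPRD01 and XSTKPRD01 are made up for illustration; they do not come from the question's data.

```shell
# Hypothetical helper: reports whether a name matches STKPR at the start
# of the string (anchored) and anywhere in the string (unanchored).
check_anchor() {
  awk -v n="$1" 'BEGIN {
    anchored = (n ~ /^STKPR/) ? "yes" : "no"   # must begin with STKPR
    anywhere = (n ~ /STKPR/)  ? "yes" : "no"   # STKPR may appear anywhere
    print n, "anchored=" anchored, "anywhere=" anywhere
  }'
}

check_anchor STKPRD01    # prints: STKPRD01 anchored=yes anywhere=yes
check_anchor XSTKPRD01   # prints: XSTKPRD01 anchored=no anywhere=yes
```

Without the anchor, a name that merely contains STKPR somewhere in the middle would also be classified as Prod, which is probably not what is wanted here.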

Alternative using fancier regular expressions

Since the above three if statements are all for the Prod type, the three of them could, if you preferred, be combined into:

t="Test"
if (n ~ /^STK(PR|SVBLKU|SVBLOCK)/) t="Prod"

(metalcated: Fixed unmatched parentheses bracket)
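As a quick way to try the combined regular expression on individual names, it can be wrapped in a small shell function. This is only a sketch: the function name classify and the host names STKPRDB01 and STKSVBLK2WEB01 are assumptions for illustration; STKSPRDAPP01111 is the name from the sample XML.

```shell
# Hypothetical helper: prints "Prod" or "Test" for a single host name,
# using the same default-then-override logic as the answer.
classify() {
  awk -v n="$1" 'BEGIN {
    t = "Test"                                       # default type
    if (n ~ /^STK(PR|SVBLKU|SVBLOCK)/) t = "Prod"    # any Prod prefix
    print t
  }'
}

classify STKPRDB01         # prints: Prod
classify STKSVBLK2WEB01    # prints: Test
classify STKSPRDAPP01111   # prints: Test
```

The sample name is Test because it begins with STKSP, not STKPR, which agrees with the output shown for sample.xml above.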

Answer 2

The substr portion produces a string containing only the last character of name. This is because it takes a substring of name starting at position length(name) and running to the end of the string, and because substr is indexed starting at 1.

To match whole strings, you can use your variable name directly rather than processing it with substr.
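A couple of small examples may make this concrete. They use the name from the sample XML; the shell variable names last and prefix are just for illustration.

```shell
name="STKSPRDAPP01111"

# substr(s, start) returns everything from position start (1-based) to the
# end of s, so starting at length(s) yields only the final character.
last=$(awk -v s="$name" 'BEGIN { print substr(s, length(s)) }')

# substr(s, start, len) returns at most len characters beginning at start;
# here that is the first five characters.
prefix=$(awk -v s="$name" 'BEGIN { print substr(s, 1, 5) }')

echo "last=$last prefix=$prefix"   # prints: last=1 prefix=STKSP
```

So in the original command, a[substr(name,length(name))] looks up a["1"], which is not a key in the array, and the type always falls back to Test.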

/Name/ { name=$3; type=a[name]; if (length(type)==0) type="Test"; }
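Note that a[name] only finds an entry when the whole name equals a key. If the keys are meant as prefixes of longer host names, one possible variation (a sketch, not part of the original answer) keeps the lookup table and tests each key with index(), which returns 1 when the key sits at the very start of the name. The function name classify_by_table and the test names STKSVBLKU01 and STKSVBLK2X are hypothetical.

```shell
# Hypothetical helper: keeps the question's lookup table but treats each
# key as a prefix of the host name instead of an exact match.
classify_by_table() {
  awk -v name="$1" 'BEGIN {
    a["STKPR"] = "Prod";      a["STKSVBLKU"] = "Prod"
    a["STKSVBLOCK"] = "Prod"; a["STKSVBLK2"] = "Test"
    type = "Test"                    # default when no key matches
    for (k in a)
      if (index(name, k) == 1) {     # key k is a prefix of name
        type = a[k]
        break
      }
    print type
  }'
}

classify_by_table STKSVBLKU01       # prints: Prod
classify_by_table STKSVBLK2X        # prints: Test
classify_by_table STKSPRDAPP01111   # prints: Test
```

The order of a for (k in a) loop is unspecified in awk; that is harmless here only because none of the four keys is a prefix of another.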


Answer 1: score 2, accepted, 2014-11-02 00:40:33


Answer 2: score 1, 2014-11-02 00:37:05
