
Simple XML question for Perl: how to retrieve specific elements

I'm trying to figure out how to loop through XML. I've read a lot, but I'm still getting stuck. Here's the info:

I'm using the Wordnik API to retrieve XML and parse it with XML::Simple:

 my $content = get($url);          # assuming get() from LWP::Simple
 my $r = $xml->XMLin($content);    # assuming $xml = XML::Simple->new

The actual XML looks like this:

<definitions>
  <definition sequence="0" id="0">
    <text>
To withdraw one's support or help from, especially in spite of duty, allegiance, or responsibility; desert:  abandon a friend in trouble.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="1" id="0">
    <text>
To give up by leaving or ceasing to operate or inhabit, especially as a result of danger or other impending threat:  abandoned the ship.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="2" id="0">
    <text>
To surrender one's claim to, right to, or interest in; give up entirely. See Synonyms at relinquish.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="3" id="0">

... ...

All I want is the part of speech of the FIRST definition. I'm using this code, but it returns the LAST definition's part of speech:

    if ($r->{definition}->{0}->{partOfSpeech}) {
        $pos = $r->{definition}->{0}->{partOfSpeech};
    }
    else {
        $pos = $r->{definition}->{partOfSpeech};
    }

I'm pretty embarrassed by this, since I know there's an obviously better way to do it. I'd love to get something as simple as this working so I could loop through the elements more generally, but it just isn't working for me (I have no idea what to reference). I've tried many variations of the following; this is just my latest attempt:

 while (my ($k, $v) = each %{$r->{definitions}->{definition}[0]->{sequence}->{partOfSpeech}}) {
  $v =~ s/'/'"'"'/g;
  $v = "'$v'";
  print "export $k=$v\n";
 }

Lastly, when I do "print Dumper($r)", it gives me this:

$VAR1 = {
          'definition' => {
                          '0' => {
                                 'partOfSpeech' => 'noun',
                                 'sequence' => '6',
                                 'text' => 'A complete surrender of inhibitions.',
                                 'headword' => 'abandon'
                               }
                        }
        };

(That "noun" is the partOfSpeech of the last definition, the one with sequence="6".)


Based on RC's answer below, my new code looks like this:

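my $content = get($url);
my $r = $xml->XMLin($content, KeyAttr => { definition => 'sequence' });
while (my ($k, $def) = each %{ $r->{definition} }) {
    my $v = $def->{partOfSpeech};
    $v =~ s/'/'"'"'/g;    # escape single quotes for the shell
    $v = "'$v'";          # wrap the value in single quotes, as in the output below
    print "export $k=$v\n";
}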

This prints the following:

export 6='noun'
export 4='verb-transitive'
export 1='verb-transitive'
export 3='verb-transitive'
export 0='verb-transitive'
export 2='verb-transitive'
export 5='noun'

So this is good, and it exports the correct pairs. But now the order is off. (I first blamed Wordnik, but the cause is on the Perl side: hashes are unordered, so each returns the keys in no particular order.) How do I sort this by key? Something like this?

sort($r->{definition});
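Because a Perl hash keeps no particular order, each (like keys) hands back the sequence numbers in whatever order the hash happens to store them. Sorting the keys numerically before the loop restores document order. A minimal sketch, assuming $r has the shape XMLin returns with KeyAttr => { definition => 'sequence' }; the hash below is a hard-coded, abbreviated stand-in for that result (only partOfSpeech is kept) so the snippet runs without fetching anything:

```perl
use strict;
use warnings;

# Hard-coded stand-in for what XMLin($content, KeyAttr => { definition => 'sequence' })
# is assumed to return: definitions folded into a hash keyed by their sequence attribute.
# Only partOfSpeech is kept here, for brevity.
my $r = {
    definition => {
        6 => { partOfSpeech => 'noun' },
        4 => { partOfSpeech => 'verb-transitive' },
        1 => { partOfSpeech => 'verb-transitive' },
        3 => { partOfSpeech => 'verb-transitive' },
        0 => { partOfSpeech => 'verb-transitive' },
        2 => { partOfSpeech => 'verb-transitive' },
        5 => { partOfSpeech => 'noun' },
    },
};

my @lines;
# Numeric comparison (<=>), so "10" would sort after "9";
# a plain sort would compare the keys as strings.
for my $k (sort { $a <=> $b } keys %{ $r->{definition} }) {
    my $v = $r->{definition}{$k}{partOfSpeech};
    push @lines, "export $k=$v";
}
print "$_\n" for @lines;
```

Note that sort($r->{definition}) on its own would not help: sort takes a list, not a hash reference, so you sort the keys and look each definition up inside the loop. The shell quoting from the loop above is left out here for brevity.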

From the XML::Simple documentation:

Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If you do not want folding on input or unfolding on output you must set this option to an empty list to disable the feature.

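Every <definition> element in your response carries id="0", and 'id' is one of the default KeyAttr names, so XML::Simple folds all the definitions into a single hash entry keyed "0". Each definition overwrites the previous one, which is why you only ever see the last. I think adding KeyAttr => { definition => 'sequence' } to the XMLin options should fix your issue, since each definition then gets its own key.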
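If you would rather not depend on the sequence attribute at all, another option is to switch folding off and force definition to always be an array. KeyAttr => [] and ForceArray => ['definition'] are both XMLin options; with them, $r->{definition} should be an array reference in document order, so the first definition is simply element 0. A sketch under that assumption: the structure below is hard-coded (trimmed to three definitions and a few fields) rather than produced by a real XMLin call, and first_pos is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed shape of $xml->XMLin($content, KeyAttr => [], ForceArray => ['definition']):
# folding is disabled, so definitions stay in document order inside an array.
# Trimmed to three definitions and a few fields for brevity.
my $r = {
    definition => [
        { sequence => 0, id => 0, partOfSpeech => 'verb-transitive' },
        { sequence => 1, id => 0, partOfSpeech => 'verb-transitive' },
        { sequence => 6, id => 0, partOfSpeech => 'noun' },
    ],
};

# Hypothetical helper: part of speech of the first <definition>, or undef if there is none.
sub first_pos {
    my ($doc) = @_;
    my $defs = $doc->{definition} or return;
    return $defs->[0]{partOfSpeech};
}

my $pos = first_pos($r);
print "first: $pos\n";

# Looping in document order needs no sorting at all.
for my $def (@{ $r->{definition} }) {
    print "export $def->{sequence}=$def->{partOfSpeech}\n";
}
```

ForceArray matters even with folding off: without it, a response that contains only one definition would come back as a plain hash instead of a one-element array, and $r->{definition}[0] would fail.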

You can also use XML::Twig to traverse the document for you and help extract the data:

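use strict;
use warnings;
use XML::Twig;

# Read the whole XML document from the __DATA__ section at the end of the script
# (or use the string fetched from Wordnik instead).
my $content = do { local $/; <DATA> };

XML::Twig->new(twig_handlers => {
    # Called once for each <definition> element, as soon as it has been parsed
    definition => sub {
        warn "---\n",
            "sequence = ",     $_->att('sequence'), "\n",
            "text = ",         $_->first_child_trimmed_text('text'), "\n",
            "headword = ",     $_->first_child_trimmed_text('headword'), "\n",
            "partOfSpeech = ", $_->first_child_trimmed_text('partOfSpeech'), "\n";
        $_->purge;    # discard the data processed so far
    },
})->parsestring($content);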

This is also more efficient, because the whole structure does not have to be loaded into memory (the purge method discards the data that has already been processed).

You could try WWW::Wordnik::API (I'm the author.)

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.