Simple XML question for Perl: how to retrieve specific elements

I'm trying to figure out how to loop through XML. I've read a lot, but I'm still stuck. Here's the info:

I'm using the Wordnik API to retrieve XML, which I parse with XML::Simple:

 use LWP::Simple; use XML::Simple;    # get() and XMLin()
 my $xml     = XML::Simple->new;
 my $content = get($url);
 my $r       = $xml->XMLin($content);

The actual XML looks like this:

<definitions>
  <definition sequence="0" id="0">
    <text>
    To withdraw one's support or help from, especially in spite of duty, allegiance, or responsibility; desert:  abandon a friend in trouble.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="1" id="0">
    <text>
    To give up by leaving or ceasing to operate or inhabit, especially as a result of danger or other impending threat:  abandoned the ship.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="2" id="0">
    <text>
    To surrender one's claim to, right to, or interest in; give up entirely. See Synonyms at relinquish.
    </text>
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="3" id="0">

...

What I want is simply the FIRST definition's part of speech. I'm using this code but it's getting the LAST definition's POS:

    if ($r->{definition}->{0}->{partOfSpeech}) {
        $pos = $r->{definition}->{0}->{partOfSpeech};
    }
    else {
        $pos = $r->{definition}->{partOfSpeech};
    }

I'm pretty embarrassed by this, since I know there's an obviously better way to do it. I would love to get something as simple as this working so that I could loop through the elements more generally, but it just isn't working for me (I have no idea what to reference). I've tried many variations of the following; this is just my latest attempt:

 while (my ($k, $v) = each %{$r->{definitions}->{definition}[0]->{sequence}->{partOfSpeech}}) {
  $v =~ s/'/'"'"'/g;
  $v = "'$v'";
  print "export $k=$v\n";
 }

Lastly, when I do "print Dumper($r)" it gives me this:

$VAR1 = {
          'definition' => {
                          '0' => {
                                 'partOfSpeech' => 'noun',
                                 'sequence' => '6',
                                 'text' => 'A complete surrender of inhibitions.',
                                 'headword' => 'abandon'
                               }
                        }
        };

(That "noun" is the partOfSpeech of the last definition, the one with sequence="6".)


Based on RC's answer below, my new code looks like this:

$content = get($url);
$r = $xml->XMLin($content, KeyAttr => { definition => 'sequence' });
while (my ($k, $def) = each %{ $r->{definition} }) {
    my $v = $def->{partOfSpeech};
    $v =~ s/'/'"'"'/g;    # escape single quotes for the shell
    print "export $k='$v'\n";
}

This prints out the following:

export 6='noun'
export 4='verb-transitive'
export 1='verb-transitive'
export 3='verb-transitive'
export 0='verb-transitive'
export 2='verb-transitive'
export 5='noun'

So this is good and it is exporting the correct pairs. But now the issue is that the order is off (which seems very likely to be Wordnik's problem and not a programming issue). How do I sort this by a key? Something like this?

sort($r->{definition});
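Perl hashes do not preserve insertion order, so `each` (and `keys`) hand back the pairs in an effectively arbitrary order; the shuffled output comes from the hash, not from Wordnik. `sort` also works on a list, not on a hash reference, so `sort($r->{definition})` will not reorder anything. A minimal sketch, assuming `$r` has the `sequence`-keyed shape produced by `KeyAttr => { definition => 'sequence' }` (the literal hash below is a stand-in for the parsed response, filled with the values from the output above), sorts the keys numerically with `<=>` before looping:

```perl
use strict;
use warnings;

# Stand-in for $r after XMLin(..., KeyAttr => { definition => 'sequence' }):
# each <definition> is stored under its sequence number.
my $r = {
    definition => {
        0 => { partOfSpeech => 'verb-transitive' },
        1 => { partOfSpeech => 'verb-transitive' },
        2 => { partOfSpeech => 'verb-transitive' },
        3 => { partOfSpeech => 'verb-transitive' },
        4 => { partOfSpeech => 'verb-transitive' },
        5 => { partOfSpeech => 'noun' },
        6 => { partOfSpeech => 'noun' },
    },
};

my @lines;
# Hash order is arbitrary, so sort the keys numerically before looping.
for my $k (sort { $a <=> $b } keys %{ $r->{definition} }) {
    my $v = $r->{definition}{$k}{partOfSpeech};
    $v =~ s/'/'"'"'/g;    # escape embedded single quotes for the shell
    push @lines, "export $k='$v'";
}
print "$_\n" for @lines;

# The first definition is simply the one with the smallest key.
my ($first) = sort { $a <=> $b } keys %{ $r->{definition} };
print "first: $r->{definition}{$first}{partOfSpeech}\n";
```

This prints `export 0='verb-transitive'` through `export 6='noun'` in numeric order. A plain `sort` without `{ $a <=> $b }` compares as strings, which happens to work for 0 to 9 but would put 10 before 2.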

From the XML::Simple documentation:

Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If you do not want folding on input or unfolding on output, you must set this option to an empty list to disable the feature.

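I think adding KeyAttr => { definition => 'sequence' } to the XMLin options should fix your issue. With the default KeyAttr, each <definition> is folded into a hash keyed on its id attribute; since every definition has id="0", each one overwrites the previous one under the key 0, and only the last (sequence="6") survives, which is exactly what your Dumper output shows.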
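Since the goal is the first definition in document order, another option is to switch folding off altogether, as the note above describes, so that the definitions stay an array. A minimal sketch, assuming a trimmed copy of the question's XML (only `headword` and `partOfSpeech` are kept, and the `sequence="6"` entry uses the `noun` value from the Dumper output), contrasting the default behaviour with `KeyAttr => []` plus `ForceArray`:

```perl
use strict;
use warnings;
use XML::Simple;

# Trimmed sample built from the question's XML and Dumper output.
my $content = <<'XML';
<definitions>
  <definition sequence="0" id="0">
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="1" id="0">
    <headword>abandon</headword>
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="6" id="0">
    <headword>abandon</headword>
    <partOfSpeech>noun</partOfSpeech>
  </definition>
</definitions>
XML

my $xml = XML::Simple->new;

# Default KeyAttr (name, key, id): every <definition> is folded under its
# id, and because all ids are "0" only the last one is left.
my $folded     = $xml->XMLin($content);
my $folded_pos = $folded->{definition}{0}{partOfSpeech};

# KeyAttr => [] disables folding; ForceArray keeps 'definition' an array
# even if a response contains only one definition.
my $r = $xml->XMLin($content, KeyAttr => [], ForceArray => ['definition']);

my @defs      = @{ $r->{definition} };    # document order
my $first_pos = $defs[0]{partOfSpeech};

print "folded: $folded_pos\n";
print "first:  $first_pos\n";
for my $def (@defs) {
    print "$def->{sequence}: $def->{partOfSpeech}\n";
}
```

The folded lookup returns `noun` (the last definition), while `$defs[0]` gives `verb-transitive`. XML::Simple may also warn about the non-unique `id` values in the folded case.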

It is also possible to use XML::Twig to traverse file for you and help extracting the data:

use strict;
use warnings;
use XML::Twig;

my $content = do { local $/; <DATA> };      # slurp the XML placed after __DATA__

XML::Twig->new(twig_handlers => {
    definition => sub {
        warn "---\n",
            "sequence = ",     $_->att('sequence'), "\n",
            "text = ",         $_->first_child_trimmed_text('text'), "\n",
            "headword = ",     $_->first_child_trimmed_text('headword'), "\n",
            "partOfSpeech = ", $_->first_child_trimmed_text('partOfSpeech'), "\n";
        $_->purge;
    },
})->parsestring($content);

This is also more memory-efficient, because the whole parsed tree does not have to be held in memory: the purge method discards each definition once it has been processed.
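Twig handlers fire as each `</definition>` is parsed, in document order, so the same approach also answers the "first definition" question directly. A minimal sketch, assuming a trimmed copy of the question's XML, that collects `(sequence, partOfSpeech)` pairs in order and purges as it goes:

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample built from the question's XML and Dumper output.
my $content = <<'XML';
<definitions>
  <definition sequence="0" id="0">
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="1" id="0">
    <partOfSpeech>verb-transitive</partOfSpeech>
  </definition>
  <definition sequence="6" id="0">
    <partOfSpeech>noun</partOfSpeech>
  </definition>
</definitions>
XML

my @pos;    # [ sequence, partOfSpeech ] pairs in document order

XML::Twig->new(
    twig_handlers => {
        definition => sub {
            my ($t, $def) = @_;
            push @pos, [ $def->att('sequence'),
                         $def->first_child_trimmed_text('partOfSpeech') ];
            $t->purge;    # drop the definitions processed so far
        },
    },
)->parse($content);

my $first_pos = $pos[0][1];
print "first: $first_pos\n";
print "$_->[0]: $_->[1]\n" for @pos;
```

Because the values are copied into `@pos` before `purge` runs, discarding the elements does not lose any data.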

You could try WWW::Wordnik::API (I'm the author).

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM