XPath - finding text within a node - matching the whole word

Here's an example of the XML file output - obviously scaled down and some information changed:

<directory>
    <company>
        <id>5002</id>
        <company_name>Clothing Store</company_name>
        <address>123 street</address>
        <latitude>51.123456</latitude>
        <longitude>-113.12345432</longitude>
        <phone>1234567890</phone>
        <fax/>
        <website/>
        <logo_url/>
        <categories>
            <category>
                <name>Retail Fashion</name>
                <sub_categories>
                    <category_sub>
                        <id>5056</id>
                        <name>Her Style / Ladies Wear</name>
                    </category_sub>
                </sub_categories>
            </category>
        </categories>
    </company>
    <company>
        <id>5003</id>
        <company_name>Hardware Store</company_name>
        <address>123 street</address>
        <latitude>51.123456</latitude>
        <longitude>-113.12345432</longitude>
        <phone>1234567890</phone>
        <fax/>
        <website/>
        <logo_url/>
        <categories>
            <category>
                <name>Retail</name>
                <sub_categories>
                    <category_sub>
                        <id>5001</id>
                        <name>Hardware</name>
                    </category_sub>
                </sub_categories>
            </category>
        </categories>
    </company>
    <company>...</company>
</directory>

So, here's the issue. I've got an XML file for a business directory. I need to run text searches on it by category and pull only the businesses that have a matching category within their <company> node. For example, if I search for "Retail", I need every business that has "Retail" as a category, and I need all the child nodes of each matching <company> node returned, so everything from <id> to <categories>.

I actually have everything working properly, except when more than one category shares a specific word. My current example is "Retail": there is a category named "Retail" and another named "Retail Fashion". The way I'm doing my XPath pulls in the businesses from both categories, because it isn't matching the whole word or doing any regex search. I have a feeling that I need to use matches(), but I have yet to implement it successfully. I'm a total XPath noob. I'm sure this is an easy answer, but I can't find a good example of what I'm trying to do anywhere, or at least one that works for me.

Here's what I'm doing for the xpath:

$results = $xml->xpath("//company[contains(categories/*,'Retail')]");

Like I said, this returns everything as it should, except it's including both "Retail" and "Retail Fashion" categories.

As I already tried to explain in the comment, you don't have to formulate the predicate with contains(), which does a substring search over the node's whole string value (including the text of all its descendants). You can instead use a plain string comparison against a concrete node's value.

Example:

// $buffer holds the XML document as a string
$xml = simplexml_load_string($buffer);

$expression = "//company[categories//*[. = 'Retail']]";

$result = $xml->xpath($expression);

foreach ($result as $index => $element)
{
    echo '#', $index, ': ', $element->asXML(), "\n";
}

This compares against concrete descendant elements of <categories>, not only its direct children; that is what the // does:

//company[categories//*[. = 'Retail']]
                    ^^

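The // is needed because the <name> element whose value you're most likely looking for is not a direct child of <categories>: it is a child of either <category> or <category_sub>. (Giving the sub-category its own element name is arguably unnecessary in XML: you already have a tree, so the nesting makes it clear that it is a sub-category, and the element name does not need to differ. But that's just a note in the margin.)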
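
To illustrate the difference, here is a minimal, self-contained sketch. The trimmed-down sample data and the variable names $loose, $exact and $topLevel are my own, and narrowing the wildcard to <name> elements is an assumption about what you want to match, not something the question confirms.

```php
<?php
// Trimmed-down sample data modelled on the question's XML (illustrative only).
$buffer = <<<XML
<directory>
    <company>
        <id>5002</id>
        <company_name>Clothing Store</company_name>
        <categories>
            <category>
                <name>Retail Fashion</name>
                <sub_categories>
                    <category_sub><id>5056</id><name>Her Style / Ladies Wear</name></category_sub>
                </sub_categories>
            </category>
        </categories>
    </company>
    <company>
        <id>5003</id>
        <company_name>Hardware Store</company_name>
        <categories>
            <category>
                <name>Retail</name>
                <sub_categories>
                    <category_sub><id>5001</id><name>Hardware</name></category_sub>
                </sub_categories>
            </category>
        </categories>
    </company>
</directory>
XML;

$xml = simplexml_load_string($buffer);

// Substring test from the question: matches both "Retail" and "Retail Fashion".
$loose = $xml->xpath("//company[contains(categories/*, 'Retail')]");

// Exact comparison, limited to <name> elements anywhere below <categories>.
$exact = $xml->xpath("//company[categories//name[. = 'Retail']]");

// Exact comparison against top-level category names only; sub-categories are ignored.
$topLevel = $xml->xpath("//company[categories/category/name[. = 'Hardware']]");

printf(
    "contains(): %d, exact: %d, top-level 'Hardware': %d\n",
    count($loose),
    count($exact),
    count($topLevel)
);
```

With this sample data the contains() version returns both companies, the exact comparison returns only the company with id 5003, and the top-level lookup for 'Hardware' returns nothing, because "Hardware" only appears as a sub-category name.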

Take care if you are working with input data as search terms; I have written a blog post about that. It also points to related Stack Overflow Q&A material on the topic.
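
The blog post itself is not reproduced here. One common concern with input data, and the one assumed in this sketch, is quoting: XPath 1.0 string literals have no escape sequence, so a search term that contains a quote character cannot simply be pasted between quotes. The helper name xpath_literal() and the sample data below are hypothetical, not part of PHP or of the original answer.

```php
<?php
/**
 * Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
 */
function xpath_literal(string $value): string
{
    // No single quote inside: wrap in single quotes.
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // No double quote inside: wrap in double quotes.
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: split on single quotes and rejoin with concat().
    $parts = array_map(
        function ($part) {
            return "'" . $part . "'";
        },
        explode("'", $value)
    );
    return 'concat(' . implode(', "\'", ', $parts) . ')';
}

// Illustrative data: a category name containing both kinds of quote.
$xml = simplexml_load_string(
    '<directory><company><id>1</id><categories><category>'
    . '<name>Kids\' "Fun" Wear</name>'
    . '</category></categories></company></directory>'
);

$term = 'Kids\' "Fun" Wear'; // stands in for a user-supplied search term
$expression = '//company[categories//name[. = ' . xpath_literal($term) . ']]';
$result = $xml->xpath($expression);

echo $expression, "\n";
echo 'matches: ', count($result), "\n";
```

Building the literal this way keeps the search term a plain string value inside the expression, so quotes in the input cannot terminate the literal early and change the query.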
