Problem with an XQuery exercise in BaseX

I really need your help with a query in BaseX. The problem is that I really do not understand the logic behind this language, XQuery. My first exercise asks:

"Find the first symptom(s) appearing after June 5, 2012. Report the result in a document having root SYMSAFTER, containing elements SYM."

The database looks like this:

<?xml version="1.0"?>

<PATIENT_SYMS>
    <PATIENT>
        <NAME>Bob</NAME>
        <SYMOCC>
            <SYM>
                <INT>high</INT>
                <DESC>  edema </DESC>
            </SYM>      
        </SYMOCC>   
    </PATIENT>
    <PATIENT>
        <NAME>Ann</NAME>
        <SYMOCC>
            <DATE>2015-08-03</DATE>
            <SYM>
                <INT>low</INT>
                <DESC>  asthma </DESC>
            </SYM>
        </SYMOCC>
        <SYMOCC>
            <DATE>2017-05-03</DATE>
            <SYM>
                <INT> high </INT>
                <DESC> nausea </DESC>
            </SYM>
        </SYMOCC>
    </PATIENT>
    <PATIENT>
        <NAME> Tom </NAME>
        <SYMOCC>
            <DATE>2011-01-01</DATE>
            <SYM>
                <INT>high</INT>
                <DESC>  headache </DESC>
            </SYM>  
            <SYM>
                <INT> low </INT>
                <DESC> nausea </DESC>
            </SYM>
        </SYMOCC>
    </PATIENT>
    <PATIENT>
        <NAME>Sue</NAME>    
    </PATIENT>
</PATIENT_SYMS>

The answer to the question is the following:

<SYMSAFTER> {
for $s in doc('Ps.xml')//SYMOCC
where $s/DATE > '2012-06-05' and (every $s1 in doc('Ps.xml')//SYMOCC satisfies not($s1/DATE > '2012-06-05') or $s1/DATE >= $s/DATE)
return $s
}
</SYMSAFTER>

The output will be:

<SYMSAFTER>
  <SYMOCC>
    <DATE>2015-08-03</DATE>
    <SYM>
      <INT>low</INT>
      <DESC>asthma</DESC>
    </SYM>
  </SYMOCC>
</SYMSAFTER>

I honestly don't understand the logic behind that.

  1. How are instructions executed in this language? Is it comparing every single date in $s with every other date in $s1? Is there any order it follows?
  2. How does satisfies / satisfies not work? To understand what is going on, I thought: if this works

satisfies not($s1/DATE > '2012-06-05')

why does the one below not work?

satisfies ($s1/DATE < '2012-06-05')

Isn't it the exact same thing?

  3. Why is the last part "or" and not "and"? I understand that we are checking whether the date is the first one by checking that there is no other date before it, but shouldn't it be "and"?

  4. Why, in this line

$s1/DATE >= $s/DATE

do we use greater-or-equal (and not just greater)? Isn't it obvious that it is going to find the same date as the one in $s?

As you can imagine, I'm a bit confused about this, but the information available online is really poor and I have no idea what I need to do. Thank you!

Learning any language from online resources alone can be very tough. There's so much information, but it is typically of very mixed quality, and most of it's written in an hour or two with very little design or review. Get yourself a good old-fashioned book, like Priscilla Walmsley's - you know that's written by an expert, who has spent months thinking carefully about how to present information in a logical sequence, and it will have been carefully reviewed by others.

Now let's look at this example query.

for $s in doc('Ps.xml')//SYMOCC
where $s/DATE > '2012-06-05' 
    and (every $s1 in doc('Ps.xml')//SYMOCC 
            satisfies not($s1/DATE > '2012-06-05') 
                      or $s1/DATE >= $s/DATE)
return $s

I actually think this is a very poor answer to the question, but let's analyse what it means.

Firstly, you have to know the language pretty well to know the precedence of the operators, specifically, whether the "or xxxx" clause is part of the "satisfies" condition or not. In fact it is, as I have tried to show in my indentation - but it would be better to use parentheses to make it clear.
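
A small sketch of that precedence rule, using made-up numbers rather than the data above: the whole or-expression belongs to the satisfies clause, so the two expressions below should mean the same thing.

```xquery
(: Both items evaluate the same way: "or" binds inside "satisfies". :)
(
  every $x in (1, 2, 3) satisfies $x > 5 or $x < 4,
  every $x in (1, 2, 3) satisfies ($x > 5 or $x < 4)
)
```

Each expression is true, because every number in (1, 2, 3) is less than 4.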

The query is looking for SYMOCC elements in doc('Ps.xml') that satisfy two conditions: (a) the element's date D must be after 2012-06-05, and (b) every SYMOCC in the document must either have no date after 2012-06-05 (that is, a date on or before 2012-06-05, or no date at all), or have a date >= D. Those two conditions correspond to the conditions in the requirement that (a) the date must be after 2012-06-05, and (b) it must be no later than any other date that is also after 2012-06-05.

Let's try and answer your questions:

  1. How are instructions executed in this language? Is it comparing every single date in $s with every other date in $s1? Is there any order it follows?

It's not an imperative, procedural language; it's a declarative language. It doesn't have instructions, and they aren't executed. You say what conditions the answer must satisfy, and the system works out how to get that answer. Logically, each $s is tested against every $s1, but the language does not say in what order, or how, that testing happens. Different implementations will do it quite differently depending on their optimization strategy.

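  2. not(DATE > XXX) and DATE < XXX differ in two cases. When the date is exactly XXX, the first is true and the second is false. More importantly here, they differ when there is no DATE (some of the SYMOCC elements do not have a DATE child). If there is no DATE, then DATE < XXX, DATE > XXX and DATE >= XXX are all false. So a SYMOCC without a DATE passes the not(...) test but fails the < test, and a single failure makes the whole "every" expression false.

  3. Why is it OR rather than AND? Well, I think the way the query is expressed is a little perverse, but given the approach taken, it's correct. The date D we're looking for is the first one after 2012-06-05 if every date in the document is either (a) not after 2012-06-05, or (b) on or after D. No date can be both at once, since D itself is after 2012-06-05, so with AND the condition could never hold.

  4. Why is the final condition >= rather than > ? Because $s1 ranges over every SYMOCC, including $s itself and any other symptoms appearing on the same date. A date is never greater than itself, so if you wrote > you'd get no results.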
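
The points above can be checked with a small sketch. The inline $data element is hypothetical test data standing in for Ps.xml, and the variable names are made up for illustration. It compares a SYMOCC that has no DATE, then counts the results of the original condition and of two variants (> instead of >=, and "and" instead of "or").

```xquery
(: Hypothetical inline data standing in for Ps.xml. :)
let $data :=
  <PATIENT_SYMS>
    <SYMOCC><SYM><DESC>edema</DESC></SYM></SYMOCC>
    <SYMOCC><DATE>2011-01-01</DATE></SYMOCC>
    <SYMOCC><DATE>2015-08-03</DATE></SYMOCC>
    <SYMOCC><DATE>2017-05-03</DATE></SYMOCC>
  </PATIENT_SYMS>
let $cutoff := '2012-06-05'
let $no-date := $data/SYMOCC[1]   (: the occurrence without a DATE child :)

(: the original condition: "or" with ">=" :)
let $with-ge :=
  for $s in $data/SYMOCC
  where $s/DATE > $cutoff
    and (every $s1 in $data/SYMOCC
         satisfies (not($s1/DATE > $cutoff) or $s1/DATE >= $s/DATE))
  return $s

(: variant with ">": $s1 also ranges over $s itself, which can never pass :)
let $with-gt :=
  for $s in $data/SYMOCC
  where $s/DATE > $cutoff
    and (every $s1 in $data/SYMOCC
         satisfies (not($s1/DATE > $cutoff) or $s1/DATE > $s/DATE))
  return $s

(: variant with "and": no dated SYMOCC can satisfy both halves :)
let $with-and :=
  for $s in $data/SYMOCC
  where $s/DATE > $cutoff
    and (every $s1 in $data/SYMOCC
         satisfies (not($s1/DATE > $cutoff) and $s1/DATE >= $s/DATE))
  return $s

return (
  $no-date/DATE < $cutoff,        (: comparison against a missing DATE :)
  not($no-date/DATE > $cutoff),   (: negated comparison against a missing DATE :)
  count($with-ge),
  count($with-gt),
  count($with-and)
)
```

On this data the result should be false, true, 1, 0, 0: only the original form finds the 2015-08-03 occurrence.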

Most of your questions seem to be less a problem with XQuery notation, and more a lack of understanding of how predicate logic works. But having said that, I would have produced a different solution to this problem. I would start by sorting all the events by date, then removing those before 2012-06-05, then removing those after the first date in the sequence. That would be something like

let $selected :=
  for $s in doc('Ps.xml')//SYMOCC[DATE]
  where $s/DATE > '2012-06-05'
  order by $s/DATE
  return $s
return $selected[DATE = $selected[1]/DATE]
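
A possible fuller version of the same idea, as a sketch rather than a definitive answer. Two things here are additions that are not in the query above: the dates are cast to xs:date instead of being compared as strings, and the result is wrapped in the SYMSAFTER root and returns the SYM elements, as the exercise text asks. It assumes each selected SYMOCC has exactly one DATE in valid YYYY-MM-DD form.

```xquery
(: Sketch only: assumes Ps.xml is reachable via doc('Ps.xml') as in the question. :)
let $cutoff := xs:date('2012-06-05')
let $selected :=
  for $s in doc('Ps.xml')//SYMOCC[DATE]
  let $d := xs:date($s/DATE)      (: fails if a DATE is not a valid date :)
  where $d > $cutoff
  order by $d
  return $s
return
  <SYMSAFTER>{
    (: keep every occurrence that shares the earliest date, return its symptoms :)
    $selected[DATE = $selected[1]/DATE]/SYM
  }</SYMSAFTER>
```

With the sample document from the question, this should produce a SYMSAFTER element containing the single SYM for asthma.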
