I have the following data:
<!-- subjects.xml -->
<Subjects>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
<Subject>
<Id>3</Id>
<Name>History</Name>
</Subject>
</Subjects>
which is to be joined to:
<!-- courses.xml-->
<Courses>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Courses>
I wish to do a left join on the first table to the second table so as to get the following output:
<Results>
<Result>
<Table1>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<!-- Notice there are 2 subjects here, as they both have the same ID-->
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<Subject>
<Id>3</Id>
<Name>History</Name>
</Subject>
</Table1>
<Table2>
<!-- Notice this section is empty -->
</Table2>
</Result>
</Results>
I have the following code to do this:
<Results>
{
(: For each group of courses whose 'SubjectId' exists in "subjects.xml" :)
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
where $e1
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
{
(: PART2 :)
(: Show the remaining subjects, i.e. those that have no matching course :)
for $e1 in doc('subjects.xml')/Subjects/Subject
let $idVal := $e1/Id
group by $idVal
where not(doc('courses.xml')/Courses/Course/SubjectId = $idVal)
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2/>
</Result>
}
</Results>
The code works and produces the expected output. However, for large inputs (750 Subjects with 120 Courses each, plus 100 Subjects without any Courses and 100 Courses without any Subject), the script runs extremely slowly.
What can I do to make my script faster? Is there a better way of doing this? What's the time complexity?
Update 2
It turns out I had badly misidentified the problem. It had very little to do with part 2 of the code; the real culprit was part 1.
What I did was:
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
group by $foriegnId
when what I should have done was:
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
This reduced the running time from 30,000ms to around 4,000ms.
Further performance improvements are welcome.
Depending on how the query is optimized, the sequence of SubjectId elements from courses.xml in part 2 might be rebuilt again and again, once for each subject group. Fetch that sequence once in advance and check each Id against it:
let $subjectIds := doc('courses.xml')/Courses/Course/SubjectId
for $e1 in doc('subjects.xml')/Subjects/Subject
let $idVal := $e1/Id
group by $idVal
where not($subjectIds = $idVal)
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2/>
</Result>
A further optimization might be to reduce the partially redundant subject IDs to a sequence of their distinct values first:
let $subjectIds := distinct-values(doc('courses.xml')/Courses/Course/SubjectId)
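Going one step further, both parts could be folded into a single pass by indexing the courses once and then looking up each subject group in that index. The following is only a sketch, not a measured solution. It assumes an XQuery 3.1 processor with map support and exactly one SubjectId per Course, and the variable names ($coursesById, $sid, $id) are made up for illustration. If the processor implements group by and maps with hashing, the work should grow roughly linearly with the number of subjects plus courses, instead of scanning one document for every group of the other.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Build the index once: SubjectId (as a string) -> all Course elements with that id. :)
let $coursesById := map:merge(
  for $c in doc("courses.xml")/Courses/Course
  group by $sid := string($c/SubjectId)
  return map:entry($sid, $c)
)
return
<Results>
{
  (: One Result per distinct subject Id; $s holds every Subject sharing that Id. :)
  for $s in doc("subjects.xml")/Subjects/Subject
  group by $id := string($s/Id)
  return
    <Result>
      <Table1>{$s}</Table1>
      {
        (: A key that is not in the map yields the empty sequence, so Table2 stays empty. :)
        <Table2>{$coursesById($id)}</Table2>
      }
    </Result>
}
</Results>
```

As in the original query, courses whose SubjectId matches no subject are simply never looked up and therefore do not appear in the output. The order of the Result elements depends on how the processor orders groups, which is also true of the original code.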