I need to gather a list of distinct employees from an XML file that contains a log of sales made by each employee. Unfortunately, the data in the XML file isn't exactly consistent: some elements leave EmployeeName or EmployeeManagerId empty. The file is structured like this:
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""
CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""
CustomerName="Bill" SaleNumber="..." />
This XML file is uploaded to a web application, which passes its contents (as XML) to a stored procedure in SQL Server for processing. Because of the size of this file (up to 30,000 elements), I would like to do as little processing in the web application as possible.
The best solution I have come up with so far is to create a temporary table with one row for each distinct EmployeeId and EmployeeManagerId value. Then, for each row in the table, I loop through the XML elements with a matching EmployeeId until I find one where the name is not empty, and then repeat the search to find a non-empty EmployeeManagerId.
So, for each unique employee ID, I would be iterating over the elements twice: once to find the name and once to find the manager's ID.
Once the file is processed, I would expect the Employee table to look like this:
+---------+------+------------+
| Id (PK) | Name | ManagerId  |
+---------+------+------------+
| 12345   | NULL | NULL       |
| 67890   | John | 12345      |
| 58203   | Fred | NULL       |
+---------+------+------------+
Is there a more efficient (and less procedural) solution for this?
This gets the results, but may require some cleanup work if the sample data is different.
DECLARE @T TABLE ( x XML );
-- Test harness: each row holds one <Sale> element from the sample data
INSERT INTO @T ( x )
VALUES ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Bob" SaleNumber="..." />' )
     , ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Pat" SaleNumber="..." />' )
     , ( '<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345" CustomerName="Sally" SaleNumber="..." />' )
     , ( '<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345" CustomerName="Sue" SaleNumber="..." />' )
     , ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="" CustomerName="Jack" SaleNumber="..." />' )
     , ( '<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId="" CustomerName="Bill" SaleNumber="..." />' )
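-- c: distinct (id, name, manager) combinations from the elements that carry a name.
-- An empty EmployeeManagerId is read as 0 when converted to int.
;WITH c
AS (
SELECT DISTINCT ID = x.value('(/Sale/@EmployeeId)[1]', 'int')
, NAME = x.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)')
, ManagerID = x.value('(/Sale/@EmployeeManagerId)[1]', 'int')
FROM @T
WHERE x.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)') <> ''
)
-- One row per named employee; NULLIF turns the 0 (empty) manager into NULL so MIN ignores it
SELECT ID, NAME, ManagerID = MIN(NULLIF(ManagerID, 0))
FROM c
GROUP BY ID, NAME
UNION
-- Managers that never appear as employees, with no name or manager of their own
SELECT ManagerID, NULL, NULL
FROM c
WHERE ManagerID NOT IN (SELECT DISTINCT ID FROM c)
AND ManagerID <> 0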
declare @xml xml = '
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""
CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""
CustomerName="Bill" SaleNumber="..." />'
-- E1: one row per <Sale> element (every employee reference, duplicates included)
;with E1 as
(
select T.N.value('@EmployeeId', 'int') as Id,
T.N.value('@EmployeeName', 'nvarchar(100)') as Name,
T.N.value('@EmployeeManagerId', 'int') as ManagerID
from @xml.nodes('/Sale') as T(N)
),
-- E2: one row per employee id; MAX prefers a non-empty value over '' (or over 0 for the
-- manager id, since an empty attribute is read as 0), and NULLIF turns what is left into NULL
E2 as
(
select Id, nullif(max(Name), '') as Name, nullif(max(ManagerID), 0) as ManagerID
from E1
group by Id
),
-- M: all distinct non-empty manager ids
M as
(
select distinct T.N.value('@EmployeeManagerId', 'int') as Id
from @xml.nodes('/Sale') as T(N)
where T.N.value('@EmployeeManagerId', 'int') <> 0
)
-- All unique employees
select Id, Name, ManagerID
from E2
union
-- Add managers, looking up their name and manager id in E2 in case they are also employees;
-- union (rather than union all) drops the duplicate row such a manager would otherwise produce
select M.Id, E2.Name, E2.ManagerID
from M
left outer join E2
on M.Id = E2.Id
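A minimal sketch that packages the ideas from both answers into the kind of stored procedure the question describes. The names dbo.Employee and dbo.ImportEmployeesFromSales and the column types are hypothetical, and the procedure assumes dbo.Employee starts empty (an incremental load would need a MERGE or a NOT EXISTS check against existing rows). It shreds the XML once into a temporary table, maps empty attributes to NULL with NULLIF, and then relies on MAX ignoring NULLs so that any non-empty name or manager id found for an employee wins.

```sql
-- Hypothetical table matching the expected output in the question
CREATE TABLE dbo.Employee
(
    Id        int           NOT NULL PRIMARY KEY,
    Name      nvarchar(100) NULL,
    ManagerId int           NULL
);
GO

-- Hypothetical procedure name; @SalesXml holds the whole uploaded fragment of <Sale> elements
CREATE PROCEDURE dbo.ImportEmployeesFromSales
    @SalesXml xml
AS
BEGIN
    SET NOCOUNT ON;

    -- Shred the XML a single time. An empty attribute is read as '' for nvarchar
    -- and as 0 for int, so NULLIF maps both to NULL (this also treats an id of 0 as missing).
    SELECT NULLIF(S.N.value('@EmployeeId', 'int'), 0)                AS EmployeeId,
           NULLIF(S.N.value('@EmployeeName', 'nvarchar(100)'), N'')  AS EmployeeName,
           NULLIF(S.N.value('@EmployeeManagerId', 'int'), 0)         AS ManagerId
    INTO #Sales
    FROM @SalesXml.nodes('/Sale') AS S(N);

    -- One row per employee; MAX skips NULLs, so any non-empty value is kept
    WITH Employees AS
    (
        SELECT EmployeeId AS Id, MAX(EmployeeName) AS Name, MAX(ManagerId) AS ManagerId
        FROM #Sales
        WHERE EmployeeId IS NOT NULL
        GROUP BY EmployeeId
    ),
    -- Every distinct manager id that was actually filled in
    Managers AS
    (
        SELECT DISTINCT ManagerId AS Id
        FROM #Sales
        WHERE ManagerId IS NOT NULL
    )
    INSERT INTO dbo.Employee (Id, Name, ManagerId)
    SELECT Id, Name, ManagerId
    FROM Employees
    UNION ALL
    -- Managers who never appear as an employee get a row with no name or manager
    SELECT M.Id, NULL, NULL
    FROM Managers AS M
    WHERE NOT EXISTS (SELECT 1 FROM Employees AS E WHERE E.Id = M.Id);
END;
GO
```

With the sample data in a single xml variable (as in the second answer), `EXEC dbo.ImportEmployeesFromSales @SalesXml = @xml;` should leave the three rows shown in the question's expected table. Shredding into #Sales once avoids calling nodes() twice over the document, as the second answer's M CTE does; whether that is measurably faster for a 30,000-element file is worth checking against real uploads.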