Best Approach to Processing SQL Data Problem
I have a data-intensive problem that requires a lot of massaging and data manipulation, and I'm putting it out there to see if anyone has an idea of how to approach it.
In its simplest form: I have a lot of tables that can be joined together to give me a price listing of dentists and how much each one charges for a procedure.
So we have multiple tables that look like this:
Dentist | Procedure1 | Procedure2 | Procedure3 | ... | Procedure?
John    | 500        | 342        | 434        | ... | 843
Dave    | 343        | 434        | 322        | NULLs ... |
Mary    | 500        | 342        | 434        | ... | 843
Linda   | 500        | 342        | NULL       | ... | 843
Dentists can offer different numbers of procedures and charge different prices for each procedure. But many dentists have the same set of procedures and the same rates for them. Internally, we create a unique ID for each of these so-called fee listings.
For example, John would be fee 001 and Dave would be 002, but Mary would also be fee 001 (her prices match John's) and Linda would be 003. It wouldn't be so bad if I had to deal with this data only once, but these fee listings come in flat files (CSVs) that I basically have to import into SQL Server with DTS before I can work with them, and they arrive monthly. Each dentist's pricing can change from month to month, which would then put them under a different internal unique ID.
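To make the ID rule concrete, here is a minimal Python sketch that gives dentists with identical fee listings the same ID. It assumes each dentist's row has already been read into a dict of procedure name → price, with None standing in for NULL; the function names, the zero-padded numbering, and the three-procedure sample are illustrative only.

```python
from typing import Dict, Optional, Tuple

# A fee listing is the full set of (procedure, price) pairs a dentist charges.
FeeKey = Tuple[Tuple[str, int], ...]


def fee_key(prices: Dict[str, Optional[int]]) -> FeeKey:
    """Canonical, hashable form of one dentist's price row (NULL prices dropped)."""
    return tuple(sorted((proc, fee) for proc, fee in prices.items() if fee is not None))


def assign_fee_ids(rows: Dict[str, Dict[str, Optional[int]]]) -> Dict[str, str]:
    """Give dentists with identical fee listings the same zero-padded ID."""
    ids: Dict[FeeKey, str] = {}
    result: Dict[str, str] = {}
    for dentist, prices in rows.items():
        key = fee_key(prices)
        if key not in ids:
            # First time this exact listing is seen: hand out the next ID.
            ids[key] = f"{len(ids) + 1:03d}"
        result[dentist] = ids[key]
    return result


# Hypothetical sample: the first three procedure columns of the table above.
SAMPLE = {
    "John": {"Procedure1": 500, "Procedure2": 342, "Procedure3": 434},
    "Dave": {"Procedure1": 343, "Procedure2": 434, "Procedure3": 322},
    "Mary": {"Procedure1": 500, "Procedure2": 342, "Procedure3": 434},
    "Linda": {"Procedure1": 500, "Procedure2": 342, "Procedure3": None},
}

if __name__ == "__main__":
    print(assign_fee_ids(SAMPLE))
    # {'John': '001', 'Dave': '002', 'Mary': '001', 'Linda': '003'}
```

Sorting the (procedure, price) pairs makes the key independent of column order, so the same prices arriving in a differently ordered CSV still map to the same listing.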
Can someone shed some light on how best to approach this problem, so that it can be processed efficiently every month without tons of data manipulation?
Any help would be great. Thanks, guys.
You probably need to unpivot the data to normalize it, so that you end up with:
Doctor: DoctorID, DoctorDetails...
FeeSchedule: DoctorID, ScheduleID, EffectiveDate, OtherDetailAtThisLevel...
FeeScheduleDetail: ScheduleID, ProcedureCode, Fee, OtherDetailAtThisLevel...
When a doctor's data comes in, it is pivoted (one column per procedure); a new schedule is created, and the detail rows are created from the unpivoted data.
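A minimal sketch of that load step in Python, using the standard-library sqlite3 module as a stand-in for SQL Server. It assumes the monthly CSV has a Dentist column followed by one column per procedure, with blank or NULL cells for procedures a dentist doesn't offer; DoctorName, parse_fee, and load_month are hypothetical names.

```python
import csv
import io
import sqlite3
from typing import Optional

# sqlite3 stands in for SQL Server. Tables follow the answer's layout;
# DoctorName is a hypothetical stand-in for "DoctorDetails".
SCHEMA = """
CREATE TABLE Doctor (
    DoctorID   INTEGER PRIMARY KEY,
    DoctorName TEXT NOT NULL UNIQUE
);
CREATE TABLE FeeSchedule (
    ScheduleID    INTEGER PRIMARY KEY,
    DoctorID      INTEGER NOT NULL REFERENCES Doctor(DoctorID),
    EffectiveDate TEXT NOT NULL
);
CREATE TABLE FeeScheduleDetail (
    ScheduleID    INTEGER NOT NULL REFERENCES FeeSchedule(ScheduleID),
    ProcedureCode TEXT NOT NULL,
    Fee           INTEGER NOT NULL,
    PRIMARY KEY (ScheduleID, ProcedureCode)
);
"""


def parse_fee(raw: Optional[str]) -> Optional[int]:
    """Missing, blank, or NULL cells mean the dentist doesn't offer the procedure."""
    if raw is None or raw.strip() == "" or raw.strip().upper() == "NULL":
        return None
    return int(raw)


def load_month(conn: sqlite3.Connection, csv_text: str, effective_date: str) -> None:
    """Load one monthly wide CSV: a new FeeSchedule per dentist, then its details."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.pop("Dentist").strip()
        conn.execute("INSERT OR IGNORE INTO Doctor (DoctorName) VALUES (?)", (name,))
        (doctor_id,) = conn.execute(
            "SELECT DoctorID FROM Doctor WHERE DoctorName = ?", (name,)
        ).fetchone()
        # Schedule header first, then its detail rows.
        schedule_id = conn.execute(
            "INSERT INTO FeeSchedule (DoctorID, EffectiveDate) VALUES (?, ?)",
            (doctor_id, effective_date),
        ).lastrowid
        # Unpivot: each non-NULL procedure column becomes one detail row.
        for procedure, raw in row.items():
            fee = parse_fee(raw)
            if fee is not None:
                conn.execute(
                    "INSERT INTO FeeScheduleDetail (ScheduleID, ProcedureCode, Fee) "
                    "VALUES (?, ?, ?)",
                    (schedule_id, procedure, fee),
                )
    conn.commit()
```

This version always creates a new schedule; skipping unchanged schedules is a separate check on top of it.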
SSIS has an Unpivot transformation that works fine; you would load the schedule first and then the detail. If the format varies significantly, you might need a custom data source, or you could just avoid SSIS.
This system would keep track of new schedules for each doctor. If a doctor's incoming schedule is identical to their current one, you could simply not insert it.
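A sketch of the "identical schedule" check, assuming the doctor's most recent schedule has already been read back into a dict of ProcedureCode → Fee (None if the doctor has no schedule yet). The function name is hypothetical.

```python
from typing import Dict, Optional

Schedule = Dict[str, int]  # ProcedureCode -> Fee


def needs_new_schedule(latest: Optional[Schedule], incoming: Schedule) -> bool:
    """True if this month's fees differ from the doctor's most recent schedule.

    `latest` is None when the doctor has no schedule on file yet, in which
    case the incoming schedule always has to be inserted.
    """
    return latest is None or latest != incoming
```

Dict equality ignores key order, so the check only reports a change when a procedure was added, dropped, or repriced.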
If this logic is extensive, you could load the data into staging tables (with SSIS or anything else) and do all of this in SQL (T-SQL also has an UNPIVOT operator). That has the advantage that the code is all in one place and can perform all of its operations as sets.
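The set-based version of the unpivot can also be sketched with sqlite3. SQLite has no UNPIVOT operator, so UNION ALL plays that role here; in T-SQL you would use UNPIVOT, which also drops NULL values. The StagingFees and StagingDetail tables and their columns are hypothetical, and only three procedure columns are shown.

```python
import sqlite3

# Hypothetical staging tables: one mirrors the wide CSV, one holds unpivoted rows.
SETUP = """
CREATE TABLE StagingFees (
    Dentist TEXT, Procedure1 INTEGER, Procedure2 INTEGER, Procedure3 INTEGER
);
CREATE TABLE StagingDetail (Dentist TEXT, ProcedureCode TEXT, Fee INTEGER);
"""

# One statement unpivots every staged row at once. UNION ALL stands in for
# T-SQL's UNPIVOT; the outer WHERE drops procedures the dentist doesn't offer.
UNPIVOT_SQL = """
INSERT INTO StagingDetail (Dentist, ProcedureCode, Fee)
SELECT Dentist, ProcedureCode, Fee
FROM (
    SELECT Dentist, 'Procedure1' AS ProcedureCode, Procedure1 AS Fee FROM StagingFees
    UNION ALL
    SELECT Dentist, 'Procedure2', Procedure2 FROM StagingFees
    UNION ALL
    SELECT Dentist, 'Procedure3', Procedure3 FROM StagingFees
) AS u
WHERE Fee IS NOT NULL
"""


def unpivot_staging(conn: sqlite3.Connection) -> None:
    """Move all staged wide rows into the narrow StagingDetail table."""
    conn.execute(UNPIVOT_SQL)
    conn.commit()
```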
Regarding the zip codes: if the doctor doesn't have a fee, are these like usual and customary (UCR) fees? Those could simply be determined from the zip code in the doctor's row. In that case you have a few options. You can overlay the doctor's fee schedule on a zip code fee schedule:
ZipCodeSchedule: ZipScheduleID, ZipCode, EffectiveDate
ZipCodeScheduleDetail: ZipScheduleID, ProcedureCode, Fee
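A minimal sketch of the overlay lookup, assuming both schedules have been read into dicts of ProcedureCode → Fee. The doctor's own fee takes precedence and the zip code schedule fills the gaps; the function name and the "doctor"/"zip" source labels are illustrative.

```python
from typing import Dict, Optional, Tuple

Fees = Dict[str, int]  # ProcedureCode -> Fee


def effective_fee(
    procedure: str, doctor_fees: Fees, zip_fees: Fees
) -> Optional[Tuple[int, str]]:
    """Doctor's own fee wins; otherwise fall back to the zip code (UCR) fee.

    Returns (fee, source), or None if neither schedule lists the procedure.
    """
    if procedure in doctor_fees:
        return doctor_fees[procedure], "doctor"
    if procedure in zip_fees:
        return zip_fees[procedure], "zip"
    return None
```

Resolving at read time keeps the doctor's schedule exactly as received; the trade-off is that every lookup has to consult both schedules.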
Or you could save these fees in the regular fee schedule (potentially with some kind of flag indicating that the fee was defaulted to the UCR).
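If you store the defaults instead, one set-based INSERT can copy the missing procedures from the zip code schedule and mark them. This sqlite3 sketch assumes a hypothetical DefaultedToUCR column on FeeScheduleDetail; the other names follow the answer's layout, with EffectiveDate handling left out.

```python
import sqlite3

SETUP = """
CREATE TABLE ZipCodeScheduleDetail (ZipScheduleID INTEGER, ProcedureCode TEXT, Fee INTEGER);
CREATE TABLE FeeScheduleDetail (
    ScheduleID     INTEGER,
    ProcedureCode  TEXT,
    Fee            INTEGER,
    DefaultedToUCR INTEGER NOT NULL DEFAULT 0,  -- hypothetical flag column
    PRIMARY KEY (ScheduleID, ProcedureCode)
);
"""

# Copy every zip code fee the doctor's schedule lacks, flagged as a UCR default.
FILL_FROM_UCR = """
INSERT INTO FeeScheduleDetail (ScheduleID, ProcedureCode, Fee, DefaultedToUCR)
SELECT ?, z.ProcedureCode, z.Fee, 1
FROM ZipCodeScheduleDetail AS z
WHERE z.ZipScheduleID = ?
  AND NOT EXISTS (
      SELECT 1 FROM FeeScheduleDetail AS d
      WHERE d.ScheduleID = ? AND d.ProcedureCode = z.ProcedureCode
  )
"""


def fill_missing_from_ucr(
    conn: sqlite3.Connection, schedule_id: int, zip_schedule_id: int
) -> None:
    """Fill gaps in one doctor schedule from one zip code schedule."""
    conn.execute(FILL_FROM_UCR, (schedule_id, zip_schedule_id, schedule_id))
    conn.commit()
```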