
Best approach to processing a SQL data problem

I have a data-intensive problem that requires a lot of massaging and data manipulation, and I'm putting it out there to see if anyone has an idea of how to approach it.

In its simplest form: I have a lot of tables that can be joined together to give me a price listing for dentists and how much each one charges for a procedure.

So we have multiple tables that look like this:

Dentist | Procedure1 | Procedure2 | Procedure3 | .........| Procedure?
John    | 500        | 342        | 434        | .........| 843
Dave    | 343        | 434        | 322        | NULLs....|
Mary    | 500        | 342        | 434        | .........| 843
Linda   | 500        | 342        | Null       | .........| 843

Dentists can have a different number of procedures and different pricing for each procedure. But a lot of dentists have the same set of procedures with the same rates. Internally, we create a unique ID for each of these so-called fee listings.

For example, John would be 001 and Dave would be 002, but Mary would be fee listing 001 again and Linda would be 003. It wouldn't be so bad if I had to deal with this data once, but these fee listings come in flat files (CSVs), which I basically have to DTS up to a SQL Server to work with, and they come on a monthly basis. The pricing could change from month to month for each dentist, which would then put them under a different unique ID internally.

Can someone shed some light on how best to approach this problem, so that it's efficient to process on a monthly basis without having to do tons of data manipulation?

  1. What's the best approach to finding the duplicates among the fee listings?
  2. How do I keep track of updates to a dentist's fee listing in case they change their rates the next month? If Mary decides to charge a different fee for procedure2, then she would have a different unique ID internally. How do I keep track of that on a monthly basis without having to delete everything and re-insert?
  3. There are a few million fee listings that I'm working with; some follow standard rules based on zip codes and some are just unique fee listings. What's the approach here?
  4. I could write some kind of ad-hoc .NET program to work with it, but it's a lot of data and working straight in SQL Server would be easier for me.

Any help would be great, thanks guys.

You probably need to unpivot the data to normalize it - so that you end up with:

Doctor: DoctorID, DoctorDetails...
FeeSchedule: DoctorID, ScheduleID, EffectiveDate, OtherDetailAtThisLevel...
FeeScheduleDetail: ScheduleID, ProcedureCode, Fee, OtherDetailAtThisLevel...

When the data comes in for a doctor, it arrives pivoted (one column per procedure). A new schedule is created, and the detail rows are created from the unpivoted data.
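To make the unpivot step concrete, here is a minimal sketch in Python (used purely for illustration; the same reshaping can be done in SSIS or T-SQL). The sample rows, the `unpivot` and `load` helpers, the use of the dentist's name as a stand-in for DoctorID, and the effective date are all assumptions for the example, not part of the original data.

```python
from datetime import date

# Hypothetical wide rows as they might arrive from the monthly CSV;
# None stands for a NULL (procedure not offered by that dentist).
wide_rows = [
    {"Dentist": "John", "Procedure1": 500, "Procedure2": 342, "Procedure3": 434},
    {"Dentist": "Linda", "Procedure1": 500, "Procedure2": 342, "Procedure3": None},
]

def unpivot(row, key="Dentist"):
    """Turn one wide row into (procedure_code, fee) pairs, skipping NULLs."""
    return [(col, fee) for col, fee in row.items() if col != key and fee is not None]

def load(rows, effective_date):
    """Build FeeSchedule and FeeScheduleDetail rows from wide input rows."""
    schedules, details = [], []
    for schedule_id, row in enumerate(rows, start=1):
        # FeeSchedule: DoctorID (dentist name as a stand-in), ScheduleID, EffectiveDate
        schedules.append((row["Dentist"], schedule_id, effective_date))
        # FeeScheduleDetail: ScheduleID, ProcedureCode, Fee
        details.extend((schedule_id, code, fee) for code, fee in unpivot(row))
    return schedules, details

schedules, details = load(wide_rows, date(2024, 1, 1))
```

Each NULL column simply produces no detail row, so dentists with different numbers of procedures fit the same two tables.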

SSIS has an unpivot component, which is fine - you would load the schedule first and then the detail. If the format varies significantly, you might need a custom data source, or you could just avoid SSIS.

This system would keep track of new schedules for doctors. If the incoming schedule is identical to the one the doctor already has, you could simply not insert it.
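One way to spot identical schedules (and to reuse the internal fee listing IDs from the question) is to compute a fingerprint over each dentist's sorted procedure/fee pairs and compare fingerprints instead of whole rows. The sketch below is only an illustration in Python; `fingerprint` and `assign_listing_ids` are hypothetical helpers and the February fee change is made up. In SQL Server a similar idea could probably be built on `HASHBYTES` over a canonical string, but that is a suggestion, not something tested here.

```python
import hashlib

def fingerprint(fees):
    """Hash a {procedure_code: fee} mapping; NULL (None) fees are left out.

    Assumes procedure codes never contain '|' or '=' and that fees are
    already normalized (e.g. 500 and 500.0 would hash differently).
    """
    canonical = "|".join(
        f"{code}={fee}" for code, fee in sorted(fees.items()) if fee is not None
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def assign_listing_ids(dentist_fees, known=None):
    """Map each dentist to a fee listing ID, reusing IDs for identical listings."""
    known = {} if known is None else known  # fingerprint -> listing ID
    assigned = {}
    for dentist, fees in dentist_fees.items():
        fp = fingerprint(fees)
        if fp not in known:
            known[fp] = len(known) + 1  # first time this listing is seen
        assigned[dentist] = known[fp]
    return assigned, known

january = {
    "John": {"Procedure1": 500, "Procedure2": 342, "Procedure3": 434},
    "Dave": {"Procedure1": 343, "Procedure2": 434, "Procedure3": 322},
    "Mary": {"Procedure1": 500, "Procedure2": 342, "Procedure3": 434},
    "Linda": {"Procedure1": 500, "Procedure2": 342, "Procedure3": None},
}
ids, known = assign_listing_ids(january)

# Next month (made-up change): only Mary alters a fee, so only Mary
# moves to a new listing ID; everyone else needs no insert at all.
february = dict(january, Mary={"Procedure1": 500, "Procedure2": 350, "Procedure3": 434})
new_ids, known = assign_listing_ids(february, known)
changed = [d for d in february if new_ids[d] != ids[d]]
```

Comparing each month's fingerprint with the stored one means only dentists whose listing actually changed get a new schedule row, so nothing has to be deleted and re-inserted.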

If this logic is extensive, you could load the data into staging tables (with SSIS or whatever) and do all of this in SQL (T-SQL also has an UNPIVOT operator). That has the advantage that the code is all in one place and can do all its operations in sets.

Regarding the zip codes: if the doctor doesn't have a fee, are these like usual and customary (UCR) fees? That could simply be determined from the zip code on the doctor row. In this case you have a few options. You can overlay the doctor fee schedule over a zip code fee schedule:

ZipCodeSchedule: ZipScheduleID, ZipCode, EffectiveDate
ZipCodeScheduleDetail: ZipScheduleID, ProcedureCode, Fee

Or you could save this in the regular fee schedule (potentially with some kind of flag showing that it was defaulted to the UCR).
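The overlay idea can be sketched as a lookup that prefers the doctor's own fee and falls back to the zip code schedule, returning a flag when the fee was defaulted. Again this is only an illustrative Python sketch; `effective_fee` is a hypothetical helper and the zip code fees are made-up numbers.

```python
def effective_fee(doctor_fees, zip_fees, procedure_code):
    """Return (fee, defaulted).

    Uses the doctor's own fee when one exists; otherwise falls back to the
    zip code schedule and marks the result as defaulted. The fee is None
    if neither schedule has the procedure.
    """
    fee = doctor_fees.get(procedure_code)
    if fee is not None:
        return fee, False
    return zip_fees.get(procedure_code), True

# Linda has no fee for Procedure3; the zip schedule values are invented.
linda = {"Procedure1": 500, "Procedure2": 342, "Procedure3": None}
zip_schedule = {"Procedure1": 480, "Procedure2": 330, "Procedure3": 410}

own = effective_fee(linda, zip_schedule, "Procedure1")
fallback = effective_fee(linda, zip_schedule, "Procedure3")
```

Whether you resolve the fallback at query time like this or store the defaulted rows with a flag mostly depends on whether the zip code fees change more often than the doctors' own schedules.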


 