Updating data from several different sources

I'm in the process of setting up a database with customer information. The database will handle customer data (customer ID, address, phone number, etc.) as well as some basic information about which kinds of advertisement a specific customer has been exposed to, and how they reacted to them.

The data will be maintained primarily from a central data warehouse, but additional information about customers and advertisements will also be updated from other sources. For example, if an external advertisement agency runs a campaign, I want them to be able to feed back data about opt-outs, e-mail bounces, etc. I guess what I need is an API that can easily be handed out to any number of agencies.

My first thought was to set up a web service API for all external sources, but since we'll probably be dealing with large amounts of data (millions of records per batch), I'm not sure a web service is the best option.

So my question is, what's the best practice here? I need a solution simple enough for advertisement agencies (likely with moderately skilled IT people) to make use of. Simplicity is of the essence, by which I mean "simplicity over performance" in this case. If the setup gets too complex, it won't work.

The system will very likely be based on Microsoft technology.

Any suggestions?

The process you're describing is commonly referred to as data integration using ETL processes. ETL stands for Extract-Transform-Load. The idea is to build up your central data warehouse by extracting information from many different data sources, transforming it, and then loading it into the warehouse.
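As a rough illustration of the three ETL stages, here is a minimal Python sketch. It is not how SSIS or any particular tool works: the feed columns (`customer_id`, `event`, `email`), the table name `campaign_feedback`, and the use of an in-memory SQLite database as a stand-in for the warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical agency feed; the column names are assumptions for illustration.
SAMPLE_FEED = """customer_id,event,email
1001,OptOut,Anna@Example.com
1002,bounce,bob@example.com
,bounce,missing-id@example.com
"""


def extract(source):
    """Extract: read raw rows from a CSV file-like object."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: normalise values and drop rows without a customer id."""
    cleaned = []
    for row in rows:
        customer_id = row["customer_id"].strip()
        if not customer_id:
            continue  # a real pipeline would log or quarantine this row
        cleaned.append(
            (int(customer_id), row["event"].strip().lower(), row["email"].strip().lower())
        )
    return cleaned


def load(conn, records):
    """Load: insert the cleaned records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaign_feedback ("
        "customer_id INTEGER, event TEXT, email TEXT)"
    )
    conn.executemany("INSERT INTO campaign_feedback VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, source):
    """Run all three stages and return the number of records loaded."""
    records = transform(extract(source))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for the warehouse
    print(run_etl(connection, io.StringIO(SAMPLE_FEED)))
```

Keeping the stages as separate functions means a new source only needs its own `extract`, while the cleaning and loading logic can stay shared.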

A variety of tools, some of them graphical, exist to implement such a process. Since you said you'll probably be running a Microsoft stack, I suggest having a look at SQL Server Integration Services (SSIS).

Regarding your suggestion to implement the integration using a web service, I don't think that's a good idea either. Nor do I think shifting the burden of data integration to your customers is a good idea. Instead, you should agree with your customers on some form of data exchange format. It could be as simple as a CSV file, or XML, Excel sheets, or Access databases: use whatever suits your needs.

Any modern ETL tool like SSIS is capable of working with those different data sources.
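To make an agreed exchange format concrete, it can help to hand agencies a small validator along with the format description, so they can check a file before sending it. The sketch below assumes a CSV feed; the required columns, the allowed event names, and the ISO date format are hypothetical choices for illustration, not something specified in the question.

```python
import csv
import datetime
import io

# Assumed exchange format: columns and event names are illustrative only.
REQUIRED_COLUMNS = ["customer_id", "event", "occurred_on"]
ALLOWED_EVENTS = {"optout", "bounce"}


def validate_feed(source):
    """Return a list of human-readable problems found in a CSV feed."""
    reader = csv.DictReader(source)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for row in reader:
        line_no = reader.line_num  # line of the source the row was read from
        # Short rows yield None for absent fields, hence the `or ""` guards.
        if not (row["customer_id"] or "").strip().isdigit():
            problems.append(f"line {line_no}: customer_id must be numeric")
        if (row["event"] or "").strip().lower() not in ALLOWED_EVENTS:
            problems.append(f"line {line_no}: unknown event {row['event']!r}")
        try:
            datetime.date.fromisoformat((row["occurred_on"] or "").strip())
        except ValueError:
            problems.append(f"line {line_no}: occurred_on must be YYYY-MM-DD")
    return problems


if __name__ == "__main__":
    feed = "customer_id,event,occurred_on\n1001,OptOut,2024-01-05\n"
    print(validate_feed(io.StringIO(feed)))
```

An empty list means the file matches the assumed format; anything else can be sent back to the agency as-is, which keeps the contract simple for both sides.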

