
What is the best way to cache XML feeds locally?

I have an XML feed which contains 1000+ property records (rentals and sales).

Currently I call this feed 16 times on the homepage, each call returning only 3 properties for specific criteria: 3 new houses, 3 new flats, 5 recommended houses, 5 recommended flats, and so on.

This setup worked well for 7 months, while there were 200+ properties and only 100-200 views a day. It is now getting to the stage where I have 700+ visits a day and over 1000+ properties, and downloading 16 feeds separately just to show the homepage is getting slower, while the traffic load is getting massively larger.

Therefore I would like to cache these streams: only my 'robot' would download the streams directly from the source, and all visitors would use my local copy, making things much quicker and massively reducing the traffic load.

I don't have a problem downloading the XML locally and reading the local files to display the data. But I would like to know how to solve possible issues like:

  • not showing data to clients because the robot is updating the XML files, and the original file would be overwritten and empty while the new data is loading
  • using the XML files as a local backup, so that if the source server is offline the homepage would still work and load
  • making sure that I won't lock the data for clients in such a way that the robot would be unable to update the files

My first thought would be to work with 2 XML files for every stream: one shown to clients and one being downloaded. If the download succeeds, the downloaded XML becomes the live data and the other one is deleted; some kind of incremental marking, with one file holding the name of the actual data file.
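The two-file idea can be simplified with an atomic rename: write the freshly downloaded feed to a temporary file, validate it, then swap it over the live file in one step. A minimal sketch in Python (the file layout and function names are illustrative, not part of the original question):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def update_feed(live_path, new_data):
    """Write new_data to a temp file, validate it, then atomically
    swap it into place, so readers always see a complete file."""
    # Write to a temp file in the same directory so the rename stays
    # on one filesystem (a requirement for the swap to be atomic).
    dir_name = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
        ET.parse(tmp_path)  # reject truncated or invalid downloads
        os.replace(tmp_path, live_path)  # atomic on POSIX and Windows
    except ET.ParseError:
        os.remove(tmp_path)  # keep the old live file as the fallback
        raise
```

Because the replace is atomic, clients never observe an empty or half-written file, and a failed download leaves the previous copy serving as the backup.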

Is there any way to cache these XML files that would do something similar? Really, the main issue is to have a bulletproof solution so clients won't see error pages or empty results.

Thanks.

Use the caching options built into HttpWebResponse. These let you programmatically choose between obtaining the response straight from the cache (ignoring freshness), ignoring the cache, forcing the cache to be refreshed, forcing the cache to be revalidated, and the normal behaviour of using the cache if it's considered fresh according to the original response's age information and revalidating it otherwise.
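In .NET this is configured through the request's cache policy (System.Net.Cache); the revalidation mechanics underneath are plain HTTP. A hedged Python sketch of ETag-based revalidation, with the HTTP client replaced by a stand-in callable so the flow is visible:

```python
def fetch_with_revalidation(url, cache, do_request):
    """Conditional GET: send the cached ETag and reuse the cached
    body when the server answers 304 Not Modified.

    cache: dict mapping url -> (etag, body)
    do_request: callable(url, headers) -> (status, etag, body),
                a stand-in for the real HTTP client.
    """
    headers = {}
    cached = cache.get(url)
    if cached:
        headers["If-None-Match"] = cached[0]
    status, etag, body = do_request(url, headers)
    if status == 304 and cached:
        return cached[1]           # server says our copy is still fresh
    cache[url] = (etag, body)      # store the newly downloaded copy
    return body
```

A 304 response carries no body, so a revalidation that hits costs almost no bandwidth, which is exactly what cuts the traffic load described in the question.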

Even if you have really specific caching requirements that need to go beyond that, build them on top of doing HTTP caching properly, rather than as a complete replacement for it.

If you do need to manage your own cache of the XML streams, then normal file locking and, if really necessary, .NET ReaderWriterLockSlims should suffice to keep different threads from messing each other up. One way to remove the risk of excessive contention is to default to direct access when the cache is contended. Consider that caching is ultimately an optimisation: conceptually you are getting the file "from the server", and caching just makes this happen more efficiently. Hence, if you fail to quickly obtain a read lock, you can fall back to downloading directly. This in turn reduces the wait for the write lock (because pending read locks won't stack up while a write lock is requested). In practice this probably won't happen very often, but it will save you from the risk of unacceptable contention building up around one file and bringing the whole system down.
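The fall-back-on-contention idea can be sketched as follows (a plain Python lock stands in for ReaderWriterLockSlim, and the two callables are hypothetical placeholders for the cache read and the direct download):

```python
import threading

cache_lock = threading.Lock()  # stand-in for a reader/writer lock

def get_feed(read_cached, download_direct, timeout=0.05):
    """Serve from the cache if the lock is free quickly; otherwise
    fall back to a direct download, so requests never pile up
    waiting behind a writer that is refreshing the cache."""
    if cache_lock.acquire(timeout=timeout):
        try:
            return read_cached()
        finally:
            cache_lock.release()
    # Contention: treat the cache as an optimisation and bypass it.
    return download_direct()
```

The timeout bounds the worst case a client ever waits on the cache, which is the "bulletproof" property the question asks for.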

I'm going to start by assuming that you don't own the code that produces the source XML feed? Because if you do, I'd look at adding some specific support for the queries you want to run.

I had a similar issue with a third-party feed and built a job that runs a few times a day, downloads the feed, parses it, and stores the results locally in a database.

You need to do a bit of comparison each time you update the database, adding only new records and deleting old ones, but this ensures that you always have data to feed to your clients, and the database works around simple issues like file locking.
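That comparison boils down to a set difference on record keys. A minimal sketch, assuming each property record carries a unique `id` (the in-memory dict stands in for the real database table):

```python
def sync(db, feed_records):
    """Incrementally update db (a dict keyed by property id) from a
    freshly parsed feed: upsert records present in the feed and
    delete records that have dropped out of it."""
    feed_ids = {r["id"] for r in feed_records}
    stale = set(db) - feed_ids      # listed before, gone now
    for pid in stale:
        del db[pid]
    for r in feed_records:
        db[r["id"]] = r             # insert new, refresh existing
    return len(stale)
```

Because the delete-then-upsert runs inside the update job, clients querying the store always see a complete snapshot rather than a half-written file.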

Then I'd look at a simple service layer to expose the data in your local store.
