
javascript: how to most efficiently manage data import across two files

I have two API endpoints that I am polling via a node.js / coffeescript script: an /addresses endpoint that returns a list of home addresses in a given city, and a /homevalue endpoint that returns the value of a home at a given address.

I am polling each endpoint in series for a given city, let's say Buffalo. For auditing purposes, I am saving the content of each in local directories, at .../addresses/addresses.txt and .../homeValues/homeValues.txt. The script runs through all of the homes in a city, saves these to the addresses directory, then polls the /homevalue endpoint and saves the results in a text file in the homeValues directory.

I then do some transformative work to convert both addresses and home values into a canonicalized format, saving each of these into a separate directory, .../canonicalAddresses and .../canonicalHomeValues. I then merge the canonical addresses and home values into a text file at .../unifiedAddresses/unifiedAddresses.txt

I cannot save these files as a single JSON document; I have to save them in a text file as a series of JSON objects, one per line. I am also doing this synchronously rather than asynchronously because I want to maintain an audit trail.

The canonicalized address file is a series of lines like:

{id: 12345, address: {...}}
{id: XYZAB, address: {...}}

The home values list is historical by year and is a series of lines like:

[{id: 12345, homevalue: {year: 1990,...}}, {id: 12345, homevalue: {year: 1991,...}}...]
[{id: XYZAB, homevalue: {year: 1990,...}}, {id: XYZAB, homevalue: {year: 1991,...}}...]

This is my greatly simplified pseudocode for that merge, which requires that I read both .../addresses/addresses.txt and .../homeValues/homeValues.txt from disk:

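fs = require 'fs'

canonicalizedHomeValuesFile = "..."
canonicalizedAddressesFile = "..."
unifiedAddressFile = "..."

# Re-reads and re-parses the entire home values file on every call.
getHomeValue = (addressID) ->
   lines = fs.readFileSync(canonicalizedHomeValuesFile).toString().split('\n')
   for line in lines when line.trim()
       history = JSON.parse(line)
       # Each line holds the yearly history for a single home.
       return history if history[0]?.id is addressID
   null

# One full lookup and one disk append per address.
fs.readFileSync(canonicalizedAddressesFile).toString().split('\n').forEach (line) ->
   return unless line.trim()
   address = JSON.parse(line)
   address.value = getHomeValue(address.id)
   fs.appendFileSync(unifiedAddressFile, JSON.stringify(address) + "\n")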

This approach works fine for small numbers of houses but is insanely slow to unify large numbers of addresses. For about 2000 houses, this approach takes upwards of 4 minutes per house.

It seems to me the real bottleneck is the getHomeValue() function. What is a more efficient way to approach that lookup?

If the data is large enough, it might be worth preloading the objects and then matching them using a binary search. It looks like you are loading the home values file from disk each time you look up a home value. It also looks like you are writing to disk on every iteration of the loop over the readFileSync result. If that is the case, you might consider writing the file once, after all addresses have been processed. I would minimize drive access as much as possible by batching the load and the save.
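
A rough sketch of that idea in plain Node.js, not a drop-in replacement for the script in the question. It assumes every line of both files is valid JSON and that each home values line is an array describing a single home. All function names (parseLines, buildIndex, findHistory, unify, unifyFiles) are made up for illustration. Ids are compared as strings because the sample data mixes numeric and alphanumeric ids.

```javascript
// Sketch only: function names and the one-JSON-value-per-line layout are assumptions.
const fs = require('fs');

// Parse text holding one JSON value per line, skipping blank lines.
function parseLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Build an index sorted by id; ids are compared as strings.
function buildIndex(histories) {
  return histories
    .filter((history) => history.length > 0)
    .map((history) => ({ key: String(history[0].id), history }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search over the sorted index; returns null when the id is absent.
function findHistory(index, id) {
  const key = String(id);
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === key) return index[mid].history;
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

// Merge entirely in memory and return the unified text, one JSON object per line.
function unify(addressesText, homeValuesText) {
  const index = buildIndex(parseLines(homeValuesText));
  return parseLines(addressesText)
    .map((address) => JSON.stringify({ ...address, value: findHistory(index, address.id) }) + '\n')
    .join('');
}

// Read each input file once and write the output file once.
function unifyFiles(addressesPath, homeValuesPath, outputPath) {
  const unified = unify(
    fs.readFileSync(addressesPath, 'utf8'),
    fs.readFileSync(homeValuesPath, 'utf8')
  );
  fs.writeFileSync(outputPath, unified);
}
```

This still runs synchronously, so the order of operations stays the same as in the original script. Loading the home values into a Map keyed by id would give the same result without the sort and search; the binary search is shown here only because it is what the answer suggests.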

