Merging .csv files in a query to retrieve data from Excel through a UDF and ADODB connection

My goal is to run some VBA code and call the function from an Excel cell to retrieve data from a closed .csv or .xlsx file.

This can be done in several ways, but every approach I've tried has an important limitation.

I start with a very large .csv file. "Very large" here means around 4,000 rows and more than 1,000 columns.

First try:

Save the .csv in an Excel worksheet and use ExecuteExcel4Macro to retrieve the data. This works fine when running a Sub, and even when running a Function. Unfortunately, a function that uses ExecuteExcel4Macro cannot be called from an Excel cell. So much for the first try.

Second try:

Use an ADODB Connection and run a query directly against the .csv file or the saved .xlsx file. This can be called from a cell but, surprise, surprise, it has a limit of 255 columns (fields). When you run a query and try to read a field located in a column beyond the 255th, the function returns nothing. So much for the second try.

Third (and, for now, last) try. I need your help here!

I could divide the original table, which has too many fields, into several tables containing a maximum of 255 fields each.

Note: the first column contains the ids of firms, banks or whatever. The rest of the fields are named x1, x2,...x1050 and they correspond to fields of financial statements, so they are all numeric and they are all useful for the analysis.

If I split the big table into several smaller ones, they would look like this:

Table 1:
Name     x1     x2     x5    x6    x15...
myName1  15025  1546   6546  548   98663
myName2  867486 4684   68786 876   68997
myName3  87663  43397  87987 457   -4554
etc.

...

Table n:
Name     x928     x929     x940    x1005    x1250
myName1  765454   541546   76546   74548    18663
myName2  6564     544684   686     41876    58997
myName3  4687     64397    9887    879457   8554

I can do this by running some VBA before I store the files, so now I have n .csv files. The point is that I want the function to be called from a cell like this:

=GetData(path,file,name,operations)

That is, the user wants to locate a name in a file and perform some operations with any of the available fields, from x1 to x1250.

Let's suppose the first split table goes from field x1 to field x250. The second would go from x251 to x500, and so on. All of the tables would have the Name field as their first column, of course, and all tables would have the same number of rows (but not the same number of columns, since not every x field exists).

Importantly, the operations requested by the user could look like this:

"x3"                      --> User requests only one field.
"x5+x150"                 --> User requests the sum of two fields that would be in the same table (as the x150 field is not greater than x250 field)
"x452+x535-x900+x1200-x1" --> User requests operations with many fields that would be kept in different files. 

When the user requests only one field, I can write a small routine at the beginning of the function to work out which .csv file that field is stored in, like:

' singleField is the numeric part of the requested field, e.g. 3 for "x3"
If singleField <= 250 Then
    fileToLookAt = "SplittedCSV_1"
ElseIf singleField <= 500 Then
    fileToLookAt = "SplittedCSV_2"
End If
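
The chain of comparisons above can be collapsed into one integer division. This is a minimal sketch, assuming every split file covers a block of exactly 250 field numbers, as supposed earlier; `FileForField` and `FIELDS_PER_FILE` are hypothetical names, not part of the original code.

```vba
Option Explicit

' Assumption: SplittedCSV_1 holds x1..x250, SplittedCSV_2 holds x251..x500, etc.
Private Const FIELDS_PER_FILE As Long = 250

' Returns the base name (without ".csv") of the split file that should
' contain field x<fieldNumber>.
Public Function FileForField(ByVal fieldNumber As Long) As String
    ' "\" is VBA integer division: 1..250 -> 1, 251..500 -> 2, and so on.
    FileForField = "SplittedCSV_" & ((fieldNumber - 1) \ FIELDS_PER_FILE + 1)
End Function
```

With this helper, `fileToLookAt = FileForField(singleField)` replaces the whole If block, and no new branch is needed when more split files are added.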

Then, using an ADODB Connection and the Microsoft.Jet.OLEDB.4.0 provider, I would run a query like:

MyQuery = "SELECT x" & singleField & " AS MyData FROM [" & fileToLookAt & ".csv] WHERE Name='" & name & "'"
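
For context, a query like this is typically run against a folder of .csv files, where the folder is the data source and each file acts as a table. The following is a sketch under stated assumptions: `RunCsvQuery` is a hypothetical helper, the files have a header row, and the query aliases its result as `MyData` as above. The Jet 4.0 provider is only available to 32-bit Office; on 64-bit installations the Microsoft.ACE.OLEDB.12.0 provider is the usual substitute.

```vba
Option Explicit

' Hypothetical helper: runs one query against the .csv files in folderPath
' and returns the MyData column of the first row, or #N/A if no row matches.
Public Function RunCsvQuery(ByVal folderPath As String, _
                            ByVal sql As String) As Variant
    Dim cn As Object, rs As Object

    ' Late binding, so no reference to the ADO library is required.
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath & _
            ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    Set rs = cn.Execute(sql)
    If rs.EOF Then
        RunCsvQuery = CVErr(xlErrNA)
    Else
        RunCsvQuery = rs.Fields("MyData").Value
    End If

    rs.Close
    cn.Close
End Function
```

Inside GetData this would be called as `RunCsvQuery(path, MyQuery)`. Opening a connection on every call is simple but slow when many cells use the function.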

But what happens when the user wants an operation involving x fields stored in several different files, like the third example I gave? I would then have to "merge" all the tables, using the Name field as the key.

How would you proceed? Is merging the tables in the SELECT the best option? What would the SELECT look like?

I mean, the query would be something like:

MyQuery = "SELECT x452+x535-x900+x1200-x1 AS MyData FROM [" & MergedTable & ".csv] WHERE Name='" & name & "'"
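
One possible shape for such a SELECT is an INNER JOIN on Name across only the files the expression needs. The sketch below builds that statement from the operations string. It assumes the 250-fields-per-file split and the SplittedCSV_<n>.csv naming used above; `BuildJoinQuery` is a hypothetical name. It has not been checked against Jet's limits for very wide joined tables, so treat it as a starting point. `Name` is bracketed as a precaution because it can clash with reserved words.

```vba
Option Explicit

' Hypothetical helper: rewrites e.g. "x452+x1" as "t2.[x452]+t1.[x1]" and
' joins only the split files those fields live in.
Public Function BuildJoinQuery(ByVal operations As String, _
                               ByVal firmName As String) As String
    Const FIELDS_PER_FILE As Long = 250    ' assumption about the split

    Dim re As Object, m As Object, aliases As Object
    Dim expr As String, fromClause As String, firstAlias As String
    Dim fileIdx As Long, pos As Long, joined As Long
    Dim key As Variant

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "x(\d+)"
    Set aliases = CreateObject("Scripting.Dictionary")   ' file index -> alias

    ' Qualify every field with the alias of the file that contains it.
    pos = 1
    For Each m In re.Execute(operations)
        fileIdx = (CLng(m.SubMatches(0)) - 1) \ FIELDS_PER_FILE + 1
        If Not aliases.Exists(fileIdx) Then aliases.Add fileIdx, "t" & fileIdx
        expr = expr & Mid$(operations, pos, m.FirstIndex + 1 - pos) _
                    & aliases(fileIdx) & ".[" & m.Value & "]"
        pos = m.FirstIndex + m.Length + 1
    Next m
    expr = expr & Mid$(operations, pos)

    If aliases.Count = 0 Then Exit Function   ' no x fields: return ""

    ' Jet expects nested parentheses once more than two tables are joined.
    For Each key In aliases.Keys
        If Len(fromClause) = 0 Then
            firstAlias = aliases(key)
            fromClause = "[SplittedCSV_" & key & ".csv] AS " & firstAlias
        Else
            If joined > 0 Then fromClause = "(" & fromClause & ")"
            fromClause = fromClause & " INNER JOIN [SplittedCSV_" & key & _
                         ".csv] AS " & aliases(key) & " ON " & firstAlias & _
                         ".[Name] = " & aliases(key) & ".[Name]"
            joined = joined + 1
        End If
    Next key

    ' Double any single quote in the name so it cannot break the literal.
    BuildJoinQuery = "SELECT " & expr & " AS MyData FROM " & fromClause & _
                     " WHERE " & firstAlias & ".[Name] = '" & _
                     Replace(firmName, "'", "''") & "'"
End Function
```

A request such as "x5+x150" touches a single file and produces no JOIN at all, so the single-field case needs no separate code path. The operations string is pasted into the SQL as typed, so it should come only from trusted users.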

Thanks a lot for your time.

You could load the data into an .mdb file using ADO and bypass the 255-column limitation. However, using UDFs to retrieve data directly from any external data source is going to be very slow if you have more than a few of them. I would create a class to hold the data, with a load method called when the spreadsheet opens, and have your functions query that object. Your load method reads the csv as a data stream and fills a disconnected ADO recordset held in a static variable, and you then define a getdata method that returns the desired value based on the parameters passed to it.
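
The load-once idea can be sketched as follows. Note the substitution: this sketch caches rows in a Scripting.Dictionary rather than the disconnected ADO recordset described above, because it is not clear how a hand-built recordset behaves with more than 255 fields. All names (`LoadData`, `GetField`, `mRows`, `mCols`) are hypothetical. It also assumes a comma delimiter, a header row, Windows line endings, and no quoted values that contain commas.

```vba
Option Explicit

' Module-level cache, filled once and reused by every cell formula.
Private mRows As Object   ' firm name -> array holding that firm's row
Private mCols As Object   ' header text -> zero-based column position

' Reads the whole (unsplit) .csv into memory. Plain file I/O is used,
' so the ADO field limit does not apply.
Public Sub LoadData(ByVal fullPath As String)
    Dim f As Integer, textLine As String, parts() As String, i As Long

    Set mRows = CreateObject("Scripting.Dictionary")
    Set mCols = CreateObject("Scripting.Dictionary")

    f = FreeFile
    Open fullPath For Input As #f

    ' Header row: remember where each field lives.
    Line Input #f, textLine
    parts = Split(textLine, ",")
    For i = 0 To UBound(parts)
        mCols(Trim$(parts(i))) = i
    Next i

    ' Data rows: key each one by the Name column (first column).
    Do While Not EOF(f)
        Line Input #f, textLine
        If Len(textLine) > 0 Then
            parts = Split(textLine, ",")
            mRows(Trim$(parts(0))) = parts
        End If
    Loop
    Close #f
End Sub

' Returns one field for one firm, or #N/A if the data is not loaded
' or the firm or field is unknown.
Public Function GetField(ByVal firmName As String, _
                         ByVal fieldName As String) As Variant
    If mRows Is Nothing Then
        GetField = CVErr(xlErrNA)
    ElseIf mRows.Exists(firmName) And mCols.Exists(fieldName) Then
        GetField = CDbl(mRows(firmName)(mCols(fieldName)))
    Else
        GetField = CVErr(xlErrNA)
    End If
End Function
```

`LoadData` would be called once, for example from the Workbook_Open event, after which a cell formula such as `=GetField("myName1","x452")-GetField("myName1","x1")` does the arithmetic in Excel itself without touching the file again.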
