Merging .csv files in a query to retrieve their data from Excel through a UDF and an ADODB connection
My goal is to run some code in VBA and call the function from an Excel cell to retrieve some data from a closed .csv or .xlsx file.
This could be done in several ways, but every approach I've tried has an important constraint.
I start with a very large .csv file. "Very large" here means around 4,000 rows and more than 1,000 columns.
First try:
Save the .csv in an Excel worksheet and use ExecuteExcel4Macro to retrieve the data. This works fine when running a Sub, and even when running a Function. Unfortunately, you can't use ExecuteExcel4Macro in a function that is called from an Excel cell. So much for the first try.
Second try:
Use an ADODB Connection and run a query directly against the .csv file or the saved .xlsx file. This can be used from a cell but, surprise, surprise, it has a limit of 255 columns or fields. I mean, when you run a query and try to read a field positioned in a column number greater than 255, the function does nothing. So much for the second try.
Third (and, for now, last) try. Need your help here!
OK, I could divide the original table, which has too many fields, into several tables containing a maximum of 255 fields each.
Note: the first column contains the IDs of firms, banks or whatever. The rest of the fields are named x1, x2, ... x1050 and they correspond to fields of financial statements, so they are all numeric and all useful for the analysis.
If I split the big table into several smaller ones, they would look like this:
Table 1:
Name     x1      x2      x5     x6      x15...
myName1  15025   1546    6546   548     98663
myName2  867486  4684    68786  876     68997
myName3  87663   43397   87987  457     -4554
etc.
...
Table n:
Name     x928    x929    x940   x1005   x1250
myName1  765454  541546  76546  74548   18663
myName2  6564    544684  686    41876   58997
myName3  4687    64397   9887   879457  8554
I can do this by running some VBA before I store the files, so now I have n .csv files. The point is that I want the formula to be called from a cell like this:
=GetData(path,file,name,operations)
I mean, the user wants to locate a name in a file and perform some operations with "all" the fields available, from 1 to 1250.
Let's suppose the first split table goes from field x1 to field x250. The second would go from x251 to x500, etc. All of the tables would have a first column with the names field, of course, and all tables would have the same number of rows (but not the same number of columns, as not every x field exists).
But, importantly, the operations requested by the user could look like:
"x3" --> User requests only one field.
"x5+x150" --> User requests the sum of two fields that would be in the same table (as the x150 field is not greater than x250 field)
"x452+x535-x900+x1200-x1" --> User requests operations with many fields that would be kept in different files.
When the user requests only a single field, I can write a small routine at the beginning of the function to tell it which .csv file that field is stored in, like:
' Pick the split file whose field range contains the requested field
If singleField <= 250 Then
    fileToLookAt = "SplittedCSV_1"
ElseIf singleField <= 500 Then
    fileToLookAt = "SplittedCSV_2"
End If
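The chain of range checks above can be collapsed into one integer division. This is a minimal sketch, assuming every split file covers a block of exactly 250 field numbers and follows the SplittedCSV_<k> naming used above; FileForField and FIELDS_PER_FILE are hypothetical names, not part of the original code.

```vba
Option Explicit

' Assumption: each split file covers 250 consecutive field numbers
Private Const FIELDS_PER_FILE As Long = 250

' Hypothetical helper: returns the base name of the split file that holds field "x<fieldNumber>"
Public Function FileForField(ByVal fieldNumber As Long) As String
    If fieldNumber < 1 Then Err.Raise vbObjectError + 1, , "Field numbers start at 1"
    ' Integer division: 1..250 -> 1, 251..500 -> 2, and so on
    FileForField = "SplittedCSV_" & ((fieldNumber - 1) \ FIELDS_PER_FILE + 1)
End Function
```

With this, fileToLookAt = FileForField(singleField) would replace the whole If block, and adding more split files needs no new branches.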
Then, using an ADODB Connection and the Microsoft.Jet.OLEDB.4.0 provider, I would run a query like:
MyQuery = "SELECT x" & singleField & " AS MyData FROM [" & fileToLookAt & ".csv] WHERE Name='" & name & "'"
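For context, a query like this could be executed from a cell-callable function roughly as follows. This is only a sketch under assumptions: QueryCsvValue is a hypothetical name, the .csv files are assumed to sit in folderPath with a header row, and the Jet 4.0 provider is only available to 32-bit Office (on 64-bit Office the Microsoft.ACE.OLEDB.12.0 provider would be needed instead).

```vba
Option Explicit

' Hypothetical helper: runs a single-value query against the .csv files in folderPath
' and returns the MyData column of the first row.
Public Function QueryCsvValue(ByVal folderPath As String, ByVal sql As String) As Variant
    Dim cn As Object, rs As Object
    On Error GoTo Fail

    Set cn = CreateObject("ADODB.Connection")
    ' For text files the Data Source is the folder; each file acts as a table
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath & _
            ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    Set rs = cn.Execute(sql)
    If rs.EOF Then
        QueryCsvValue = CVErr(xlErrNA)        ' no row matched the name
    Else
        QueryCsvValue = rs.Fields("MyData").Value
    End If

    rs.Close
    cn.Close
    Exit Function

Fail:
    ' Show #VALUE! in the cell instead of raising a run-time error
    QueryCsvValue = CVErr(xlErrValue)
    If Not cn Is Nothing Then
        If cn.State = 1 Then cn.Close         ' 1 = adStateOpen
    End If
End Function
```

Late binding via CreateObject is used here so the sketch does not depend on a reference to the ADO library being set in the VBA project.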
But what happens when the user wants an operation involving x fields stored in several different files, like the third example I gave? I would have to "merge" all the tables, using the Name field as the key.
How would you proceed? Is merging the tables in the Select the best option? What would the Select look like?
I mean the query would be something like:
MyQuery = "SELECT x452+x535-x900+x1200-x1 AS MyData FROM [" & MergedTable & ".csv] WHERE Name='" & name & "'"
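One possible shape for such a merged query is an INNER JOIN on Name across only the files the expression touches. The sketch below is untested against the real data and rests on several assumptions: the files are named SplittedCSV_<k>.csv, each covers 250 consecutive field numbers, every x field exists in exactly one file (so it needs no table qualifier), and the Windows-only VBScript.RegExp and Scripting.Dictionary objects are available. BuildJoinQuery is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: builds one SELECT that joins the split files needed by "operations"
Public Function BuildJoinQuery(ByVal operations As String, ByVal lookupName As String) As String
    Dim re As Object, m As Object, files As Object
    Dim k As Variant, fileIndex As Long, joinCount As Long
    Dim firstAlias As String, fromClause As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True

    ' Accept only x fields, digits, arithmetic operators, parentheses and spaces,
    ' because the expression is pasted into the SQL text as-is
    re.Pattern = "^[x0-9+\-*/() ]+$"
    If Not re.Test(operations) Then Err.Raise vbObjectError + 1, , "Unsupported expression"

    ' Collect the distinct file indexes referenced by the expression
    Set files = CreateObject("Scripting.Dictionary")
    re.Pattern = "x(\d+)"
    For Each m In re.Execute(operations)
        fileIndex = (CLng(m.SubMatches(0)) - 1) \ 250 + 1
        If Not files.Exists(fileIndex) Then files.Add fileIndex, True
    Next m
    If files.Count = 0 Then Err.Raise vbObjectError + 2, , "No x field in expression"

    ' Chain the joins; Jet wants earlier joins wrapped in parentheses
    For Each k In files.Keys
        If Len(fromClause) = 0 Then
            firstAlias = "t" & k
            fromClause = "[SplittedCSV_" & k & ".csv] AS " & firstAlias
        Else
            If joinCount > 0 Then fromClause = "(" & fromClause & ")"
            fromClause = fromClause & " INNER JOIN [SplittedCSV_" & k & ".csv] AS t" & k & _
                         " ON " & firstAlias & ".[Name] = t" & k & ".[Name]"
            joinCount = joinCount + 1
        End If
    Next k

    ' Double any single quote in the name so it cannot break the string literal
    BuildJoinQuery = "SELECT " & operations & " AS MyData FROM " & fromClause & _
                     " WHERE " & firstAlias & ".[Name] = '" & Replace(lookupName, "'", "''") & "'"
End Function
```

For the third example, "x452+x535-x900+x1200-x1", this would join files 2, 3, 4, 5 and 1. Because the SELECT returns a single computed column, the result set itself stays far below the 255-field limit; whether the provider accepts the join over text files of this width is something to verify on the real data.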
Thanks a lot for your time. 非常感谢您的宝贵时间。
You could stuff the data into an .mdb file using ADO and bypass the 256 column limitation. However, using UDFs to retrieve data directly from any external data source is going to be very slow if you have more than a few of them.
I would create a class to hold the data, with a load method called when the spreadsheet is opened, and have your functions query the object. So your load method takes your .csv as a data stream and fills a disconnected ADO recordset defined as a static variable, and then you define a getdata method that returns your desired value based on the parameters passed to it.
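A rough sketch of that idea follows. It deviates from the description above in two labeled ways: it uses a standard module with a module-level cache instead of a class with a static variable, and it fills each disconnected recordset by querying one split file through the text driver rather than reading a raw data stream. LoadFile and GetField are hypothetical names, and the connection string carries the same 32-bit Jet assumption as the question.

```vba
Option Explicit

' Module-level cache: file name -> disconnected ADODB.Recordset
Private cache As Object

' Hypothetical loader: reads one split file into memory the first time it is requested
Private Function LoadFile(ByVal folderPath As String, ByVal fileName As String) As Object
    Dim cn As Object, rs As Object

    If cache Is Nothing Then Set cache = CreateObject("Scripting.Dictionary")
    If Not cache.Exists(fileName) Then
        Set cn = CreateObject("ADODB.Connection")
        cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath & _
                ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

        Set rs = CreateObject("ADODB.Recordset")
        rs.CursorLocation = 3                                  ' adUseClient
        rs.Open "SELECT * FROM [" & fileName & "]", cn, 3, 4   ' adOpenStatic, adLockBatchOptimistic
        Set rs.ActiveConnection = Nothing                      ' disconnect: rows stay in memory
        cn.Close

        cache.Add fileName, rs
    End If
    Set LoadFile = cache(fileName)
End Function

' Hypothetical accessor: returns one field for one name from the cached data
Public Function GetField(ByVal folderPath As String, ByVal fileName As String, _
                         ByVal lookupName As String, ByVal fieldName As String) As Variant
    Dim rs As Object
    Set rs = LoadFile(folderPath, fileName)

    rs.Filter = "[Name] = '" & Replace(lookupName, "'", "''") & "'"
    If rs.EOF Then
        GetField = CVErr(xlErrNA)                              ' name not found
    Else
        GetField = rs.Fields(fieldName).Value
    End If
    rs.Filter = 0                                              ' adFilterNone: restore all rows
End Function
```

Each file is read from disk only once per session, so a sheet with many formulas pays the connection cost once per file rather than once per cell. An expression spanning several files could then be evaluated by calling GetField once per x field and combining the values in VBA.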