
Merging .csv files in a query to retrieve their data from Excel through a UDF and an ADODB connection

My goal is to run some VBA code and call a function from an Excel cell to retrieve data from a closed .csv or .xlsx file.

This could be done in several ways, but every approach I've tried has an important limitation.

I start with a very large .csv file. By very large I mean around 4,000 rows and more than 1,000 columns.

First try:

Save the .csv as an Excel worksheet and use ExecuteExcel4Macro to retrieve the data. This works fine when running a Sub, and even when running a Function. Unfortunately, you can't use ExecuteExcel4Macro in a function that is called from an Excel cell, so the first try is out.

Second try:

Use an ADODB Connection and run a query directly against the .csv file or the saved .xlsx file. This can be called from a cell but, surprise, surprise, it has a limit of 255 columns (fields): when you run a query and try to read a field positioned in a column beyond number 255, the function returns nothing. So the second try is out too.

Third (and last, for now) try. I need your help here!

OK, I could divide the original table, which has too many fields, into several tables containing a maximum of 255 fields each.

Note: the first column contains the ids of firms, banks or whatever. The rest of the fields are named x1, x2, ... x1250 and they correspond to fields of financial statements, so they are all numeric and they are all useful for the analysis.

If I split the big table into several smaller ones, they would look like this:

Table 1:
Name     x1     x2     x5    x6    x15...
myName1  15025  1546   6546  548   98663
myName2  867486 4684   68786 876   68997
myName3  87663  43397  87987 457   -4554
etc.

...

Table n:
Name     x928     x929     x940    x1005    x1250
myName1  765454   541546   76546   74548    18663
myName2  6564     544684   686     41876    58997
myName3  4687     64397    9887    879457   8554

I can do this by running some VBA before I store the files, so now I have n .csv files. The point is that I want to call the formula from a cell like this:

=GetData(path,file,name,operations)

I mean, the user wants to locate a name in a file and perform some operations with "all" the fields available, from x1 to x1250.

Let's suppose the first split table goes from field x1 to field x250. The second would go from x251 to x500, etc. All of the tables would have the Name field as their first column, of course, and all tables would have the same number of rows (though not the same number of columns, as not every x field exists).

But, importantly, the operations requested by the user could be like:

"x3"                      --> User requests only one field.
"x5+x150"                 --> User requests the sum of two fields stored in the same table (x150 does not go beyond x250)
"x452+x535-x900+x1200-x1" --> User requests operations with many fields that are kept in different files.

When the user requests only one field, I can write a small routine at the beginning of the function to work out which .csv file that field is stored in, like:

' singleField is the numeric part of the requested field (e.g. 452 for x452)
If singleField <= 250 Then
    fileToLookAt = "SplittedCSV_1"
ElseIf singleField <= 500 Then
    fileToLookAt = "SplittedCSV_2"
End If
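
The same lookup can be written without one branch per file. This is only a sketch: the helper name FileForField and the optional fieldsPerFile parameter are hypothetical, and it assumes every split file covers a fixed block of 250 consecutive field numbers, as in the x1 to x250, x251 to x500 layout described above.

```vba
' Hypothetical helper: maps a field number to the split file that holds it,
' assuming each file covers a fixed block of consecutive field numbers.
Function FileForField(ByVal fieldNumber As Long, _
                      Optional ByVal fieldsPerFile As Long = 250) As String
    ' Integer division (\) yields 0 for 1..250, 1 for 251..500, and so on.
    FileForField = "SplittedCSV_" & ((fieldNumber - 1) \ fieldsPerFile + 1)
End Function
```

With these assumptions, FileForField(150) gives "SplittedCSV_1" and FileForField(452) gives "SplittedCSV_2".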

Then, using an ADODB Connection and Microsoft.Jet.OLEDB.4.0 provider, I would run the query like:

MyQuery = "SELECT x" & singleField & " AS MyData FROM [" & fileToLookAt & ".csv] WHERE Name='" & name & "'"
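
For reference, running such a query through ADODB might look roughly like the sketch below. The function name QuerySingleValue and its parameters are hypothetical. It assumes the .csv files are comma-delimited with a header row, and that folderPath is the folder containing them (with the Jet text driver, the folder is the data source and each file acts as a table).

```vba
' Hypothetical helper: runs a query that returns one value aliased as MyData.
Function QuerySingleValue(ByVal folderPath As String, ByVal sql As String) As Variant
    Dim cn As Object, rs As Object

    ' Late binding, so no reference to the ADO library is required.
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath & _
            ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    Set rs = cn.Execute(sql)
    If rs.EOF Then
        QuerySingleValue = CVErr(xlErrNA)       ' no row matched the name
    Else
        QuerySingleValue = rs.Fields("MyData").Value
    End If

    rs.Close
    cn.Close
End Function
```

It would be called as QuerySingleValue(path, MyQuery) with the query string built above.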

But what happens when the user wants an operation involving x fields stored in several different files, like the third example I gave? I would then have to "merge" the tables, using the Name field as the merge key.

How would you proceed? Is merging the tables in the SELECT the best option? What would the SELECT look like?

I mean the query would be like:

MyQuery = "SELECT x452+x535-x900+x1200-x1 AS MyData FROM [" & MergedTable & ".csv] WHERE Name='" & name & "'"
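
One possible shape for such a SELECT is a join of only the files the expression actually uses, with each field qualified by a table alias. The sketch below is an assumption-laden illustration, not a tested solution: the function name BuildMergedQuery and the aliases t1, t2, ... are hypothetical, files are assumed to hold 250 field numbers each and to be named SplittedCSV_n.csv, and every name is assumed to appear in every file. Jet SQL expects multiple joins to be nested in parentheses, which is why the FROM clause is built up step by step.

```vba
' Hypothetical sketch: turns "x452+x535-x1" into a SELECT that joins the
' split files holding those fields on their Name column.
Function BuildMergedQuery(ByVal operations As String, ByVal firmName As String) As String
    Dim re As Object, matches As Object, m As Object, used As Object
    Dim expr As String, fromClause As String, firstAlias As String
    Dim i As Long, idx As Long, joined As Long, key As Variant

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "x(\d+)"
    Set matches = re.Execute(operations)
    Set used = CreateObject("Scripting.Dictionary")

    ' Work backwards so earlier match positions stay valid while replacing.
    expr = operations
    For i = matches.Count - 1 To 0 Step -1
        Set m = matches(i)
        idx = (CLng(m.SubMatches(0)) - 1) \ 250 + 1    ' assumed 250 fields per file
        expr = Left$(expr, m.FirstIndex) & "t" & idx & ".[x" & m.SubMatches(0) & "]" & _
               Mid$(expr, m.FirstIndex + m.Length + 1)
        used(idx) = True
    Next i

    ' Join every file that is referenced, nesting earlier joins in parentheses.
    For Each key In used.Keys
        If joined = 0 Then
            firstAlias = "t" & key
            fromClause = "[SplittedCSV_" & key & ".csv] AS " & firstAlias
        Else
            If joined > 1 Then fromClause = "(" & fromClause & ")"
            fromClause = fromClause & " INNER JOIN [SplittedCSV_" & key & ".csv] AS t" & key & _
                         " ON " & firstAlias & ".[Name] = t" & key & ".[Name]"
        End If
        joined = joined + 1
    Next key

    BuildMergedQuery = "SELECT " & expr & " AS MyData FROM " & fromClause & _
                       " WHERE " & firstAlias & ".[Name] = '" & Replace(firmName, "'", "''") & "'"
End Function
```

Because the query returns a single computed column, the result set itself stays far below 255 fields. Whether the join performs acceptably on files of this size is something to measure rather than assume.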

Thanks a lot for your time.

You could stuff the data into an .mdb file using ADO and bypass the 255-column limitation. However, using UDFs to retrieve data directly from any external data source is going to be very slow if you have more than a few of them. I would create a class to hold the data, with a load method called when the spreadsheet opens, and have your functions query the object. Your load method would take your .csv as a data stream and fill a disconnected ADO recordset defined as a static variable, and then you would define a GetData method that returns the desired value based on the parameters passed to it.
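
A minimal sketch of that caching idea follows. It simplifies the suggestion by using a standard module with a module-level variable instead of a class, and the names cache, LoadCache and GetCached are hypothetical. It assumes a comma-delimited file with a header row, no quoted fields containing commas, and numeric values that use "." as the decimal separator. It also assumes, as the answer implies, that an in-memory recordset accepts this many fields, which has not been verified here.

```vba
' Hypothetical cache module: loads the CSV once into an in-memory ADO
' recordset so cell formulas read from memory instead of reopening the file.
Private cache As Object    ' ADODB.Recordset with no connection attached

Public Sub LoadCache(ByVal csvPath As String)
    Dim f As Integer, textLine As String
    Dim headers() As String, parts() As String, i As Long

    Set cache = CreateObject("ADODB.Recordset")
    cache.CursorLocation = 3                          ' adUseClient

    f = FreeFile
    Open csvPath For Input As #f
    Line Input #f, textLine                           ' header row: Name,x1,x2,...
    headers = Split(textLine, ",")
    cache.Fields.Append headers(0), 202, 255          ' adVarWChar for the name column
    For i = 1 To UBound(headers)
        cache.Fields.Append headers(i), 5, 8, 32      ' adDouble, adFldIsNullable
    Next i
    cache.Open

    Do While Not EOF(f)
        Line Input #f, textLine
        If Len(textLine) > 0 Then
            parts = Split(textLine, ",")
            cache.AddNew
            cache.Fields(0).Value = parts(0)
            For i = 1 To UBound(parts)
                ' Empty cells stay Null; Val parses "." decimals regardless of locale.
                If Len(parts(i)) > 0 Then cache.Fields(i).Value = Val(parts(i))
            Next i
            cache.Update
        End If
    Loop
    Close #f
End Sub

Public Function GetCached(ByVal firmName As String, ByVal fieldName As String) As Variant
    GetCached = CVErr(xlErrNA)
    If cache Is Nothing Then Exit Function            ' LoadCache has not run yet
    If cache.BOF And cache.EOF Then Exit Function     ' no rows were loaded

    cache.MoveFirst
    cache.Find "[" & cache.Fields(0).Name & "] = '" & Replace(firmName, "'", "''") & "'"
    If Not cache.EOF Then GetCached = cache.Fields(fieldName).Value
End Function
```

LoadCache could be called from the workbook's Workbook_Open event, after which a cell formula such as =GetCached("myName1","x452") reads from memory. An expression like "x452+x535" would still need to be split into its fields and evaluated in VBA.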
