My goal is to write some VBA code and call the function from an Excel cell to retrieve data from a closed .csv or .xlsx file.
This could be done in several ways, but every approach I've tried has an important limitation.
I start with a very large .csv file: around 4,000 rows and more than 1,000 columns.
First try:
Save the .csv as an Excel workbook and use ExecuteExcel4Macro to retrieve the data. This works fine when run from a Sub, and even from a Function called from VBA. Unfortunately, ExecuteExcel4Macro does not work when the function is called from an Excel cell. So much for the first try.
Second try:
Use an ADODB Connection and run a query directly against the .csv file or the saved .xlsx file. This can be used from a cell but, surprise, surprise, it has a limit of 255 columns (fields). When a query tries to read a field positioned beyond column 255, the function returns nothing. So much for the second try.
Third (and, for now, last) try. I need your help here!
OK, I could split the original table, which has too many fields, into several tables containing at most 255 fields each.
Note: the first column contains the IDs of firms, banks, etc. The remaining fields are named x1, x2, ..., x1050. They correspond to financial statement items, so they are all numeric and all useful for the analysis.
If I split the big table into several smaller ones, they would look like this:
Table 1:
Name x1 x2 x5 x6 x15...
myName1 15025 1546 6546 548 98663
myName2 867486 4684 68786 876 68997
myName3 87663 43397 87987 457 -4554
etc.
...
Table n:
Name x928 x929 x940 x1005 x1250
myName1 765454 541546 76546 74548 18663
myName2 6564 544684 686 41876 58997
myName3 4687 64397 9887 879457 8554
I can do this by running some VBA before I store the files, so I end up with n .csv files. The point is that I want the formula to be called from a cell like this:
=GetData(path,file,name,operations)
That is, the user wants to locate a name in a file and perform some operations with "all" the fields available, from x1 to x1250.
Let's suppose the first split table covers fields x1 to x250, the second covers x251 to x500, and so on. Every table would, of course, have the Name field as its first column, and all tables would have the same number of rows (though not the same number of columns, since not every x field exists).
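The splitting step mentioned above ("running some VBA before I store the files") could look roughly like this. It is a sketch under several assumptions that are not in the post: the file is comma-delimited with no quoted commas, it uses CRLF line endings, the header is "Name,x1,x2,...", and every row has as many values as the header. Columns are grouped by the number after "x", so file k gets fields x((k-1)*250+1) to x(k*250), matching the scheme described above. SplitWideCsv is a hypothetical name.

```vba
' Sketch: split the wide CSV into SplittedCSV_1.csv, SplittedCSV_2.csv, ...
' File k holds fields x((k-1)*250+1) .. x(k*250) plus the Name column.
' Assumptions: comma-delimited, no quoted commas, CRLF line endings,
' header "Name,x1,x2,...", every row as long as the header.
Public Sub SplitWideCsv(ByVal srcPath As String, ByVal outFolder As String, _
                        Optional ByVal fieldsPerFile As Long = 250)
    Dim inFile As Integer, lineText As String, tokens() As String
    Dim colFile() As Long, outFiles() As Integer, parts() As String
    Dim j As Long, k As Long, maxFile As Long

    inFile = FreeFile
    Open srcPath For Input As #inFile
    Line Input #inFile, lineText
    tokens = Split(lineText, ",")

    ' Header "x452" -> field 452 -> file (452 - 1) \ 250 + 1 = 2
    ReDim colFile(0 To UBound(tokens))
    For j = 1 To UBound(tokens)
        colFile(j) = (CLng(Mid$(Trim$(tokens(j)), 2)) - 1) \ fieldsPerFile + 1
        If colFile(j) > maxFile Then maxFile = colFile(j)
    Next j

    ReDim outFiles(1 To maxFile)
    For k = 1 To maxFile
        outFiles(k) = FreeFile
        Open outFolder & "\SplittedCSV_" & k & ".csv" For Output As #outFiles(k)
    Next k

    ' The first pass writes the header; later passes write the data rows.
    Do
        If Len(lineText) > 0 Then
            tokens = Split(lineText, ",")
            ReDim parts(1 To maxFile)
            For k = 1 To maxFile
                parts(k) = tokens(0)          ' every file starts with the Name column
            Next k
            For j = 1 To UBound(tokens)
                parts(colFile(j)) = parts(colFile(j)) & "," & tokens(j)
            Next j
            For k = 1 To maxFile
                Print #outFiles(k), parts(k)
            Next k
        End If
        If EOF(inFile) Then Exit Do
        Line Input #inFile, lineText
    Loop

    Close #inFile
    For k = 1 To maxFile
        Close #outFiles(k)
    Next k
End Sub
```

Grouping by field number rather than by column position keeps the "x1 to x250 in file 1" rule valid even when some x fields are missing.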
Importantly, though, the operations requested by the user could look like this:
"x3" --> User requests only one field.
"x5+x150" --> User requests the sum of two fields in the same table (since x150 falls within x1 to x250).
"x452+x535-x900+x1200-x1" --> User requests operations with many fields stored in different files.
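To do anything with these operation strings, the function first has to split them into fields and signs. Here is a minimal sketch, assuming operations contain only xN tokens joined by "+" or "-" (no parentheses, constants or other operators). ParseOperations is a hypothetical helper that fills two parallel arrays and returns the number of terms.

```vba
' Sketch: parse "x452+x535-x900+x1200-x1" into parallel arrays of
' field numbers and signs (+1 / -1). Returns the number of terms.
' Assumption: only "+" and "-" between xN tokens; spaces are ignored.
Public Function ParseOperations(ByVal ops As String, ByRef fieldNums() As Long, _
                                ByRef signs() As Long) As Long
    Dim i As Long, n As Long, currentSign As Long
    Dim ch As String, numText As String

    ops = LCase$(Replace(ops, " ", ""))
    If Len(ops) = 0 Then Exit Function
    ReDim fieldNums(1 To Len(ops))   ' generous upper bound, trimmed at the end
    ReDim signs(1 To Len(ops))
    currentSign = 1

    For i = 1 To Len(ops)
        ch = Mid$(ops, i, 1)
        Select Case ch
            Case "+", "-"
                If numText <> "" Then        ' close the previous term
                    n = n + 1
                    fieldNums(n) = CLng(numText)
                    signs(n) = currentSign
                    numText = ""
                End If
                currentSign = IIf(ch = "-", -1, 1)
            Case "x"
                ' field prefix: nothing to store
            Case "0" To "9"
                numText = numText & ch
            Case Else
                Err.Raise 5, "ParseOperations", "Unexpected character: " & ch
        End Select
    Next i

    If numText <> "" Then                    ' close the last term
        n = n + 1
        fieldNums(n) = CLng(numText)
        signs(n) = currentSign
    End If
    If n > 0 Then
        ReDim Preserve fieldNums(1 To n)
        ReDim Preserve signs(1 To n)
    End If
    ParseOperations = n
End Function
```

For the third example, this gives the fields 452, 535, 900, 1200, 1 with the signs +, +, -, +, -.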
When the user requests only one field, I can write a small routine at the beginning of the function to tell it which .csv file stores that field, like:
If singleField <= 250 Then
    fileToLookAt = "SplittedCSV_1"
ElseIf singleField <= 500 Then
    fileToLookAt = "SplittedCSV_2"
End If
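The If chain above grows by one branch per file. Assuming every split file holds exactly 250 consecutive field numbers, as described above, integer division does the same job in one line for any number of files. FileForField is a hypothetical helper name.

```vba
' Sketch: which split file holds field xN, assuming file k contains
' fields x((k-1)*250+1) to x(k*250).
Public Function FileForField(ByVal fieldNumber As Long, _
                             Optional ByVal fieldsPerFile As Long = 250) As String
    If fieldNumber < 1 Then Err.Raise 5, "FileForField", "Field numbers start at 1"
    ' Integer division (\) maps 1..250 -> 1, 251..500 -> 2, and so on.
    FileForField = "SplittedCSV_" & ((fieldNumber - 1) \ fieldsPerFile + 1)
End Function
```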
Then, using an ADODB Connection and the Microsoft.Jet.OLEDB.4.0 provider, I would run a query like:
MyQuery = "SELECT x" & singleField & " AS MyData FROM [" & fileToLookAt & ".csv] WHERE Name='" & name & "'"
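A complete version of that single-field lookup might look like the sketch below. It makes several assumptions. It uses late binding, so no ADO reference is needed. It assumes 32-bit Office, because the Jet 4.0 provider is 32-bit only; 64-bit Office would need Microsoft.ACE.OLEDB.12.0 instead. It also assumes comma-delimited files with a header row. Apostrophes in the name are doubled so that a name like O'Brien does not break the SQL literal, and [Name] is bracketed defensively. BuildSingleFieldQuery and QueryCsvFolder are hypothetical names.

```vba
' Sketch: single-field lookup against a folder of CSV files via ADO + Jet text driver.
Public Function BuildSingleFieldQuery(ByVal fieldNumber As Long, ByVal fileName As String, _
                                      ByVal firmName As String) As String
    ' Double any apostrophe so the name stays a valid SQL string literal.
    BuildSingleFieldQuery = "SELECT x" & fieldNumber & " AS MyData FROM [" & fileName & _
        ".csv] WHERE [Name]='" & Replace(firmName, "'", "''") & "'"
End Function

Public Function QueryCsvFolder(ByVal folder As String, ByVal sql As String) As Variant
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    ' With the text driver, Data Source is the FOLDER; each .csv acts as a table.
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folder & ";" & _
            "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
    Set rs = cn.Execute(sql)
    If rs.EOF Then
        QueryCsvFolder = CVErr(xlErrNA)      ' no row with that name
    Else
        QueryCsvFolder = rs.Fields("MyData").Value
    End If
    rs.Close
    cn.Close
End Function
```

From a cell, this could be called as =QueryCsvFolder("C:\data", BuildSingleFieldQuery(5, "SplittedCSV_1", "myName1")), with C:\data as a placeholder folder.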
But what happens when the user wants an operation involving x fields stored in several different files, like the third example above? I would have to "merge" the tables, using the Name field as the key.
How would you proceed? Is merging the tables in the SELECT the best option? What would the SELECT look like?
That is, the query would be something like:
MyQuery = "SELECT x452+x535-x900+x1200-x1 AS MyData FROM [" & MergedTable & ".csv] WHERE Name='" & name & "'"
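One way to answer "how would the SELECT be" is to join only the split files that the expression actually needs, on the Name column, and let Jet evaluate the expression on the single matching row. The sketch below builds that statement. It assumes the SplittedCSV_k.csv naming with 250 fields per file, and it assumes the caller passes parallel arrays of field numbers and signs (+1 / -1). BuildMergedQuery is a hypothetical helper. Jet requires parentheses around each earlier join when more than two tables are chained, which is why the FROM clause is nested. This is not tested for speed: joining several text files may be slow for 4,000 rows.

```vba
' Sketch: one Jet SQL statement that joins the needed split files on [Name]
' and evaluates e.g. x452+x535-x900+x1200-x1 in a single row.
Public Function BuildMergedQuery(ByRef fieldNums() As Long, ByRef signs() As Long, _
                                 ByVal firmName As String, _
                                 Optional ByVal fieldsPerFile As Long = 250) As String
    Dim i As Long, k As Long, used As Object, keys As Variant
    Dim expr As String, fromClause As String, firstAlias As String

    Set used = CreateObject("Scripting.Dictionary")   ' file index -> True, in order of use

    ' SELECT expression, e.g. t2.x452 + t3.x535 - t4.x900 + t5.x1200 - t1.x1
    For i = LBound(fieldNums) To UBound(fieldNums)
        k = (fieldNums(i) - 1) \ fieldsPerFile + 1
        If Not used.Exists(k) Then used.Add k, True
        If i = LBound(fieldNums) Then
            If signs(i) < 0 Then expr = "-"
        Else
            expr = expr & IIf(signs(i) < 0, " - ", " + ")
        End If
        expr = expr & "t" & k & ".x" & fieldNums(i)
    Next i

    ' FROM clause: (A JOIN B ON ...) JOIN C ON ..., the nesting Jet expects.
    keys = used.keys
    firstAlias = "t" & keys(0)
    fromClause = "[SplittedCSV_" & keys(0) & ".csv] AS " & firstAlias
    For i = 1 To UBound(keys)
        If i > 1 Then fromClause = "(" & fromClause & ")"
        fromClause = fromClause & " INNER JOIN [SplittedCSV_" & keys(i) & ".csv] AS t" & keys(i) & _
                     " ON " & firstAlias & ".[Name] = t" & keys(i) & ".[Name]"
    Next i

    BuildMergedQuery = "SELECT " & expr & " AS MyData FROM " & fromClause & _
                       " WHERE " & firstAlias & ".[Name]='" & Replace(firmName, "'", "''") & "'"
End Function
```

A simpler alternative, also untested here, is to run one single-field query per term and add the results in VBA. That avoids joins entirely at the cost of one query per field.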
Thanks a lot for your time.
You could stuff the data into an .mdb file using ADO and bypass the 255-column limitation. However, using UDFs to retrieve data directly from any external data source is going to be very slow if you have more than a few of them.
I would create a class to hold the data, with a load method called when the spreadsheet opens, and have your functions query that object. Your load method takes your CSV as a data stream and fills a disconnected ADO recordset defined as a static variable. You then define a GetData method that returns the desired value based on the parameters passed to it.
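A rough sketch of the answer's "load once, query from memory" idea follows, with two swaps that should be named plainly. First, it uses a module-level Scripting.Dictionary instead of the disconnected ADO recordset the answer mentions. Second, it loads each file lazily on the first call instead of when the workbook opens. It assumes a comma-delimited file with no quoted fields, CRLF line endings, a "Name,x1,x2,..." header, numeric x values, and operations made only of xN terms joined by + or -. Because a whole row is held in memory as an array, the wide file would not need to be split. The cache is lost whenever the VBA project is reset, and it does not notice if the file changes. GetData follows the signature from the question; LoadCsv and mFiles are hypothetical.

```vba
Option Explicit

' Cache: full path -> Array(headerIndex, rowsByName). Filled on first use.
Private mFiles As Object

Private Function LoadCsv(ByVal fullPath As String) As Variant
    Dim f As Integer, lineText As String, tokens() As String, j As Long
    Dim headerIndex As Object, rowsByName As Object

    Set headerIndex = CreateObject("Scripting.Dictionary")
    headerIndex.CompareMode = vbTextCompare          ' "X452" matches "x452"
    Set rowsByName = CreateObject("Scripting.Dictionary")
    rowsByName.CompareMode = vbTextCompare

    f = FreeFile
    Open fullPath For Input As #f
    Line Input #f, lineText
    tokens = Split(lineText, ",")
    For j = 1 To UBound(tokens)
        headerIndex(Trim$(tokens(j))) = j            ' "x452" -> column position
    Next j
    Do While Not EOF(f)
        Line Input #f, lineText
        If Len(lineText) > 0 Then
            tokens = Split(lineText, ",")
            rowsByName(Trim$(tokens(0))) = tokens    ' whole row kept as a string array
        End If
    Loop
    Close #f
    LoadCsv = Array(headerIndex, rowsByName)
End Function

Public Function GetData(ByVal path As String, ByVal file As String, _
                        ByVal firmName As String, ByVal operations As String) As Variant
    Dim fullPath As String, entry As Variant, hdr As Object, byName As Object
    Dim rowData As Variant, ops As String, ch As String, token As String
    Dim i As Long, sign As Double, total As Double

    If mFiles Is Nothing Then Set mFiles = CreateObject("Scripting.Dictionary")
    If Right$(path, 1) <> "\" Then path = path & "\"
    fullPath = path & file
    If Not mFiles.Exists(fullPath) Then mFiles.Add fullPath, LoadCsv(fullPath)
    entry = mFiles(fullPath)
    Set hdr = entry(0)
    Set byName = entry(1)

    If Not byName.Exists(firmName) Then GetData = CVErr(xlErrNA): Exit Function
    rowData = byName(firmName)

    ' Walk e.g. "x452+x535-x900": add each field's value with its pending sign.
    ops = Replace(operations, " ", "") & "+"         ' trailing "+" flushes the last term
    sign = 1
    For i = 1 To Len(ops)
        ch = Mid$(ops, i, 1)
        If ch = "+" Or ch = "-" Then
            If Len(token) > 0 Then
                If Not hdr.Exists(token) Then GetData = CVErr(xlErrRef): Exit Function
                total = total + sign * CDbl(rowData(hdr(token)))
                token = ""
            End If
            sign = IIf(ch = "-", -1, 1)
        Else
            token = token & ch
        End If
    Next i
    GetData = total
End Function
```

In a cell this would read =GetData("C:\data","big.csv","myName2","x452+x535-x900+x1200-x1"), where the folder and file name are placeholders. An unknown name returns #N/A and an unknown field returns #REF!.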