
How to pull column names from multiple tables using R

Apologies in advance; I am new to RStudio.

There are two parts to this question:

1) I have a large database with almost 6,000 tables in it, many of which contain no data. Is there a way in R to pull a list of only the table names that have data in them?

I know how to pull a list of all table names and how to pull the data from a specific table, using the code below.

library(RODBC)
# Open the connection, list every table, then fetch one table by name
test <- odbcDriverConnect('driver={SQL Server};server=(SERVER);database=(DB_Name);trusted_connection=true')
rest <- sqlQuery(test, 'select * from information_schema.tables')
Table1 <- sqlFetch(test, "PROPERTY")

Above is the code I use to access the database and tables.

  • "test" is the connection
  • "rest" shows the list of 5,803 table names, one of which is "PROPERTY"
  • "Table1" simply pulls the table named "PROPERTY".

I would like "rest" to show only the tables that have data in them.

2) My ultimate goal, which leads to the second question, is to create a table that lists every table in this database in column 1, with columns 2, 3, 4, etc. holding each of the column headers contained in that table. Any idea how to do that?

Thanks so much!

The Tables object below is a data frame listing the tables in the database and how many rows are in each. The query's WHERE clause keeps only tables with at least one record, so empty tables never appear. This is probably the fastest way to get your list of non-empty tables. I pulled the query to get that information from https://stackoverflow.com/a/14163881/1017276

My only reservation about that query is that it doesn't give the schema name, and it is possible to have tables with the same name in different schemas. So this is likely only going to work well within one schema at a time.

library(RODBCext)

Tables <- 
  sqlExecute(
    channel = test,
    query = "SELECT t.name TableName, i.rows Records
             FROM sysobjects t, sysindexes i
             WHERE t.xtype = ? AND i.id = t.id AND i.indid IN (0,1) AND i.rows > 0
             ORDER BY TableName;",
    data = list(xtype = "U"),
    fetch = TRUE,
    stringsAsFactors = FALSE
  )
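
If the schema concern matters for your database, one possible variation is to read the row counts from the newer catalog views instead. The sketch below assumes SQL Server 2005 or later, where sys.tables, sys.schemas and sys.partitions are available; the names schema_query and nonempty_tables are made up for illustration, and the row counts in sys.partitions are approximate rather than exact.

```r
# Query text: one row per non-empty user table, together with its schema.
# sys.partitions stores row counts per partition; restricting to index_id
# 0 (heap) or 1 (clustered index) avoids counting the same rows once per
# nonclustered index.
schema_query <- "
  SELECT s.name AS SchemaName, t.name AS TableName, SUM(p.rows) AS Records
  FROM sys.tables AS t
  JOIN sys.schemas AS s ON s.schema_id = t.schema_id
  JOIN sys.partitions AS p ON p.object_id = t.object_id
  WHERE p.index_id IN (0, 1)
  GROUP BY s.name, t.name
  HAVING SUM(p.rows) > 0
  ORDER BY s.name, t.name;"

# Hypothetical helper: `channel` is an open RODBC connection such as `test`
nonempty_tables <- function(channel) {
  RODBC::sqlQuery(channel, schema_query, stringsAsFactors = FALSE)
}
```

Calling nonempty_tables(test) should then return the schema alongside each table name. The query has no quoted strings in it, so plain sqlQuery is enough here.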

This next part takes the tables found above and gets the column information for each of them. Lastly, it combines the results into a single data frame containing all of the column names.

Columns <- 
  lapply(Tables$TableName,
         function(x) sqlColumns(test, x))
Columns <- do.call("rbind", Columns)

sqlColumns is a function in RODBC.

sqlExecute is a function in RODBCext that allows for parameterized queries. I tend to use that anytime I need to use quoted strings in a query.
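
The Columns data frame has one row per column, whereas part 2 of the question asks for one row per table with the column names spread across columns 2, 3, 4, and so on. A minimal sketch of that reshaping follows. It assumes Columns has TABLE_NAME and COLUMN_NAME fields, as the result of sqlColumns normally does; columns_wide and the demo data are hypothetical names used only for illustration.

```r
# Reshape a long table/column listing into one row per table.
# `columns` is assumed to have TABLE_NAME and COLUMN_NAME fields.
columns_wide <- function(columns) {
  # Split the column names by table, keeping their order within each table
  by_table <- split(as.character(columns$COLUMN_NAME),
                    as.character(columns$TABLE_NAME))
  widest <- max(lengths(by_table))
  # Pad shorter tables with NA so every row has the same number of cells
  padded <- lapply(by_table, function(x) {
    c(x, rep(NA_character_, widest - length(x)))
  })
  wide <- do.call(rbind, padded)
  colnames(wide) <- paste0("Column", seq_len(widest))
  data.frame(TableName = rownames(wide), wide,
             row.names = NULL, stringsAsFactors = FALSE)
}

# Hypothetical stand-in for the `Columns` data frame built above
demo <- data.frame(
  TABLE_NAME  = c("PROPERTY", "PROPERTY", "PROPERTY", "OWNER", "OWNER"),
  COLUMN_NAME = c("ID", "ADDRESS", "VALUE", "ID", "NAME"),
  stringsAsFactors = FALSE
)
wide_demo <- columns_wide(demo)
print(wide_demo)
```

With the real data this would be called as columns_wide(Columns). Tables come out in alphabetical order, and tables that share a name across schemas would be merged into one row, so the schema caveat above applies here too.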
