
How to import CSV into Sqlite in R where one of the variables has a comma (,) within quotes?

This is driving me mad.

I have a csv file "hello.csv"

a,b
"drivingme,mad",1

I just want to convert this into an SQLite database from within R (I need to do this because the actual file is 10GB and won't fit into a data.frame, so I will use SQLite as an intermediate datastore).

dbWriteTable(conn= dbConnect(SQLite(),
             dbname="c:/temp/data.sqlite3"),
             name="data",
             value="c:/temp/hello.csv", row.names=FALSE, header=TRUE)

The above code failed with this error:

Error in try({ : 
  RS-DBI driver: (RS_sqlite_import: c:/temp/hello.csv line 2 expected 2 columns of data but found 3)
In addition: Warning message:
In read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows,  :
  incomplete final line found by readTableHeader on 'c:/temp/hello.csv'

How do I tell it that a comma (,) inside quotes "" should be treated as part of the string and not as a separator?

I tried adding the argument

quote="\""

But it didn't work. Help!! read.csv works just fine on the small file, but it will fail when reading large files.
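
For what it's worth, the default quote handling in read.csv does parse the small example correctly, which suggests the issue is memory rather than quoting; a quick check (assuming hello.csv sits in c:/temp as above):

x <- read.csv("c:/temp/hello.csv", header=TRUE, quote="\"")
ncol(x) # 2: the quoted comma stays inside column a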

Update

A much better way now is to use readr's chunked functions, e.g.

#setting up sqlite
con_data = dbConnect(SQLite(), dbname="yoursqlitefile")

readr::read_delim_chunked(file, callback = function(chunk, pos) {
  dbWriteTable(con_data, name="data", value=chunk, append=TRUE) #write to sqlite
}, delim=",")
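
Since the file is a plain CSV, readr's CSV-specific chunked reader can be used the same way; a minimal sketch under that assumption, reusing the con_data connection from above:

readr::read_csv_chunked("c:/temp/hello.csv",
  callback = function(chunk, pos) {
    dbWriteTable(con_data, name="data", value=chunk, append=TRUE) # write each chunk to sqlite
  },
  chunk_size = 100000)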

Original, more cumbersome way

One way to do this is to read the file in chunks, since read.csv parses it correctly but just cannot load the whole file into memory.

    n = 100000 # chunk size; experiment with this number
    f = file(csv) # csv is the path to your big file
    open(f) # open a connection to the file
    data <- read.csv(f, nrows=n, header=TRUE)
    var.names = names(data)

    #setting up sqlite
    con_data = dbConnect(SQLite(), dbname="yoursqlitefile")

    while(nrow(data) == n) { # a full chunk means we have not reached the end of the file yet
      dbWriteTable(con_data, name="data", value=data, append=TRUE) #write to sqlite
      data <- read.csv(f, nrows=n, header=FALSE)
      names(data) <- var.names
    }
    close(f)
    if (nrow(data) != 0) { # write the final, partial chunk
      dbWriteTable(con_data, name="data", value=data, append=TRUE)
    }
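
With the connection and table names from the snippet above, a quick sanity check and cleanup might look like this (standard DBI calls):

    dbGetQuery(con_data, "SELECT COUNT(*) FROM data") # total number of rows imported
    dbDisconnect(con_data) # release the connection when done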

Improving the proposed answer:

data_full_path <- paste0(data_folder, data_file)
con_data <- dbConnect(SQLite(),
  dbname=":memory:") # you can also store in a .sqlite file if you prefer

readr::read_delim_chunked(file = data_full_path,
                          callback = function(chunk,
                                              dummyVar # https://stackoverflow.com/a/42826461/9071968
                                              ) {
                            dbWriteTable(con_data, name="data", value=chunk, append=TRUE) #write to sqlite
                          },
                          delim = ";",
                          quote = "\""
)
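
To confirm that quoted commas survived the import, you can pull a few rows back out with a plain query (standard DBI call, using the table name "data" from above):

check <- dbGetQuery(con_data, "SELECT * FROM data LIMIT 5")
print(check) # quoted field values should come back intact, commas and all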

(Note that the readr callback must accept two parameters, see https://stackoverflow.com/a/42826461/9071968; that is why the chunk function above takes the extra dummyVar argument.)

You could write a parser to handle it.

string = yourline[i];
if (string.contains(",")) string = string.replace(",", "%40"); // escape embedded commas
yourline[i] = string;

or something of that nature. You could also use:

string.split(",");

and rebuild your string that way. That's how I would do it.

Keep in mind that you'll have to "de-parse" it when you want to get the values back. Commas delimit columns, so an embedded one can really screw things up, not to mention JSONArrays or JSONObjects.
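
In R terms, that "de-parse" step would just reverse the substitution once the values are read back; a sketch, where escaped_value is a hypothetical field fetched from the database:

# escaped_value is a placeholder for one field read back from the database
restored <- gsub("%40", ",", escaped_value, fixed=TRUE) # undo the %40 escaping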

Also keep in mind that this might be very costly for 10GB of data, so you might want to parse the input before it even gets to the CSV, if possible.
