

Unable to bulk import free-flow text with MonetDB.R

I am trying to import a dataset of 217,000 records (the Jeopardy! dataset) into MonetDB through the MonetDB.R interface.

The file is a CSV file; its header and first two records are as follows:

show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,,,
4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,,,

The problem occurs when importing the ques column (the text between the double quotes). That column contains multiple commas and other punctuation, and monet.read.csv is unable to import it.

I tried importing a few records without the ques column, and that works perfectly.

Can you please suggest how to import such free-flow text columns into MonetDB? Once they are imported, I intend to perform some text analysis on the column.

Use monet.read.csv.

I also prefer MonetDBLite for easier setup, but monet.read.csv does work with just MonetDB.R. Thanks.

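# reproduce the header and first two records of the question's CSV
# (embedded double quotes are doubled, as in the original file)
mylines <-
    c("show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3", 
    "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,", 
    "4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,")

# temporary CSV file for the sample, and a folder for the embedded database
tf <- tempfile()
dbfolder <- tempdir()

writeLines( mylines , tf )

library(DBI)
library(MonetDBLite)
library(MonetDB.R)

# start an embedded MonetDBLite instance stored in dbfolder
db <- dbConnect( MonetDBLite() , dbfolder )

# bulk-load the CSV file into a new table called mytable
monet.read.csv( db , tf , 'mytable' )

# looks ok to me: the ques text, commas and quotes included, comes back
dbReadTable( db , 'mytable' )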
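If the load still fails on the full file, one way to check whether the file itself is well-formed CSV is to parse a sample with base R's read.csv, independently of MonetDB. read.csv defaults to sep = "," and quote = "\"", so commas inside a double-quoted field do not split it, and a doubled quote ("") inside a quoted field is read as one literal quote. This is a minimal sketch using only base R; the name sample_lines is illustrative, and a clean parse here does not by itself prove that every one of the 217,000 records is well-formed.

```r
# Header plus one record copied from the question; the ques field
# contains commas and doubled double quotes.
sample_lines <- c(
  "show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3",
  "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,"
)

# read.csv only treats " as a quote character, so the apostrophe in
# McDonald's is read as ordinary text.
parsed <- read.csv(text = sample_lines, stringsAsFactors = FALSE)

# The record should split into the same 10 columns as the header.
print(ncol(parsed))

# The 6th column holds the question text with its commas and quotes intact.
print(parsed[[6]])
```

If the sample parses into 10 columns with the full question text in the 6th, the quoting follows standard CSV rules and the problem is more likely in how the loader is called.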
