Error Handling on mongoimport

I have a directory of roughly 45,000 JSON files. The total size is currently around 12.8 GB. This is website data from Kissmetrics and its structure is detailed here.

The data:

- Each file is multiple JSON documents separated by a newline
- It will be updated every 12 hours with new additional files

I want to import this data into MongoDB using mongoimport. I've tried this shell script to make the process easier:

for filename in revisions/*; do
    echo "$filename"
    mongoimport --host <HOSTNAME>:<PORT> --db <DBNAME> --collection <COLLECTIONNAME> \
        --ssl --sslCAFile ~/mongodb.pem --username <USERNAME> --password <PASSWORD> \
        --authenticationDatabase admin "$filename"
done

This will produce errors such as:

2016-06-18T00:31:10.781+0000    using 1 decoding workers
2016-06-18T00:31:10.781+0000    using 1 insert workers
2016-06-18T00:31:10.781+0000    filesize: 113 bytes
2016-06-18T00:31:10.781+0000    using fields:
2016-06-18T00:31:10.822+0000    connected to: <HOSTNAME>:<PORT>
2016-06-18T00:31:10.822+0000    ns: <DBNAME>.<COLLECTION>
2016-06-18T00:31:10.822+0000    connected to node type: standalone
2016-06-18T00:31:10.822+0000    standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000    using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.822+0000    standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000    using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.824+0000    Failed: error processing document #1: invalid character 'l' looking for beginning of value
2016-06-18T00:31:10.824+0000    imported 0 documents

I will potentially run into this error, and from my inspection it is not due to malformed data.

The error may happen hours into the import.

Can I parse the error from mongoimport to retry the same document? I don't know if the error will always have this same form, so I'm not sure I can handle it in bash. Can I keep track of progress in bash and restart if the import is terminated early? Any suggestions on importing data of this size or handling the error in the shell?

Typically a given command will return error codes when it fails (and they are hopefully documented on the man page for the command).

So if you want to do something hacky and just retry once:

cmd="mongoimport --foo --bar..."
$cmd
ret=$?
if [ $ret -ne 0 ]; then
  echo "retrying..."
  $cmd
  if [ $? -ne 0 ]; then
    "failed again.  Sadness."
    exit
  fi
fi
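
If you also want to keep track of progress so the import can be restarted after an early termination (as asked above), one approach is to record each successfully imported file in a plain text log and skip those files on the next run. This is a minimal sketch, not a tested implementation; the imported.log name is just an example, and the connection flags are the placeholders from the question:

touch imported.log
for filename in revisions/*; do
  # skip files recorded as imported on a previous run
  grep -qxF "$filename" imported.log && continue
  if mongoimport --host <HOSTNAME>:<PORT> --db <DBNAME> --collection <COLLECTIONNAME> \
      --ssl --sslCAFile ~/mongodb.pem --username <USERNAME> --password <PASSWORD> \
      --authenticationDatabase admin "$filename"; then
    # mongoimport exits non-zero on failure, so only successes get logged
    echo "$filename" >> imported.log
  else
    echo "import failed for $filename" >&2
  fi
done

Keep in mind that if a file fails partway through, some of its documents may already have been inserted, so re-running it could create duplicates unless you deduplicate afterwards or import with an upsert mode.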

Or if you really need what mongoimport outputs, capture it like this:

results=`mongoimport --foo --bar...`

Now the variable $results will contain whatever was returned on stdout. You might have to redirect stderr as well.
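
For example, a small sketch that merges stderr into the captured output and checks the exit status:

results=$(mongoimport --foo --bar... 2>&1)
if [ $? -ne 0 ]; then
  # $results now holds both stdout and stderr from mongoimport
  echo "mongoimport failed: $results" >&2
fi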
