Slow speed when using a SAX parser to parse XML data and save it to MySQL on localhost (Java)
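I am writing this question about a program I am currently writing in Java.
I have to parse a large .rdf file (XML format), 1.60 GB in size, and then insert the parsed data into a MySQL server on localhost.
After some googling, I decided to use a SAX parser in my code. Many sites recommend a SAX parser over a DOM parser, saying that SAX is much faster than DOM.
However, when I ran my code that uses the SAX parser, I found that the program runs very slowly. A senior in my lab told me that the slowness might come from the file I/O.
In the code of "javax.xml.parsers.SAXParser.class", "InputStream" is used for file input, which might make the job slower compared with using the "Scanner" class or the "BufferedReader" class.
My questions are: 1. Is a SAX parser suitable for parsing large XML documents?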
My program took 10 minutes to parse a 14 MB sample file and insert the data
into MySQL on localhost.
In fact, another senior in my lab wrote a program similar to mine
but with a DOM parser, and it parses the 1.60 GB XML file and saves the data
in an hour.
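This is my first question on Stack Overflow, so any advice would be appreciated and helpful. Thank you for reading.
Added after initial feedback: I should have uploaded my code to clarify my question, and I apologize for not doing so.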
package xml_parse;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.io.BufferedInputStream;
import java.io.FileInputStream;

public class Readxml extends DefaultHandler {
    Connection con = null;
    String[] chunk; // to check /A/, /B/, /C/ kind of stuff.

    public Readxml() throws SQLException {
        // connect to local mysql database
        con = DriverManager.getConnection("jdbc:mysql://localhost/lab_first",
                "root", "2030kimm!");
    }

    public void getXml() {
        try {
            // obtain and configure a SAX based parser
            SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
            // obtain object for SAX parser
            SAXParser saxParser = saxParserFactory.newSAXParser();
            // default handler for SAX handler class
            // all three methods are written in handler's body
            DefaultHandler default_handler = new DefaultHandler() {
                String topic_gate = "close", category_id_gate = "close",
                        new_topic_id, new_catid, link_url;
                java.sql.Statement st = con.createStatement();

                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {
                    if (qName.equals("Topic")) {
                        topic_gate = "open";
                        new_topic_id = attributes.getValue(0);
                        // apostrophe escape in SQL query
                        new_topic_id = new_topic_id.replace("'", "''");
                        if (new_topic_id.contains("International"))
                            topic_gate = "close";
                        if (new_topic_id.equals("") == false) {
                            chunk = new_topic_id.split("/");
                            for (int i = 0; i < chunk.length - 1; i++)
                                if (chunk[i].length() == 1) {
                                    topic_gate = "close";
                                    break;
                                }
                        }
                        // Strings are immutable, so the result of replace()
                        // has to be assigned back to take effect
                        if (new_topic_id.startsWith("Top/"))
                            new_topic_id = new_topic_id.replace("Top/", "");
                    }
                    if (topic_gate.equals("open") && qName.equals("catid"))
                        category_id_gate = "open";
                    // add each new link to table "links" (MySQL)
                    if (topic_gate.equals("open") && qName.contains("link")) {
                        link_url = attributes.getValue(0);
                        // take care of apostrophe escape
                        link_url = link_url.replace("'", "''");
                        String insert_links_command = "insert into links(link_url, catid) values('"
                                + link_url + "', " + new_catid + ");";
                        try {
                            // one SQL statement is sent per link element
                            st.executeUpdate(insert_links_command);
                        } catch (SQLException e) {
                            // TODO Auto-generated catch block
                            e.printStackTrace();
                        }
                    }
                }

                public void characters(char ch[], int start, int length)
                        throws SAXException {
                    if (category_id_gate.equals("open")) {
                        new_catid = new String(ch, start, length);
                        // add new row to table "Topics" (MySQL)
                        String insert_topics_command = "insert into topics(topic_id, catid) values('"
                                + new_topic_id + "', " + new_catid + ");";
                        try {
                            st.executeUpdate(insert_topics_command);
                        } catch (SQLException e) {
                            // TODO Auto-generated catch block
                            e.printStackTrace();
                        }
                    }
                }

                public void endElement(String uri, String localName,
                        String qName) throws SAXException {
                    if (qName.equals("Topic"))
                        topic_gate = "close";
                    if (qName.equals("catid"))
                        category_id_gate = "close";
                }
            };

            // BufferedInputStream!!
            String filepath = null;
            BufferedInputStream buffered_input = null;
            /*
             * // Content
             * filepath = "C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8";
             * buffered_input = new BufferedInputStream(new FileInputStream(filepath));
             * saxParser.parse(buffered_input, default_handler);
             *
             * // Adult
             * filepath = "C:/Users/Kim/Desktop/2016여름/ad-content.rdf.u8";
             * buffered_input = new BufferedInputStream(new FileInputStream(filepath));
             * saxParser.parse(buffered_input, default_handler);
             */
            // Kids-and-Teens
            filepath = "C:/Users/Kim/Desktop/2016여름/kt-content.rdf.u8";
            buffered_input = new BufferedInputStream(new FileInputStream(
                    filepath));
            saxParser.parse(buffered_input, default_handler);
            System.out.println("Finished.");
        } catch (SQLException sqex) {
            System.out.println("SQLException: " + sqex.getMessage());
            System.out.println("SQLState: " + sqex.getSQLState());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This is the entire code of my program.
My original code from yesterday did the file I/O as follows (instead of using "BufferedInputStream"):
saxParser.parse("file:///C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8",
default_handler);
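After switching to "BufferedInputStream", I expected the program to get faster, but the speed did not improve at all. I am having trouble finding the bottleneck that causes the speed problem. Thank you very much.
The rdf file read in the code is about 14 MB in size, and my computer takes about 11 minutes to run the code.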
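One way to narrow down a bottleneck like this is to time the parse on its own, with a handler that does no database work. The following is only a minimal sketch: the class and method names (ParseOnlyTimer, CountingHandler, countElements) are made up for illustration, and the tiny XML string is a stand-in for the real file, which would be opened with a FileInputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the program above.
class ParseOnlyTimer {

    /** Handler that only counts elements and never touches the database. */
    static class CountingHandler extends DefaultHandler {
        long elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++;
        }
    }

    /** Parses the stream with the counting handler and returns the element count. */
    static long countElements(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document (assumption); swap in a FileInputStream on the real file.
        String xml = "<RDF><Topic id=\"Top/Kids\"><catid>1</catid>"
                + "<link resource=\"http://example.com\"/></Topic></RDF>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        long start = System.nanoTime();
        long count = countElements(in);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(count + " elements parsed in " + elapsedMs + " ms");
    }
}
```

If this parse-only run finishes the 14 MB file quickly, the time is probably going into the work done inside the handler rather than into file I/O or the parser itself.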
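With a SAX parser, you should be able to achieve a parsing speed of 1 GB per minute without difficulty. If it takes 10 minutes to parse 14 MB, then either you are doing something wrong, or the time is being spent on something other than SAX parsing (for example, the database updates).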
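If the database updates do turn out to dominate, one common mitigation is to stop sending one auto-committed statement per element and instead queue rows on a PreparedStatement and commit them in batches. The sketch below is an illustration under assumptions, not a drop-in fix: BatchedLinkWriter and its batchSize parameter are hypothetical, catid is assumed to be numeric, and only the links table from the question is covered. It needs an open JDBC Connection to do anything useful.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: buffers rows for the "links" table and writes them in batches.
class BatchedLinkWriter implements AutoCloseable {
    private final Connection con;
    private final PreparedStatement insert;
    private final int batchSize;
    private int pending = 0;

    BatchedLinkWriter(Connection con, int batchSize) throws SQLException {
        this.con = con;
        this.batchSize = batchSize;
        // Commit once per batch instead of once per row.
        con.setAutoCommit(false);
        // Placeholders also make manual apostrophe escaping unnecessary.
        this.insert = con.prepareStatement(
                "insert into links(link_url, catid) values(?, ?)");
    }

    /** Queues one row and flushes when the batch is full. */
    void add(String linkUrl, long catid) throws SQLException {
        insert.setString(1, linkUrl);
        insert.setLong(2, catid); // assumes catid is a numeric column
        insert.addBatch();
        if (++pending >= batchSize) {
            flush();
        }
    }

    /** Sends any queued rows to the database and commits them. */
    void flush() throws SQLException {
        if (pending > 0) {
            insert.executeBatch();
            con.commit();
            pending = 0;
        }
    }

    /** Flushes the remaining rows and releases the statement. */
    @Override
    public void close() throws SQLException {
        flush();
        insert.close();
    }
}
```

In the handler from the question, startElement would call add(link_url, Long.parseLong(new_catid)) in place of st.executeUpdate(...), and close() would be called once after saxParser.parse(...) returns. How much this helps depends on the driver and server configuration, so it is worth measuring before and after.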