I want to parse the GOV2 collection format using TrecGov2Parser. I found its code on this page. The input is a test file that contains just one document from the GOV2 collection. This is my code:
    public static void writeHTMLText()
    {
        try
        {
            FileWriter fw1 = new FileWriter(new File("/home/fl/Desktop/GOV_Text/GOV/00.txt"));
            BufferedWriter bw1 = new BufferedWriter(fw1);
            FileReader fileReader = new FileReader(new File("/home/fl/Desktop/GOV/00"));
            BufferedReader br = new BufferedReader(fileReader);
            String docs = "";
            String line;
            while ((line = br.readLine()) != null)
                docs = docs + line + "\n";
            DocData docData = new DocData();
            DocData result = new TrecGov2Parser().parse(docData, "result00", new TrecContentSource(), new StringBuilder(docs), TrecDocParser.ParsePathType.GOV2);
            bw1.write(result.getBody());
            br.close();
            bw1.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
I get this error:
    java.lang.NullPointerException
        at org.apache.lucene.benchmark.byTask.feeds.TrecGov2Parser.parse(TrecGov2Parser.java:56)
        at LuceneParser.parserInput.writeHTMLText(parserInput.java:63)
I added *lucene-core-3.4.0.jar* and *lucene-benchmark-3.4.0.jar* to my project's build path.
What do I need to do?
There is no private, protected or public keyword on getHtmlParser(). This means you can call this method only from inside the same package (default/package visibility). The method is not intended to be used by code outside that package.
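To illustrate the visibility rule described above, here is a minimal, self-contained sketch that uses reflection to report a method's access level. The class VisibilityDemo, its nested Source class and the helper visibilityOf are hypothetical names made up for this example; Source is only a stand-in and is not the Lucene TrecContentSource class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class VisibilityDemo {

    // Hypothetical stand-in for a class whose accessor has no access modifier.
    static class Source {
        // No private/protected/public keyword: package-private (default) visibility.
        Object getHtmlParser() {
            return null;
        }

        // Explicitly public, for comparison.
        public Object getPublicThing() {
            return null;
        }
    }

    // Returns the access level of a no-argument method declared on the given class.
    static String visibilityOf(Class<?> type, String methodName) {
        try {
            Method m = type.getDeclaredMethod(methodName);
            int mods = m.getModifiers();
            if (Modifier.isPublic(mods)) return "public";
            if (Modifier.isProtected(mods)) return "protected";
            if (Modifier.isPrivate(mods)) return "private";
            // None of the three keywords is present.
            return "package-private";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such no-arg method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibilityOf(Source.class, "getHtmlParser"));   // package-private
        System.out.println(visibilityOf(Source.class, "getPublicThing"));  // public
    }
}
```

A package-private method like this can be called by any class declared in the same package, but a call from a class in a different package will not compile.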