Parsing a text file (large dataset) in Java
I have a text file in which every line looks like this (it is a movie review database):
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
I want to parse this file in order to retrieve:
This information will later be encapsulated in the MovieReview and Movie classes.
public class MovieReview {
    private Movie movie;
    private String userId;
    private String profileName;
    private String helpfulness;
    private Date timestamp;
    private String summary;
    private String review;
...
Can anyone suggest a correct and efficient way to parse this file (a large dataset)?
Thanks.
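If the dataset is large, you want to avoid loading the whole list of reviews into memory at once. I would probably solve this with a handler that is called once per line: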
public interface MovieReviewHandler {
    void handle(MovieReview review);
}
You can then parse the file like this:
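import java.io.BufferedReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieReviewParser {
    // Compiled once and reused for every line; add the other fields in the same way.
    private static final Pattern REGEX = Pattern.compile(
            "product/productId:(.*)review/userId:(.*)review/profileName:(.*)");

    public void parse(BufferedReader reader, MovieReviewHandler handler) throws IOException {
        String line;
        // Read one line at a time so the whole file is never held in memory.
        while ((line = reader.readLine()) != null) {
            Matcher matcher = REGEX.matcher(line);
            if (!matcher.matches()) throw new RuntimeException("Unparseable line: " + line);
            MovieReview review = new MovieReview();
            // MovieReview as posted has private fields and holds a Movie rather than a
            // productId, so these assignments assume matching fields or setters exist.
            // trim() drops the spaces that surround each captured value.
            review.productId = matcher.group(1).trim();
            review.userId = matcher.group(2).trim();
            review.profileName = matcher.group(3).trim();
            // etc
            handler.handle(review);
        }
    }
}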
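To make the idea concrete, here is a minimal self-contained sketch of the same handler approach that captures all eight fields from the sample line. The names ReviewLineParser, Review and ReviewHandler are made up for this illustration and are not the asker's MovieReview and Movie classes. It also assumes that the fields always appear in the order shown in the sample, separated by single spaces, and that review/time is a Unix timestamp in seconds.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReviewLineParser {

    /** Hypothetical flat holder for one parsed line (stand-in for MovieReview). */
    static final class Review {
        final String productId, userId, profileName, helpfulness, summary, text;
        final double score;
        final Instant time;

        Review(String productId, String userId, String profileName, String helpfulness,
               double score, Instant time, String summary, String text) {
            this.productId = productId;
            this.userId = userId;
            this.profileName = profileName;
            this.helpfulness = helpfulness;
            this.score = score;
            this.time = time;
            this.summary = summary;
            this.text = text;
        }
    }

    /** Callback invoked once per parsed line, so no list of reviews is built up. */
    interface ReviewHandler {
        void handle(Review review);
    }

    // Reluctant groups stop at the next field label; the last group takes the rest.
    private static final Pattern LINE = Pattern.compile(
            "product/productId: (.*?) review/userId: (.*?) review/profileName: (.*?)"
            + " review/helpfulness: (.*?) review/score: (.*?) review/time: (.*?)"
            + " review/summary: (.*?) review/text: (.*)");

    static Review parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new Review(m.group(1), m.group(2), m.group(3), m.group(4),
                Double.parseDouble(m.group(5)),
                // Assumption: review/time holds seconds since the Unix epoch.
                Instant.ofEpochSecond(Long.parseLong(m.group(6))),
                m.group(7), m.group(8));
    }

    /** Streams the input line by line and returns how many reviews were handled. */
    static long parse(BufferedReader reader, ReviewHandler handler) throws IOException {
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            handler.handle(parseLine(line));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String sample = "product/productId: B00004CK40 review/userId: A39IIHQF18YGZA"
                + " review/profileName: C. A. M. Salas review/helpfulness: 0/0"
                + " review/score: 4.0 review/time: 1175817600"
                + " review/summary: Reliable comedy review/text: Nice script.";
        // For a real file, a reader from java.nio.file.Files.newBufferedReader
        // could be passed in instead of this in-memory StringReader.
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            long n = parse(reader, r -> System.out.println(r.productId + " | " + r.summary));
            System.out.println(n + " review(s) parsed");
        }
    }
}
```

Because the handler sees each review as soon as its line is read, memory use stays roughly constant regardless of file size. If a line can deviate from the assumed layout, it may be preferable to log and skip it rather than throw.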