Parsing a text file (large dataset) in Java
I have a text file (a movie reviews database) in which each line looks like this:
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
I want to parse this file in order to retrieve:
This information will later be encapsulated using the MovieReview and Movie classes.
public class MovieReview {
    private Movie movie;
    private String userId;
    private String profileName;
    private String helpfulness;
    private Date timestamp;
    private String summary;
    private String review;
    ...
Can anyone offer a proper and efficient way to parse this file (large dataset)?
Thanks.
If it's a large dataset, you'll want to avoid loading the entire list into memory at once. I'd probably solve this with a handler that is called once for each row:
public interface MovieReviewHandler {
    void handle(MovieReview review);
}
Then you could parse as follows:
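import java.io.BufferedReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieReviewParser {
    // readLine() can throw IOException, so the method has to declare it.
    public void parse(BufferedReader reader, MovieReviewHandler handler) throws IOException {
        Pattern regex = Pattern.compile("product/productId:(.*)review/userId:(.*)review/profileName:(.*)"); // add other fields
        String line;
        // Read one line at a time so the whole file is never held in memory.
        while ((line = reader.readLine()) != null) {
            Matcher matcher = regex.matcher(line);
            if (!matcher.matches()) throw new RuntimeException("Unexpected line format: " + line);
            MovieReview review = new MovieReview();
            // The MovieReview in the question keeps its fields private and has no
            // productId field, so adapt these assignments (e.g. setters) to the real class.
            review.productId = matcher.group(1).trim();
            review.userId = matcher.group(2).trim();
            review.profileName = matcher.group(3).trim();
            // etc
            handler.handle(review);
        }
    }
}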
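To make the approach above concrete, here is a minimal self-contained sketch that extends the regex to all eight fields of the sample line and streams the reviews through a handler. The names ReviewLineDemo, ParsedReview and ParsedReviewHandler are hypothetical stand-ins, since the full MovieReview and Movie classes are not shown in the question. It also assumes that review/time is a Unix timestamp in seconds and that the field labels never appear inside the field values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReviewLineDemo {

    // Hypothetical flat holder for one parsed line (stand-in for MovieReview/Movie).
    static final class ParsedReview {
        String productId, userId, profileName, helpfulness, summary, text;
        double score;
        Date timestamp;
    }

    // Hypothetical callback, same idea as MovieReviewHandler in the answer.
    interface ParsedReviewHandler {
        void handle(ParsedReview review);
    }

    // Compiled once. The field labels act as delimiters; reluctant groups stop at the next label.
    private static final Pattern LINE = Pattern.compile(
            "product/productId:\\s*(.*?)\\s*review/userId:\\s*(.*?)\\s*review/profileName:\\s*(.*?)"
            + "\\s*review/helpfulness:\\s*(.*?)\\s*review/score:\\s*(.*?)\\s*review/time:\\s*(.*?)"
            + "\\s*review/summary:\\s*(.*?)\\s*review/text:\\s*(.*)");

    static ParsedReview parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        ParsedReview r = new ParsedReview();
        r.productId = m.group(1);
        r.userId = m.group(2);
        r.profileName = m.group(3);
        r.helpfulness = m.group(4);
        r.score = Double.parseDouble(m.group(5));
        // Assumption: review/time is in seconds since the epoch; Date expects milliseconds.
        r.timestamp = new Date(Long.parseLong(m.group(6)) * 1000L);
        r.summary = m.group(7);
        r.text = m.group(8).trim();
        return r;
    }

    // Reads line by line and hands each review to the handler; nothing is accumulated here.
    static void parse(BufferedReader reader, ParsedReviewHandler handler) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            handler.handle(parseLine(line));
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "product/productId: B00004CK40 review/userId: A39IIHQF18YGZA "
                + "review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 "
                + "review/time: 1175817600 review/summary: Reliable comedy "
                + "review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. "
                + "Cusak is in top form.";
        // The handler keeps only running totals, so memory use does not grow with the file size.
        double[] sum = {0};
        int[] count = {0};
        parse(new BufferedReader(new StringReader(sample + "\n" + sample)), r -> {
            sum[0] += r.score;
            count[0]++;
        });
        System.out.println(count[0] + " reviews, mean score " + sum[0] / count[0]);
    }
}
```

For a real file, the StringReader would be replaced by something like Files.newBufferedReader(path), and the handler could write each review to a database or aggregate it instead of collecting everything in a list.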