Read data saved by spark-redis using Java
I use spark-redis to save a Dataset to Redis. Then I read this data with Spring Data Redis:
This is the object I save to Redis:
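import java.io.Serializable;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.apache.spark.sql.Row;
import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.index.Indexed;

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@Builder
@RedisHash("collaborative_filtering")
public class RatingResult implements Serializable {
    private static final long serialVersionUID = 8755574422193819444L;

    @Id
    private String id;
    @Indexed
    private int user;
    @Indexed
    private String product;
    private double productN;
    private double rating;
    private float prediction;

    // Builds a RatingResult from a Spark Row; the id is the user concatenated with the product.
    public static RatingResult convert(Row row) {
        int user = row.getAs("user");
        String product = row.getAs("product");
        double productN = row.getAs("productN");
        double rating = row.getAs("rating");
        float prediction = row.getAs("prediction");
        String id = user + product;
        return RatingResult.builder().id(id).user(user).product(product).productN(productN).rating(rating)
                .prediction(prediction).build();
    }
}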
Saving the object with spark-redis:
JavaRDD<RatingResult> result = ...
...
sparkSession.createDataFrame(result, RatingResult.class).write().format("org.apache.spark.sql.redis")
        .option("table", "collaborative_filtering").mode(SaveMode.Overwrite).save();
Repository:
@Repository
public interface RatingResultRepository extends JpaRepository<RatingResult, String> {
}
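I cannot read the data above through Spring Data Redis, even though it has been saved in Redis (I checked with the commands redis-cli -p 6379 keys \* and redis-cli hgetall $key).
So how can I read the saved data using Java or any Java library?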
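The following works for me.
Writing the data from spark-redis.
I use Scala here, but it is essentially the same as what you do in Java. The only thing I changed is that I added .option("key.column", "id") to specify the hash id.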
val ratingResult = new RatingResult("1", 1, "product1", 2.0, 3.0, 4)
val result: JavaRDD[RatingResult] = spark.sparkContext.parallelize(Seq(ratingResult)).toJavaRDD()
spark
  .createDataFrame(result, classOf[RatingResult])
  .write
  .format("org.apache.spark.sql.redis")
  .option("key.column", "id")
  .option("table", "collaborative_filtering")
  .mode(SaveMode.Overwrite)
  .save()
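To illustrate why the key column matters: as far as I understand the spark-redis documentation, each row is stored as a Redis hash under a key of the form table:key, and the key part is a generated random identifier unless key.column is set. Spring Data Redis, in turn, looks up keyspace:id in findById. The sketch below only mimics that naming scheme in plain Java; the class and method names (RedisKeySketch, sparkRedisKey, springDataKey) are hypothetical and are not part of either library.

```java
import java.util.Optional;
import java.util.UUID;

final class RedisKeySketch {
    // Assumed spark-redis naming: "<table>:<value of key.column>", or
    // "<table>:<random id>" when key.column is not set.
    static String sparkRedisKey(String table, Optional<String> keyColumnValue) {
        String keyPart = keyColumnValue.orElseGet(
                () -> UUID.randomUUID().toString().replace("-", ""));
        return table + ":" + keyPart;
    }

    // Key that Spring Data Redis looks up for findById: "<keyspace>:<id>".
    static String springDataKey(String keyspace, String id) {
        return keyspace + ":" + id;
    }

    public static void main(String[] args) {
        String table = "collaborative_filtering";
        String lookedUp = springDataKey(table, "1");
        String withKeyColumn = sparkRedisKey(table, Optional.of("1"));
        String withoutKeyColumn = sparkRedisKey(table, Optional.empty());
        // With key.column the two keys coincide, so findById can find the hash.
        System.out.println(lookedUp.equals(withKeyColumn));
        // Without it the key part is random, so a lookup by id misses.
        System.out.println(lookedUp.equals(withoutKeyColumn));
    }
}
```

This is probably why the original write, which had no key.column option, produced hashes that findById could not locate.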
In spring-data-redis I have the following:
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@Builder
@RedisHash("collaborative_filtering")
public class RatingResult implements Serializable {
    private static final long serialVersionUID = 8755574422193819444L;

    @Id
    private String id;
    @Indexed
    private int user;
    @Indexed
    private String product;
    private double productN;
    private double rating;
    private float prediction;

    @Override
    public String toString() {
        return "RatingResult{" +
                "id='" + id + '\'' +
                ", user=" + user +
                ", product='" + product + '\'' +
                ", productN=" + productN +
                ", rating=" + rating +
                ", prediction=" + prediction +
                '}';
    }
}
I use CrudRepository instead of JPA:
@Repository
public interface RatingResultRepository extends CrudRepository<RatingResult, String> {
}
Query:
RatingResult found = ratingResultRepository.findById("1").get();
System.out.println("found = " + found);
Output:
found = RatingResult{id='null', user=1, product='product1', productN=2.0, rating=3.0, prediction=4.0}
You may notice that the id field is not populated, because spark-redis stores the id as the hash key rather than as a hash attribute.
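One possible workaround for the empty id, sketched under the assumption that the caller already knows the id it queried with, is to copy that id back onto the returned object. The snippet uses a stand-in Rating class and an in-memory Map in place of the real RatingResult entity and repository, so every name in it is hypothetical; with the real repository the same map(...) step would follow ratingResultRepository.findById(id).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class IdBackfillSketch {
    // Stand-in for RatingResult, reduced to two fields.
    static final class Rating {
        String id;      // stays null when the hash has no "id" attribute
        String product;

        Rating(String id, String product) {
            this.id = id;
            this.product = product;
        }
    }

    // Stand-in for the repository: holds entities whose id was not populated.
    static final Map<String, Rating> STORE = new HashMap<>();

    static Optional<Rating> findById(String id) {
        return Optional.ofNullable(STORE.get(id));
    }

    // Looks up by id and copies the requested id onto the result, if any.
    static Optional<Rating> findByIdWithId(String id) {
        return findById(id).map(r -> {
            r.id = id;
            return r;
        });
    }

    public static void main(String[] args) {
        STORE.put("1", new Rating(null, "product1"));
        Rating found = findByIdWithId("1").orElseThrow(IllegalStateException::new);
        System.out.println("id=" + found.id + ", product=" + found.product);
    }
}
```

Setting the id on the caller side keeps the Spark job unchanged; whether that is preferable to writing the id as an additional hash attribute depends on who else reads the data.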