
Parsing all zero sparse vectors with pyspark SparseVectors

In pyspark, if I generate a sparse vector that represents an all-zero vector and then stringify it, it works as expected:

>>> from pyspark.mllib.linalg import Vectors, SparseVector
>>> res = Vectors.stringify(SparseVector(4, [], []))
>>> res
'(4,[],[])'

But then the parse method fails to load it back:

>>> SparseVector.parse(res)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark-1.5.2-bin-hadoop2.4/python/pyspark/mllib/linalg/__init__.py", line 545, in parse
    raise ValueError("Unable to parse indices from %s." % new_s)
ValueError: Unable to parse indices from .

Does anyone know of a way to solve this?

This is a bug described in SPARK-14739. The simplest workaround for now is to use the ast module instead:

import ast
from pyspark.mllib.linalg import SparseVector

def parse_sparse(s):
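    # ast.literal_eval turns the "(size, [indices], [values])" string into a
    # Python tuple, which is then unpacked into the SparseVector constructor.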
    return SparseVector(*ast.literal_eval(s.strip()))

parse_sparse("(1, [], [])")
## SparseVector(1, {})

parse_sparse("(5, [1, 3], [0.4, -0.1])")
## SparseVector(5, {1: 0.4, 3: -0.1})
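As a quick sanity check (assuming the same pyspark.mllib API used above, as in Spark 1.5/1.6), the workaround also round-trips the all-zero vector from the question:

from pyspark.mllib.linalg import Vectors, SparseVector

# Stringify an empty sparse vector and parse it back with the helper above.
s = Vectors.stringify(SparseVector(4, [], []))
parse_sparse(s)
## SparseVector(4, {})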
