Removing characters from Python output

Question

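I have been doing a lot of work to remove characters like u, u', u", [, (, ), / and quotes from Spark Python output, because they cause problems for the further processing I need to do. Please focus on this.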

My input looks like this:

(u"(u'[25145,   12345678'", 0.0)
(u"(u'[25146,   25487963'", 43.0)

When I applied my code to sum the results, it gave me output like this:

(u'(u"(u\'[54879,    5125478\'"', 0.0)
(u"(u'[25145,   25145879'", 11.0)
(u'(u"(u\'[56897,    22548793\'"', 0.0)

So I want to remove all characters like (u'(u"(u\'["'')

I want the output to look like this:

54879,5125478,0.0

25145,25145879,11.0
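A possible workaround for files that already contain this debris is to ignore the wrapper characters and keep only the numbers on each line. The sketch below is plain Python 3 without Spark; clean_line and NUMBER are hypothetical names, and it assumes every wanted field is an unsigned integer or a plain decimal, so negative values or exponent notation would need a wider pattern.

```python
import re

# Assumption: every wanted field is an unsigned integer or a plain decimal such as 0.0
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def clean_line(line):
    """Return only the numeric fields of one saved line, joined by commas (hypothetical helper)."""
    return ",".join(NUMBER.findall(line))


# Two lines copied from the output shown above
SAMPLE = r"""(u'(u"(u\'[54879,    5125478\'"', 0.0)
(u"(u'[25145,   25145879'", 11.0)"""

for raw in SAMPLE.splitlines():
    print(clean_line(raw))  # prints 54879,5125478,0.0 then 25145,25145879,11.0
```

On an RDD of such lines, the same function could be applied with map(clean_line) before calling saveAsTextFile.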

The code I tried is:

from pyspark import SparkContext
import os
import sys

sc = SparkContext("local", "aggregate")

file1 = sc.textFile("hdfs://localhost:9000/data/first/part-00000")
file2 = sc.textFile("hdfs://localhost:9000/data/second/part-00000")

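# Merge both inputs into one partition and split every line on commas
file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))

# Key = first two fields, value = third field minus its trailing ')'; then sum the values per key
result = file3.map(lambda x: ((x[0]+', '+x[1],float(x[2][:-1])))).reduceByKey(lambda a,b:a+b).coalesce(1)

# Each (key, value) tuple is written out as its text form, quotes included
result.saveAsTextFile("hdfs://localhost:9000/Test1")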
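The nested quoting most likely comes from running this job on its own earlier output: each save writes the text form of a (key, value) tuple, and the next run splits that text on commas and wraps the pieces in a new tuple. The sketch below replays that loop in plain Python 3 without Spark; simulate_round is a hypothetical helper that copies the parsing from the code above, and str() of a tuple is only a stand-in for what PySpark writes. Under Python 3 the u prefixes do not appear, because they are how Python 2 prints unicode strings.

```python
def simulate_round(line):
    """Replay one run of the job above on a single saved line (hypothetical helper)."""
    # Same parsing as the question: split on commas, glue the first two fields back together
    x = line.split(',')
    key = x[0] + ', ' + x[1]
    # Drop the trailing ')' before converting the third field
    value = float(x[2][:-1])
    # Stand-in for saveAsTextFile: the tuple is written as its text form, quotes included
    return str((key, value))


# Assumption: a cleanly saved first-round line in Python 3 form
line = "('[25145,   12345678', 0.0)"
for run in range(1, 3):
    line = simulate_round(line)
    print(run, line)
```

After the second round the line has the same ('("(\'[... shape as the output in the question, with each round adding one more layer of quotes. Writing plain text before saving, or extracting the numbers instead of splitting the raw text, stops the layers from accumulating.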

Answer 1

I think your only problem is that you need to reformat the results before saving them to a file, for example:

result.map(lambda x:x[0]+','+str(x[1])).saveAsTextFile("hdfs://localhost:9000/Test1")
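The map above writes "key,value" text instead of the tuple's representation. With keys built as "id1, id2" in the question's code, a space is still left after the first comma; the sketch below trims it so each line matches the requested 25145,25145879,11.0 layout. format_pair is a hypothetical helper, and it assumes the key no longer contains quote or bracket debris (for example because the input files were cleaned first).

```python
def format_pair(pair):
    """Turn one reduced (key, total) pair into a plain CSV line (hypothetical helper)."""
    key, total = pair
    # The question builds keys as "id1, id2"; split them and trim the whitespace
    ids = [part.strip() for part in key.split(",")]
    return ",".join(ids + [str(total)])


print(format_pair(("25145, 25145879", 11.0)))  # prints 25145,25145879,11.0
```

In the job itself it would sit where the lambda is: result.map(format_pair).saveAsTextFile("hdfs://localhost:9000/Test1").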


Solution 1 (accepted, 2015-11-30 13:56:21)
