I am trying to write two functions that convert string data in an RDD to float and then compute the average sepal length for the iris dataset. One of the two functions works fine, but the second one raises an error. Can someone help me understand what mistake I am making here?
is_float = lambda x: x.replace('.','',1).isdigit() and "." in x
def getSapellen(str2):
    if isinstance(str2, float):
        return str2
    attlist=str2.split(",")
    if is_float(attlist[0]):
        return float(attlist[0])
    else:
        return 0.0
SepalLenAvg=irisRDD.reduce(lambda x,y: getSapellen(x) + getSapellen(y)) \
    /(irisRDD.count()-1)
print(SepalLenAvg)
The code above works. I cannot figure out the mistake in the part below:
def getSapellen2(str2):
    if ( str2.find("Sepal") != -1):
        return str2
    attlist=str2.split(",")
    if isinstance(attlist[0],str):
        return float(attlist[0])
    else:
        return 0.0
SepalLenAvg=irisRDD.reduce(lambda x,y: getSapellen2(x)+ getSapellen2(y)) \
    /(irisRDD.count()-1)
print(SepalLenAvg)
Running the second version gives the following error:
TypeError: can only concatenate str (not "float") to str
This error means you are trying to add a string and a float. The only place where your code adds anything is the lambda applied to the whole irisRDD. That means that in at least one instance, calling getSapellen2(x) + getSapellen2(y) causes a str to be returned by one call and a float by the other. If you look at the first if statement, there is return str2, which returns a string, while all the other branches return numbers.
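The failure described above can be reproduced without Spark. The following is only a sketch: it uses functools.reduce on a plain Python list as a stand-in for irisRDD.reduce, and the sample rows (including the header text) are made up for illustration, not taken from the actual dataset file.

```python
from functools import reduce

# Hypothetical rows standing in for irisRDD; the header text is an assumption.
rows = [
    "Sepal.Length,Sepal.Width,Species",
    "5.1,3.5,setosa",
    "4.9,3.0,setosa",
]

def getSapellen2(str2):
    if str2.find("Sepal") != -1:
        return str2                  # header row: a str is returned here
    attlist = str2.split(",")
    if isinstance(attlist[0], str):  # always true, split() yields strings
        return float(attlist[0])
    else:
        return 0.0

try:
    # The first step adds getSapellen2(header) (a str) to a float.
    reduce(lambda x, y: getSapellen2(x) + getSapellen2(y), rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The very first addition mixes the header string with the float parsed from the first data row, which is where the TypeError comes from.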
In getSapellen, the early return sits behind isinstance(str2, float), so it can only ever hand back a float. In getSapellen2, the early return sits behind str2.find("Sepal") != -1, and that condition is true at least once. At that point str2 is definitely not a float, it is a string, so you need to convert it to a float or otherwise return a float value (for example 0.0) instead.
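One possible way to apply this advice is to make every branch return a float. The sketch below is an assumption-laden illustration, not the asker's final code: sepal_length is a hypothetical helper name, the rows are invented sample data, and functools.reduce on a list stands in for irisRDD.reduce.

```python
from functools import reduce

# Hypothetical rows standing in for irisRDD; the values are made up.
rows = [
    "Sepal.Length,Sepal.Width,Species",
    "5.1,3.5,setosa",
    "4.9,3.0,setosa",
    "4.7,3.2,setosa",
]

def sepal_length(value):
    """Return a float on every path so that + never mixes str and float."""
    if isinstance(value, float):   # partial sum from an earlier reduce step
        return value
    first_field = value.split(",")[0]
    try:
        return float(first_field)
    except ValueError:             # header row or other non-numeric text
        return 0.0

total = reduce(lambda x, y: sepal_length(x) + sepal_length(y), rows)
average = total / (len(rows) - 1)  # minus one for the header row
print(average)
```

With an actual RDD, the same helper could probably be used as irisRDD.map(sepal_length).reduce(lambda a, b: a + b), which keeps the parsing separate from the summation so the reduce function only ever sees floats.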