
Why does my second PySpark function raise an error while the first works, when both appear to do the same thing?

I am trying to write two functions that convert the string data in an RDD to floats and then compute the average sepal length for the iris dataset. The first function works, but the second raises an error. Can someone help me understand what mistake I am making?

        is_float = lambda x: x.replace('.', '', 1).isdigit() and "." in x
        def getSapellen(str2):
            if isinstance(str2, float):
                return str2
            attlist = str2.split(",")
            if is_float(attlist[0]):
                return float(attlist[0])
            else:
                return 0.0
        SepalLenAvg = irisRDD.reduce(lambda x, y: getSapellen(x) + getSapellen(y)) \
            / (irisRDD.count() - 1)
        print(SepalLenAvg)

The code above works. I cannot figure out the mistake in the version below:

        def getSapellen2(str2):
            if ( str2.find("Sepal") != -1):
                return str2
            attlist=str2.split(",")
            if isinstance(attlist[0],str):
                return float(attlist[0])
            else:
                return 0.0
        SepalLenAvg=irisRDD.reduce(lambda x,y: getSapellen2(x)+ getSapellen2(y)) \
        /(irisRDD.count()-1)
        print(SepalLenAvg)

Running the second version gives the following error:

TypeError: can only concatenate str (not "float") to str

This error means you are trying to add a string and a float. The only place in the code where anything is added is the lambda passed to irisRDD.reduce.

So in at least one instance, evaluating getSapellen2(x) + getSapellen2(y) has one call return a str and the other return a float.

If you look at the first if statement in getSapellen2, it ends in return str2, which returns a string, while every other branch returns a number.
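The mismatch described above can be reproduced without a Spark cluster. The sketch below is only an illustration: it uses functools.reduce on a plain Python list in place of irisRDD.reduce, and the sample rows (including the header text) are assumptions, not the asker's actual data.

```python
from functools import reduce

# Hypothetical sample rows standing in for irisRDD; the header text is assumed.
rows = [
    "SepalLength,SepalWidth,PetalLength,PetalWidth,Species",
    "5.1,3.5,1.4,0.2,setosa",
    "4.9,3.0,1.4,0.2,setosa",
]

def getSapellen2(str2):
    # Copied from the question: the header branch returns the string itself.
    if str2.find("Sepal") != -1:
        return str2
    attlist = str2.split(",")
    if isinstance(attlist[0], str):
        return float(attlist[0])
    else:
        return 0.0

try:
    # First step: x is the header row (str), y is a data row (float after conversion).
    reduce(lambda x, y: getSapellen2(x) + getSapellen2(y), rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The very first pairing adds the header string to a float, which is where the TypeError comes from.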

In other words, every branch of getSapellen returns a number: its isinstance(str2, float) check only passes along a sum that reduce has already accumulated. In getSapellen2, the condition str2.find("Sepal") != -1 is true at least once, and that branch returns str2 unchanged, which is a string, not a float. You need that branch to return a float value instead, or to handle the header row some other way.
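One possible way to avoid the problem is to convert each line to a number first and only then aggregate, so the aggregation never sees a string. This also sidesteps a second issue in getSapellen2: reduce passes the running sum (a float) back in as x, and a float has no find method. The sketch below is a hedged illustration on a plain list; the helper name sepal_length and the sample rows are hypothetical, and the RDD chain in the final comment is an untested suggestion built from the standard map, filter and mean methods.

```python
def sepal_length(line):
    """Return the first comma-separated field as a float, or None if it is not numeric."""
    first = line.split(",")[0]
    try:
        return float(first)
    except ValueError:
        # Header row (or any malformed row): signal "no value" instead of returning a str.
        return None

# Hypothetical sample rows standing in for irisRDD; the header text is assumed.
rows = [
    "SepalLength,SepalWidth,PetalLength,PetalWidth,Species",
    "5.1,3.5,1.4,0.2,setosa",
    "4.9,3.0,1.4,0.2,setosa",
]

# Convert first, drop non-numeric rows, then average only real values.
lengths = [v for v in map(sepal_length, rows) if v is not None]
average = sum(lengths) / len(lengths)
print(average)

# With a live RDD the same idea would look roughly like:
# irisRDD.map(sepal_length).filter(lambda v: v is not None).mean()
```

Because the header is filtered out rather than counted as 0.0, the divisor is the number of numeric rows, so there is no need to subtract 1 from the count.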

