Apply function to spark RDD

I'm trying to do some analysis on tweets. I want to apply .lower() to the text of every tweet. I used the following code:

    actual_tweets = actual_tweets.map(lambda line: line["text"].lower() and line["quoted_status"]["text"].lower() if 'quoted_status' in line else line["text"].lower()).collect()

The problem is that since I'm using map, this line of code converts the text attribute to lowercase and returns only the text attribute, ignoring all the others, which is not what I want. I just want to know if any of Spark's transformations can help me achieve what I want.

You can, for example, return a tuple of (input, transformed_input):

def transform(line):
    if 'quoted_status' in line:
        return (
            # Is `and` what you really want here?
            line, line["text"].lower() and line["quoted_status"]["text"].lower() 
        )
    else:
        return line, line["text"].lower()

actual_tweets.map(transform)
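If you want to keep the whole tweet rather than pair it with the lowered text, another option is to return a copy of the record with its text fields lowered in place. This is a sketch assuming each element of the RDD is a plain dict, as in the question; the function name lower_texts is made up for illustration:

```python
def lower_texts(line):
    # Copy so the original input record is not mutated.
    line = dict(line)
    line["text"] = line["text"].lower()
    if "quoted_status" in line:
        quoted = dict(line["quoted_status"])
        quoted["text"] = quoted["text"].lower()
        line["quoted_status"] = quoted
    return line

# actual_tweets.map(lower_texts)  # every attribute kept, text fields lowered
```

Because the function returns the full record, all of the other attributes survive the map, which seems to be what you are after.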
