Applying a function to a Spark RDD

Question

I'm trying to do some analysis on tweets. I want to apply `.lower()` to the text of every tweet. I used the following code:

    actual_tweets = actual_tweets.map(lambda line: line["text"].lower() and line["quoted_status"]["text"].lower() if 'quoted_status' in line else line["text"].lower()).collect()

The problem is that, because I'm using `map`, this line of code converts the `text` attribute to lowercase and returns only that text, discarding all the other attributes of the tweet, which is not what I want. Is there a Spark transformation that can help me achieve what I want?

Answer 1 (accepted)

You can, for example, return a tuple of `(input, transformed_input)`:

    def transform(line):
        if 'quoted_status' in line:
            return (
                # Is `and` what you really want here?
                line, line["text"].lower() and line["quoted_status"]["text"].lower()
            )
        else:
            return line, line["text"].lower()

    actual_tweets.map(transform)
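Another option is to have the mapped function return the whole tweet with only its text fields replaced, so every other attribute is kept. The sketch below assumes each record is a dict parsed from the tweet JSON, with a `"text"` key and an optional `"quoted_status"` dict that has its own `"text"`; the function name `lowercase_texts` and the sample data are hypothetical:

```python
def lowercase_texts(tweet):
    """Return a copy of `tweet` with its text fields lowercased.

    Assumes a dict with a "text" key and an optional "quoted_status" dict
    that has its own "text" key. All other fields are kept unchanged.
    """
    out = dict(tweet)  # shallow copy: every other field is carried over
    out["text"] = tweet["text"].lower()
    if "quoted_status" in tweet:
        # Copy the nested dict as well, so the input record is not mutated.
        quoted = tweet["quoted_status"]
        out["quoted_status"] = dict(quoted, text=quoted["text"].lower())
    return out


# Hypothetical sample records, processed with Python's built-in map.
sample = [
    {"text": "Hello World", "user": "a"},
    {"text": "RT This", "quoted_status": {"text": "Quoted TEXT"}, "user": "b"},
]
lowered = list(map(lowercase_texts, sample))
```

On the RDD, the same function would be passed to `map`, e.g. `actual_tweets = actual_tweets.map(lowercase_texts)`, which yields an RDD of full tweet dicts rather than bare strings.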