Conditionally append a string to rows of a Pandas DataFrame based on whether a value exists in another column

Question

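I'm working with tweets in a Pandas DataFrame (Python). I'm trying to mark a particular tweet as a "quoted tweet" by:

1) Checking whether the "quoted_author" field is blank

2) If the field is not blank, prefixing the tweet text with the quoted author's username, like this:

"QT @[quoted_author]: [tweet text]"

Here is the code that isn't working for me. What am I doing wrong? Thanks!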

for row in df['quoted_author']:
        if row == "":
            pass
        else:
            df['Text'].append('QT ' + df['quoted_author'].astype(str) + ': ' + df['Text'].astype(str))

Answer 1

Instead of iterating over each row and checking whether the value is empty, try selecting all the non-empty rows at once:

df_author = df[df['quoted_author'] != ""]

Then use the apply function to prepend the corresponding author name to every row of df_author.
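Answer 1 stops short of code. Below is a minimal sketch of its filter-then-apply idea, assuming the columns are named `quoted_author` and `Text` as in the question and that a missing author is stored as an empty string. The sample data is made up, and the final `.loc` write-back is an addition so the prefixed text ends up in the original `df` rather than only in the filtered copy. It also uses the `QT @author: text` format from the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame;
# an empty string means "not a quoted tweet".
df = pd.DataFrame({
    "quoted_author": ["person1", "", "author"],
    "Text": ["tweettext", "fooootext", "atweet"],
})

# Select only the rows whose quoted_author is non-empty.
mask = df["quoted_author"] != ""
df_author = df[mask]

# Build the prefixed text row by row; axis=1 passes each row to the lambda.
prefixed = df_author.apply(
    lambda row: "QT @" + row["quoted_author"] + ": " + row["Text"], axis=1
)

# Write the result back into the original frame (aligned on the index);
# rows outside the mask keep their original text.
df.loc[mask, "Text"] = prefixed
print(df)
```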

Answer 2

I went through and evaluated two different ways of doing this. The first uses apply with a separate function. See below:

df
      quoted_author        tweet_text
    0       person1         tweettext
    1       person2  somethingtweeted
    2           NaN         fooootext
    3           NaN        sometweets
    4        author            atweet
    5   some_author    someothertweet

Method 1 - a separate function plus apply:

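import numpy as np
import pandas as pd

def nullCheck(author, tweet):
    # Prefix the tweet only when an author is present; otherwise return NaN
    if not pd.isnull(author):
        return 'QT ' + str(author) + ': ' + str(tweet)
    else:
        return np.nan


# axis=1 passes each row; *x unpacks it into (author, tweet)
df['output'] = df[['quoted_author', 'tweet_text']].apply(lambda x: nullCheck(*x), axis=1)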

%timeit df['output'] = df[['quoted_author', 'tweet_text']].apply(lambda x: nullCheck(*x), axis=1)
1000 loops, best of 3: 1.01 ms per loop

Method 2 - slice the DataFrame down to the rows with a non-null author, then write the result into a separate column:

df.loc[~pd.isnull(df['quoted_author']),'output'] = 'QT ' + df['quoted_author'] + ': ' + df['tweet_text']

%timeit df.loc[~pd.isnull(df['quoted_author']),'output'] = 'QT ' + df['quoted_author'] + ': ' + df['tweet_text']
1000 loops, best of 3: 1.68 ms per loop

Interestingly, the first method is faster, though I'm not sure why. Can anyone share some insight? Either way, this will give you what you need.
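The timings above come from a six-row frame. One plausible factor is that at that size, fixed per-call overhead (building the boolean mask and aligning the right-hand Series inside `.loc`) outweighs the per-row cost of `apply`, so the ranking may change as the frame grows. The sketch below is an assumption, not a reproduction of the answerer's setup. It wraps the two methods in hypothetical functions `method1` and `method2`, checks that they produce the same `output` column, and times them on a larger frame built by repeating the sample rows an arbitrary 2,000 times. The numbers it prints depend on your machine and pandas version, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def nullCheck(author, tweet):
    # Method 1's helper: prefix only when an author is present.
    if not pd.isnull(author):
        return 'QT ' + str(author) + ': ' + str(tweet)
    return np.nan


def method1(df):
    # Method 1: row-wise apply with the helper above.
    df['output'] = df[['quoted_author', 'tweet_text']].apply(lambda x: nullCheck(*x), axis=1)


def method2(df):
    # Method 2: vectorised assignment restricted to rows with a non-null author.
    df.loc[~pd.isnull(df['quoted_author']), 'output'] = 'QT ' + df['quoted_author'] + ': ' + df['tweet_text']


def same(s1, s2):
    # Hypothetical helper: same missing-value positions and equal strings elsewhere.
    na1, na2 = s1.isna(), s2.isna()
    return bool((na1 == na2).all()) and s1[~na1].tolist() == s2[~na2].tolist()


# The six sample rows from above, repeated to make a larger frame.
base = pd.DataFrame({
    'quoted_author': ['person1', 'person2', np.nan, np.nan, 'author', 'some_author'],
    'tweet_text': ['tweettext', 'somethingtweeted', 'fooootext',
                   'sometweets', 'atweet', 'someothertweet'],
})
big = pd.concat([base] * 2_000, ignore_index=True)

# Separate copies so the two methods never see each other's output column.
a, b = big.copy(), big.copy()
runs = 3
t1 = timeit.timeit(lambda: method1(a), number=runs)
t2 = timeit.timeit(lambda: method2(b), number=runs)

agree = same(a['output'], b['output'])
print(f"outputs agree: {agree}")
print(f"method 1: {t1 / runs * 1000:.1f} ms per run on {len(big)} rows")
print(f"method 2: {t2 / runs * 1000:.1f} ms per run on {len(big)} rows")
```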

Answer 3

Another one-line solution

Setup (using Andrew L's example)

import pandas as pd

df = pd.DataFrame({'quoted_author': {0: 'person1',
  1: 'person2',  2: '',  3: '',  4: 'author',  5: 'some_author'}, 'text': {0: 'tweettext',
  1: 'somethingtweeted',  2: 'fooootext',  3: 'sometweets',  4: 'atweet',  5: 'someothertweet'}})

Solution

# Use apply to rewrite the text column based on the value of quoted_author.
df.text = df.apply(lambda x: 'QT {}: {}'.format(x.quoted_author, x.text) if x.quoted_author else x.text, axis=1)

  quoted_author                            text
0       person1           QT person1: tweettext
1       person2    QT person2: somethingtweeted
2                                     fooootext
3                                    sometweets
4        author               QT author: atweet
5   some_author  QT some_author: someothertweet
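None of the three answers keeps the `@` that the question asks for, and each one treats only one kind of blank as "no author": Answers 1 and 3 test for `""` (a `NaN` author is truthy, so Answer 3 would produce `QT nan: ...`), while Answer 2 tests for `NaN` (so `""` would produce `QT : ...`). The sketch below treats both `NaN` and empty or whitespace-only strings as blank and produces the question's `QT @author: text` format. It assumes the text column is named `Text` as in the question, and the sample data is made up. It uses `Series.where` instead of `apply`, so no Python function runs per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the question's shape; blank authors appear both
# as "" and as NaN, since loading real data may produce either.
df = pd.DataFrame({
    "quoted_author": ["person1", "", np.nan, "author"],
    "Text": ["tweettext", "fooootext", "sometweets", "atweet"],
})

# True where an author is present: not NaN and not an empty/whitespace string.
has_author = df["quoted_author"].fillna("").str.strip() != ""

# Build the "QT @author: text" string for every row, then keep it only where
# has_author is True and fall back to the original text elsewhere.
quoted = "QT @" + df["quoted_author"].fillna("") + ": " + df["Text"]
df["Text"] = quoted.where(has_author, df["Text"])
print(df)
```

`np.where(has_author, quoted, df["Text"])` would work equally well here; `Series.where` simply keeps the result as a Series aligned on the DataFrame's index.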

Question description

3 answers

Answer 1
0 2017-05-07 10:02:44

Answer 2
0 2017-05-07 10:54:01

Answer 3
0 2017-05-07 13:05:53
