How to iterate over the rows of a specific Pandas DataFrame column, given a condition from another column?

Question

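So what I basically want to do is the following. Given a DataFrame with the columns 'date' and 'polarity', where 'date' has seven distinct values (days) and 'polarity' holds values between -1 and 1: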

For each of the seven days:
i) count all values in the 'polarity' column that are positive
ii) count all values in the 'polarity' column that are negative
iii) count all values in the 'polarity' column for a given day (neg, neutral, pos)

Edit: for each day, the output of i)-iii) should be an integer, stored in a list.

Edit2: I tried to implement it with the following code (only for values > 0):

pos_tweets = df_tweets.apply(lambda x: True if x['polarity'] > 0 and x['date'] == '2020-02-07' else False, axis=1)
num_Pos = len(pos_tweets[pos_tweets == True].index)

However, this returned 0, which is wrong when I check it in Excel.
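One plausible cause, assuming 'date' is a datetime column (the question does not say): inside the row-wise apply, each x['date'] is a Timestamp, which does not compare equal to the plain string '2020-02-07', and any time-of-day component would block a match anyway. A minimal vectorized sketch of the same count, using hypothetical sample data, that compares calendar days instead:

```python
import pandas as pd

# Hypothetical sample data: 'date' holds full timestamps, not just days.
df_tweets = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-07 09:15:00', '2020-02-07 18:43:21',
                            '2020-02-07 20:01:05', '2020-02-08 10:00:00']),
    'polarity': [0.5, -0.2, 0.0, 0.7],
})

# normalize() sets every timestamp to midnight, so the comparison is per calendar day.
on_day = df_tweets['date'].dt.normalize() == pd.Timestamp('2020-02-07')
is_pos = df_tweets['polarity'] > 0

# Summing a boolean mask counts its True values.
num_pos = int((on_day & is_pos).sum())
print(num_pos)  # 1
```

The same mask pattern with `< 0` or with `on_day` alone gives the negative count and the total for that day.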

Thanks in advance for your help!

Cheers, IG

Answer 1

Consider a pivot_table with margins. Below is a demonstration with randomly seeded data:

Data

import numpy as np
import pandas as pd

np.random.seed(2112020)
random_df = pd.DataFrame({'date': np.random.choice(pd.date_range('2020-02-01', '2020-02-11'), 500),
                          'polarity': np.random.randint(-1, 2, 500)})

print(random_df.head(10))
#         date  polarity
# 0 2020-02-08        -1
# 1 2020-02-08         1
# 2 2020-02-06         0
# 3 2020-02-10        -1
# 4 2020-02-04        -1
# 5 2020-02-02         1
# 6 2020-02-05        -1
# 7 2020-02-04         0
# 8 2020-02-10         1
# 9 2020-02-09         0

Aggregation

pvt_df = (random_df.assign(day_date = lambda x: x['date'].dt.normalize(),   # drop the time of day
                           polarity_indicator = lambda x: np.select([x['polarity'] > 0, x['polarity'] < 0],
                                                                    ['positive', 'negative'],
                                                                    default = 'neutral'))   # polarity == 0
                   .pivot_table(index = 'day_date',
                                columns = 'polarity_indicator',
                                values = 'polarity',
                                aggfunc = 'count',
                                margins = True)
         )

print(pvt_df)

#  polarity_indicator   negative  neutral  positive  All
#  day_date
#  2020-02-01 00:00:00        17       14        16   47
#  2020-02-02 00:00:00        19       14        12   45
#  2020-02-03 00:00:00        11       16        12   39
#  2020-02-04 00:00:00        17       18        13   48
#  2020-02-05 00:00:00        11       15        22   48
#  2020-02-06 00:00:00        12       12        16   40
#  2020-02-07 00:00:00        16       15        21   52
#  2020-02-08 00:00:00        15       10        13   38
#  2020-02-09 00:00:00        17       15        19   51
#  2020-02-10 00:00:00        13       16        19   48
#  2020-02-11 00:00:00        13       12        19   44
#  All                       161      157       182  500
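The question's edit asks for integers stored in lists. A minimal sketch of pulling those lists out of such a pivot table, assuming a small hypothetical DataFrame in place of random_df (the names per_day, positive_counts, negative_counts and total_counts are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for random_df / df_tweets.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-01 08:00', '2020-02-01 12:30', '2020-02-01 19:45',
                            '2020-02-02 09:10', '2020-02-02 22:05']),
    'polarity': [0.4, -0.3, 0.0, 0.9, 0.1],
})

pvt_df = (df.assign(day_date=lambda x: x['date'].dt.normalize(),
                    polarity_indicator=lambda x: np.select([x['polarity'] > 0, x['polarity'] < 0],
                                                           ['positive', 'negative'],
                                                           default='neutral'))
            .pivot_table(index='day_date', columns='polarity_indicator',
                         values='polarity', aggfunc='count', margins=True))

# Drop the 'All' margin row so only real days remain; a day with no tweets
# in some category shows NaN there, so fill with 0 before casting to int.
per_day = pvt_df.drop(index='All').fillna(0).astype(int)

positive_counts = per_day['positive'].tolist()  # i)
negative_counts = per_day['negative'].tolist()  # ii)
total_counts = per_day['All'].tolist()          # iii) negative + neutral + positive
```

Each list follows the sorted day_date index, so position k in every list refers to the same day.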

Answer 2

If I understand correctly, you need to count the polarity values for each day. It could look something like this:

positive = df_tweets[df_tweets['polarity'] > 0].groupby('date').count().reset_index()
negative = df_tweets[df_tweets['polarity'] < 0].groupby('date').count().reset_index()
neutral = df_tweets[df_tweets['polarity'] == 0].groupby('date').count().reset_index()

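The output of this code is three DataFrames, each with two columns (assuming df_tweets contains only 'date' and 'polarity', since count() returns one column per remaining column): one with the unique date values and one with the number of polarity values above, below, or equal to 0 on that date.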
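One caveat with filtering before grouping: a day with no positive (or negative) tweets disappears from that result, so the three DataFrames can differ in length. A minimal sketch of a single-pass variant (a swapped-in technique using groupby(...).size() and unstack, not the answer's code), with hypothetical sample data, that keeps zero counts and returns plain integer lists:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_tweets; 2020-02-08 has no negative tweets.
df_tweets = pd.DataFrame({
    'date': ['2020-02-07', '2020-02-07', '2020-02-07', '2020-02-08', '2020-02-08'],
    'polarity': [0.5, -0.2, 0.0, 0.7, 0.3],
})

# np.sign maps each polarity to -1, 0 or 1.
sign = np.sign(df_tweets['polarity']).astype(int).rename('sign')

counts = (df_tweets.groupby(['date', sign])
          .size()                                      # rows per (date, sign) pair
          .unstack(fill_value=0)                       # missing pairs become 0, not dropped rows
          .reindex(columns=[-1, 0, 1], fill_value=0))  # keep all three columns even if one never occurs

positive = counts[1].tolist()
negative = counts[-1].tolist()
total = counts.sum(axis=1).tolist()
```

Unlike count(), size() counts rows regardless of how many other columns the DataFrame has.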

2 solutions

Solution 1
1 2020-02-11 18:43:21

Solution 2
0 2020-02-11 18:22:24
