How do I aggregate rows with similar string values into new rows in a Pandas DataFrame?

Question

Below is a DataFrame I created with Pandas:

╔════════════════════════╦══════════╗
║        Column A        ║ Column B ║
╠════════════════════════╬══════════╣
║ /                      ║ 5.34     ║
║ new-shirts             ║ 6.78     ║
║ new-pants              ║ 10.11    ║
║ used-hats              ║ 1.56     ║
║ used-shirts            ║ 3.78     ║
║ brand-new-watches/gold ║ 4.21     ║
║ customer-service       ║ 0.29     ║
║ holiday-blowout-sale   ║ 12.45    ║
║ used-pants/corduroy    ║ 2.98     ║
║ special-discounts      ║ 6.99     ║
║ contact-us             ║ 1.67     ║
╚════════════════════════╩══════════╝

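I want to aggregate rows with similar strings in "Column A" into new rows and find the mean of the corresponding values in "Column B", as shown below. The home page "/" becomes "Home", anything containing "new" becomes "New", anything containing "used" becomes "Used", anything containing "service" becomes "Service", and everything else goes into "Other". How would I do this with Python and Pandas?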

╔══════════╦══════════╗
║ Column A ║ Column B ║
╠══════════╬══════════╣
║ Home     ║ 5.34     ║
║ New      ║ 7.03     ║
║ Used     ║ 2.77     ║
║ Service  ║ 0.29     ║
║ Other    ║ 7.04     ║
╚══════════╩══════════╝

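Also, would it be possible to combine certain pages, such as "customer-service" and "contact-us", into the new "Service" row, so that "contact-us" is not counted under "Other"?

Thanks!

Edit:

@Erfan - Your solution works perfectly for the initial DataFrame as presented, but I realized it was missing relevant data. If the DataFrame looked like this instead and I was trying to reach the same result, how would your solution change?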

╔═════════════════════════════════╦══════════╗
║            Column A             ║ Column B ║
╠═════════════════════════════════╬══════════╣
║ /                               ║ 5.34     ║
║ /new-shirts/                    ║ 6.78     ║
║ /new-pants/                     ║ 10.11    ║
║ /used-hats/                     ║ 1.56     ║
║ /used-shirts/                   ║ 3.78     ║
║ /brand-new-watches/gold/        ║ 4.21     ║
║ /customer-service/              ║ 0.29     ║
║ /holiday-blowout-sale/december/ ║ 12.45    ║
║ /used-pants/corduroy/           ║ 2.98     ║
║ /special-discounts/             ║ 6.99     ║
║ /contact-us/                    ║ 1.67     ║
╚═════════════════════════════════╩══════════╝

Answer 1

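We can define the words you want to categorize on, then use Series.str.extract to extract these categories from your strings.

Then we use GroupBy.mean to get the mean of each category: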

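import re

# Keywords to look for in "Column A"; '/' stands for the home page
words = ['/', 'New', 'Used', 'Service']

# Extract the first matching keyword (case-insensitive); rows without a match become "Other".
# The flag is passed via `flags=` because an inline (?i) that is not at the start
# of the pattern raises an error on recent Python versions.
cats = (
    df['Column A'].str.extract('(' + '|'.join(words) + ')', flags=re.IGNORECASE)
                  .fillna('other')[0]
                  .str.capitalize()
                  .str.replace('/', 'Home', regex=False)
)

# Average "Column B" per category, keeping the order of first appearance
df = df.groupby(cats, sort=False)['Column B'].mean().rename_axis('Column A', axis=0).reset_index()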

  Column A  Column B
0     Home  5.340000
1      New  7.033333
2     Used  2.773333
3  Service  0.290000
4    Other  7.036667
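
For the edited data, where every path starts with "/", the alternation above would match the leading slash on every row and label everything "Home". One possible way around that, and a way to fold "contact-us" into "Service" at the same time, is to check each category with its own pattern. This is only a sketch: the `patterns` dictionary, its regexes, and the priority order are my assumptions, not something from the question, so adjust them to your real URLs (for example, "new" would also match a path like "/newsletter/").

```python
import re
import pandas as pd

# Sample data copied from the edited question
df = pd.DataFrame({
    "Column A": [
        "/", "/new-shirts/", "/new-pants/", "/used-hats/", "/used-shirts/",
        "/brand-new-watches/gold/", "/customer-service/",
        "/holiday-blowout-sale/december/", "/used-pants/corduroy/",
        "/special-discounts/", "/contact-us/",
    ],
    "Column B": [5.34, 6.78, 10.11, 1.56, 3.78, 4.21, 0.29, 12.45, 2.98, 6.99, 1.67],
})

# Hypothetical category -> regex mapping; earlier entries take priority.
patterns = {
    "Home": r"^/$",                    # only the bare root path
    "New": r"new",
    "Used": r"used",
    "Service": r"service|contact-us",  # also pulls contact-us into Service
}

# Start with everything in "Other", then overwrite where a pattern matches.
cats = pd.Series("Other", index=df.index, name="Column A")

# Iterate in reverse so that earlier entries in `patterns` are applied last and win.
for label, pattern in reversed(list(patterns.items())):
    mask = df["Column A"].str.contains(pattern, flags=re.IGNORECASE, regex=True)
    cats.loc[mask] = label

# Mean of "Column B" per category, in order of first appearance
result = df.groupby(cats, sort=False)["Column B"].mean().reset_index()
print(result)
```

With this mapping, "Service" averages both /customer-service/ and /contact-us/ (0.98), and "Other" only contains the two remaining pages (9.72), so those two rows no longer match the table in the question.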
