How do I split all values in a DataFrame column and keep them in a new column?

Question

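I need to split the Product and Quantity column. The new column should be named Quantity.

In the example below, the quantity information starts at index [2] for some rows and at index [1] for others. I also can't simply split at a fixed '-': splitting at the second '-' works for the first two rows, but gives the wrong result for rows 3 and 4.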

Product and Quantity
ABC-BBC-Bottle- 1 - 30 mg
BBC-44-Capsule- 10 - 500mg
KKP-Bottle- 5 - 30 mg
R2B-Powder-500mg

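I need help with the following:

How do I split when the position of the '-' is not the same in every row?
How do I store all of the values before or after a given '-'? I know I can use [-2] to index from the end and [2] to index from the start, but neither keeps everything after the split at [2] or everything before the split at [-2].

This is what I have at the moment.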

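# .copy() avoids pandas' SettingWithCopyWarning when adding a column
df = source_df[['Product and Quantity']].copy()
# .str[2] keeps only the single piece at index 2 of each split list
df['Quantity'] = df['Product and Quantity'].str.split('-').str[2]

The output looks like this.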

Quantity

Bottle
Capsule
5
500mg

I would like it to look like this.

Quantity

Bottle - 1 - 30 mg
Capsule - 10 - 500mg
Bottle - 5 - 30 mg
Powder - 500mg

Answer 1

A reliable approach: use a regular expression!

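# drop one leading segment, then capture at most the last three segments
regex = r'[^-]+-((?:[^-]+-){,2}[^-]+)$'
# expand=False returns a Series for a single capture group
df['Quantity'] = df['Product and Quantity'].str.extract(regex, expand=False)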

Output:

         Product and Quantity             Quantity
0   ABC-BBC-Bottle- 1 - 30 mg    Bottle- 1 - 30 mg
1  BBC-44-Capsule- 10 - 500mg  Capsule- 10 - 500mg
2       KKP-Bottle- 5 - 30 mg    Bottle- 5 - 30 mg
3            R2B-Powder-500mg         Powder-500mg
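
As a self-contained sketch of the same idea, the snippet below rebuilds the sample rows from the question, applies the regex, and then normalizes the separators so the result matches the layout the question asked for. The normalization step and the variable name `quantity` are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Product and Quantity": [
        "ABC-BBC-Bottle- 1 - 30 mg",
        "BBC-44-Capsule- 10 - 500mg",
        "KKP-Bottle- 5 - 30 mg",
        "R2B-Powder-500mg",
    ]
})

# [^-]+-           one leading segment plus its hyphen (always dropped)
# (?:[^-]+-){,2}   zero to two further "segment-" pairs, inside the capture
# [^-]+$           the final segment, anchored at the end of the string
# The pattern is searched rather than anchored at the start, so earlier
# segments are skipped until at most three remain in the capture.
regex = r"[^-]+-((?:[^-]+-){,2}[^-]+)$"

# expand=False returns a Series when the pattern has one capture group.
quantity = df["Product and Quantity"].str.extract(regex, expand=False)

# Assumed extra step: rewrite every hyphen, with or without surrounding
# spaces, as " - " to match the desired output in the question.
df["Quantity"] = quantity.str.replace(r"\s*-\s*", " - ", regex=True)

print(df["Quantity"].tolist())
```

Rows that contain no hyphen at all would not match the pattern and would come back as NaN, so real data may need a fallback.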


Answer 2

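# remove spaces, find the "<name>-<digits>-<digits>mg" part, then re-space the hyphens
df['Quantity'] = df['Product and Quantity'].str.replace(' ', '').str.findall(r'\w+-\d*-*\d*mg').str[0].str.replace('-', ' - ')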

Output:

         Product and Quantity              Quantity
0   ABC-BBC-Bottle- 1 - 30 mg     Bottle - 1 - 30mg
1  BBC-44-Capsule- 10 - 500mg  Capsule - 10 - 500mg
2       KKP-Bottle- 5 - 30 mg     Bottle - 5 - 30mg
3            R2B-Powder-500mg        Powder - 500mg
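
For comparison with the two regex answers, here is a rough sketch of the same selection done with plain splitting, which also touches on the `[2]` versus `[-2]` part of the question: a single index such as `.str[2]` returns one piece, whereas a slice returns a list of pieces that can be joined back together. The rule used here (keep at most the last three pieces, but always drop the first) is an assumption inferred from the four sample rows, and the names `parts` and `kept` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Product and Quantity": [
        "ABC-BBC-Bottle- 1 - 30 mg",
        "BBC-44-Capsule- 10 - 500mg",
        "KKP-Bottle- 5 - 30 mg",
        "R2B-Powder-500mg",
    ]
})

# Split on hyphens and strip stray spaces from each piece.
parts = df["Product and Quantity"].str.split("-").apply(
    lambda pieces: [p.strip() for p in pieces]
)

# Assumed rule: keep at most the last three pieces, and always drop
# at least the first one. A slice keeps a list, unlike a single index.
kept = parts.apply(lambda pieces: pieces[max(1, len(pieces) - 3):])

# Join the kept pieces back into one string per row.
df["Quantity"] = kept.str.join(" - ")

print(df["Quantity"].tolist())
```

If product codes can span more segments than in these samples, the slice bounds would need to change; the regex in Answer 1 encodes the same limit in its `{,2}` quantifier.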
