Extracting numbers from a string column of a Pandas DataFrame

Question

I have the following DataFrame with a string column ("Info"):

import pandas as pd

df = pd.DataFrame({
    'Date': ["2014/02/02", "2014/02/03"],
    'Info': ["Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
             "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases."]
})

I need to extract the numbers in "Info" into 4 new columns in the same df.

For the first row, the values would be [78, 54, 9, 15].

I tried:

df[["new1","new2","new3","new4"]]= df.Info.str.extract('(\d+(?:\.\d+)?)', expand=True).astype(int)

but I think it is more complicated than that.

Regards,

Answer 1

extractall is probably a better fit for this task:

df[["new1","new2","new3","new4"]] = df['Info'].str.extractall(r'(\d+)')[0].unstack()

         Date                                               Info new1 new2 new3 new4
0  2014/02/02  Out of 78 shares traded during the session tod...   78   54    9   15
1  2014/02/03  Out of 76 shares traded during the session tod...   76   60    4   12
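
One thing to be aware of: the values extracted above are strings, not integers. If numeric columns are needed, a minimal sketch (assuming every row contains exactly four numbers, as in the sample data) is to add astype(int) before assigning:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# extractall yields one row per match; [0] selects the single capture group,
# unstack() pivots the match number into columns, astype(int) converts the strings.
numbers = df["Info"].str.extractall(r"(\d+)")[0].unstack().astype(int)

df[["new1", "new2", "new3", "new4"]] = numbers
print(df.dtypes)
```

If some rows could have fewer matches, unstack() would leave missing values there and astype(int) would fail, so this only suits data with a fixed number of matches per row.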

Answer 2

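As far as I can tell, you are trying to avoid capturing the decimal part of the numbers as a separate group, right? (That is the (?:\.\d+)? part.)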

First, if you want all the matches, you need to use pd.Series.str.extractall; extract stops after the first match.
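
To make the difference concrete, here is a small sketch on a shortened, made-up Series (the strings are illustrative, not the original data):

```python
import pandas as pd

# Shortened sample strings, invented for illustration only.
s = pd.Series(["Out of 78 shares, 54 increases", "Out of 76 shares, 60 increases"])

# extract: one row per input string, only the first match is kept.
first_only = s.str.extract(r"(\d+)", expand=True)

# extractall: one row per match, indexed by (original index, match number).
all_matches = s.str.extractall(r"(\d+)")

print(first_only)
print(all_matches)
```

extract returns a single column here, which is why assigning its result to four new columns cannot work.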

Using your df, try the following code:

# Get a multiindexed dataframe using extractall
expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot the index labels
df_2 = expanded.unstack()

# Drop the multiindex
df_2.columns = df_2.columns.droplevel()


# Add the columns to the original dataframe (inplace or make a new df)
df_combined = pd.concat([df, df_2], axis=1)
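
With the code above, the new columns end up named 0 to 3 and still hold strings. A possible variation, sketched here on made-up data where one row contains a decimal and fewer numbers, renames the columns and converts them with pd.to_numeric (the new1..newN names follow the question; the sample strings are assumptions):

```python
import pandas as pd

# Invented sample: the second row has a decimal and only two numbers.
df = pd.DataFrame({
    "Info": ["78 traded, 54 up, 9 flat, 15 down", "2.5 percent gain on 76 shares"]
})

# One row per match; the outer group keeps the whole number, decimals included.
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot the match number into columns; rows with fewer matches get NaN.
wide = expanded[0].unstack()
wide.columns = [f"new{i + 1}" for i in wide.columns]

# Convert each column from strings to numbers (NaN is preserved).
wide = wide.apply(pd.to_numeric)

df_combined = pd.concat([df, wide], axis=1)
print(df_combined)
```

pd.to_numeric is used instead of astype(int) because it tolerates both decimals and missing values.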
