Extract numbers from string column from Pandas DF
I have the following DataFrame with a string column ("Info"):
import pandas as pd

df = pd.DataFrame({
    'Date': ["2014/02/02", "2014/02/03"],
    'Info': [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})
I need to extract the numbers from "Info" into 4 new columns in the same df.
The values for the first row would be [78, 54, 9, 15].
I have tried
df[["new1","new2","new3","new4"]]= df.Info.str.extract('(\d+(?:\.\d+)?)', expand=True).astype(int)
but I think it is more complicated than that.
Regards,
extractall may be better suited to this task:
df[["new1","new2","new3","new4"]] = df['Info'].str.extractall(r'(\d+)')[0].unstack()
         Date                                               Info new1 new2 new3 new4
0  2014/02/02  Out of 78 shares traded during the session tod...   78   54    9   15
1  2014/02/03  Out of 76 shares traded during the session tod...   76   60    4   12
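Note that extractall returns the matches as strings, whereas the question asks for integers. A minimal sketch of one way to convert them, assuming every row contains exactly four numbers (the helper names `matches`, `numbers` and `new_cols` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# One row per match, indexed by (original row, match number)
matches = df["Info"].str.extractall(r"(\d+)")[0]

# Pivot the match number into columns 0..3 and convert the strings to integers.
# astype(int) would fail if some row had fewer matches (NaN after unstack).
numbers = matches.unstack().astype(int)

# Column names taken from the question
new_cols = ["new1", "new2", "new3", "new4"]
numbers.columns = new_cols

# Join on the original row index
df = df.join(numbers)
```

If rows can contain a different number of matches, converting with pd.to_numeric instead of astype(int) is probably the safer choice.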
As far as I can tell, you are trying to avoid capturing the decimal part of the numbers, right? (That is the (?:\.\d+)? part.)
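For reference, here is a small check of what that pattern actually matches. The (?:...) group is non-capturing, but it sits inside the outer capturing group, so a decimal part is still included in the captured text. The sample sentence is made up for illustration and is not from the question's data:

```python
import re

# Hypothetical sample text containing a decimal number
sample = "Price rose 3.5 points on 78 shares"

# Pattern from the question: digits with an optional decimal part
with_decimals = re.findall(r"(\d+(?:\.\d+)?)", sample)

# Simpler pattern: runs of digits only, so "3.5" is split in two
digits_only = re.findall(r"(\d+)", sample)

print(with_decimals)  # ['3.5', '78']
print(digits_only)    # ['3', '5', '78']
```

For the sentences in the question, which contain only whole numbers, both patterns give the same matches.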
First, if you want all matches, you need to use pd.Series.str.extractall; extract stops after the first match.
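The difference can be seen on a single string. This is a small sketch using a shortened, made-up version of the sentence from the question:

```python
import pandas as pd

# Shortened sample sentence (illustrative only)
s = pd.Series(["Out of 78 shares, 54 increases, 9 unchanged and 15 decreases."])

# extract: one row per input string, first match only
first_only = s.str.extract(r"(\d+)", expand=True)

# extractall: one row per match, with a (row, match) MultiIndex
all_matches = s.str.extractall(r"(\d+)")

print(first_only.shape)   # (1, 1)
print(all_matches.shape)  # (4, 1)
```

Because extract yields a single column here, it cannot fill four new columns, which is why the attempt in the question does not work.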
Using your df, try the following code:
# Get a multiindexed dataframe using extractall
expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")
# Pivot the index labels
df_2 = expanded.unstack()
# Drop the multiindex
df_2.columns = df_2.columns.droplevel()
# Add the columns to the original dataframe (inplace or make a new df)
df_combined = pd.concat([df, df_2], axis=1)
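With the code above, the new columns are labelled 0 to 3 and hold strings. A possible extension, assuming you want numeric values and the new1..new4 names from the question, could look like this (the renaming and the pd.to_numeric step are additions, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# Same steps as above: extract all matches, pivot, drop the outer column level
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")
df_2 = expanded.unstack()
df_2.columns = df_2.columns.droplevel()

# Convert each column from strings to numbers (floats if decimals are present)
df_2 = df_2.apply(pd.to_numeric)

# Rename the match positions 0..3 to new1..new4
df_2.columns = [f"new{i + 1}" for i in df_2.columns]

df_combined = pd.concat([df, df_2], axis=1)
```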