
Extract numbers from string column from Pandas DF

I have the following DataFrame with a string column ("Info"):

import pandas as pd

df = pd.DataFrame({
    'Date': ["2014/02/02", "2014/02/03"],
    'Info': [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

I need to extract the numbers from "Info" into four new columns in the same df.

The first row should have the values [78, 54, 9, 15].

I have tried:

df[["new1","new2","new3","new4"]]= df.Info.str.extract('(\d+(?:\.\d+)?)', expand=True).astype(int)

but I think the solution is more complicated than that.

regards,

extractall might be better for this task:

df[["new1","new2","new3","new4"]] = df['Info'].str.extractall(r'(\d+)')[0].unstack()
         Date                                               Info new1 new2 new3 new4
0  2014/02/02  Out of 78 shares traded during the session tod...   78   54    9   15
1  2014/02/03  Out of 76 shares traded during the session tod...   76   60    4   12
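Note that the new columns above hold strings, not integers. If you want real numbers (the question uses astype(int)), a minimal sketch along the same lines might look like this. The names matches, numbers and result are just illustrative, and it assumes every row contains exactly four numbers so that unstack produces no missing values:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# extractall returns one row per match, indexed by (original row, match number)
matches = df["Info"].str.extractall(r"(\d+)")[0]

# unstack pivots the match number into columns; astype(int) converts the strings.
# Assumption: every row has the same number of matches, so there are no NaNs.
numbers = matches.unstack().astype(int)
numbers.columns = ["new1", "new2", "new3", "new4"]

# join aligns on the original row index
result = df.join(numbers)
print(result[["Date", "new1", "new2", "new3", "new4"]])
```

If some rows could contain fewer numbers, astype(int) would fail on the resulting NaN values, so you would need a nullable or float dtype instead.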

Just so I understand, you're trying to avoid capturing decimal parts of numbers, right? (The (?:\.\d+)? part.)

First off, you need to use pd.Series.str.extractall if you want all the matches; extract stops after the first.
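To see the difference, here is a small sketch on a shortened version of your strings (the Series s and the variable names are made up for illustration):

```python
import pandas as pd

s = pd.Series([
    "Out of 78 shares, 54 increases",
    "Out of 76 shares, 60 increases",
])

# extract: one row per input string, only the first match is kept
first_only = s.str.extract(r"(\d+)", expand=True)

# extractall: one row per match, with a MultiIndex of (original row, match number)
all_matches = s.str.extractall(r"(\d+)")

print(first_only)
print(all_matches)
```

With extract, a pattern that has a single capture group gives you a single column, which is why assigning the result to four columns does not work.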

Using your df, try this code:

# Get a multiindexed dataframe using extractall
expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot the index labels
df_2 = expanded.unstack()

# Drop the multiindex
df_2.columns = df_2.columns.droplevel()


# Add the columns to the original dataframe (inplace or make a new df)
df_combined = pd.concat([df, df_2], axis=1)
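The new columns come out labelled 0 to 3 and hold strings. If you want the new1..new4 names from your question and numeric values, one possible follow-up is sketched below. It uses pd.to_numeric rather than astype(int) because the pattern can also capture decimals; the column naming scheme is an assumption on my part:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# Same steps as above: extract all matches, pivot, drop the outer column level
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")
df_2 = expanded.unstack()
df_2.columns = df_2.columns.droplevel()

# Convert each column from strings to numbers (ints here, floats if decimals appear)
df_2 = df_2.apply(pd.to_numeric)

# Columns are the match numbers 0..3; rename them to new1..new4 (assumed naming)
df_2.columns = [f"new{i + 1}" for i in df_2.columns]

df_combined = pd.concat([df, df_2], axis=1)
print(df_combined.drop(columns="Info"))
```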

Output df
