Extract numerical information from a Pandas column and insert it into a new column

I have a column structured like this: album: "The Dark Side Of The Moon" (1973). Each value holds the name of the album followed by the year in parentheses.

I just need to remove that last part from the column and create a new column called "year" containing only the year.

I'm not sure whether I should use re.search(), but I've tried this:

data['year'] = data['Album'].str.extract(r'\(\d*\)')

This pattern works when I test it with re.search() on a single string, and it works in online regular-expression tools.

So what can I do?

Thanks!

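You can still use re.search (remember to import re). Put a capture group around the digits so that only the year is returned, without the parentheses:

data['year'] = data['Album'].map(lambda x: re.search(r'\((\d+)\)', x).group(1))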
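The str.extract call from the question can be kept as well. It fails because str.extract requires at least one capture group, and in r'\(\d*\)' the parentheses are escaped literals, so the pattern has none. A minimal sketch, assuming the column is named Album as in the question and that every value ends with a four-digit year in parentheses (the second row is sample data added for illustration):

```python
import pandas as pd

# Sample data; the second row is only here to show more than one value.
data = pd.DataFrame(
    {"Album": ['"The Dark Side Of The Moon" (1973)', '"Wish You Were Here" (1975)']}
)

# The inner (...) is the capture group; the escaped \( and \) match the
# literal parentheses and stay outside the extracted text.
data["year"] = (
    data["Album"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(int)
)

# Remove the trailing " (YYYY)" from the original column.
data["Album"] = data["Album"].str.replace(r"\s*\(\d{4}\)\s*$", "", regex=True)

print(data)
```

If some rows might not contain a year, astype(int) would raise on the resulting missing values, so that step would need adjusting.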

You can try using split:

data["year"] = data['Album'].apply(lambda x: int(x.split("(")[1][0:-1]))
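One caveat: split("(")[1] takes the text after the first opening parenthesis, so it breaks when the album name itself contains parentheses. A hedged variant using rsplit, which splits from the right, is sketched below. The helper name split_album and the first sample title are made up for illustration.

```python
import pandas as pd

def split_album(text):
    """Split 'Name (YYYY)' at the last '(' and return (name, year)."""
    name, year = text.rsplit("(", 1)
    return name.strip(), int(year.strip().rstrip(")"))

# Hypothetical sample data; the first title contains its own parentheses.
data = pd.DataFrame({"Album": ['"Greatest Hits (Live)" (1999)', '"Animals" (1977)']})

# Unzip the (name, year) pairs into two separate sequences.
names, years = zip(*data["Album"].map(split_album))
data["Album"] = list(names)
data["year"] = list(years)

print(data)
```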

This should separate the years from the album names and also strip the extra characters from both:

import pandas as pd

data = pd.DataFrame({"album": ['"The Dark Side Of The Moon" (1973)']})

names = []
years = []
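for i in range(len(data['album'])):
    # The last whitespace-separated token is the year, e.g. "(1973)"
    year = data['album'][i].split()[-1]
    years.append(int(year.strip("()")))
    # Remove the year token, then strip quotes and spaces from the name
    names.append(data['album'][i].replace(year, '').strip('" '))

# Build the result with one column per list
data = pd.DataFrame({"album": names, "year": years})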
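The same separation can probably be done without an explicit loop. A sketch, assuming every value has the form "Name" (YYYY): with named groups, str.extract returns a DataFrame whose columns take the group names. The variable names pattern and result are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({"album": ['"The Dark Side Of The Moon" (1973)']})

# album: the text between the optional surrounding quotes
# year:  the four digits inside the trailing parentheses
pattern = r'^"?(?P<album>.*?)"?\s*\((?P<year>\d{4})\)\s*$'

result = data["album"].str.extract(pattern)
result["year"] = result["year"].astype(int)

print(result)
```

Rows that do not match the pattern come back as missing values in both columns, which makes malformed entries easy to spot before converting the year to int.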
