I have a column structured like this: album: "The Dark Side Of The Moon" (1973)
That is, the name of the album followed by the year in parentheses at the end.
I need to remove that last part from the column and create a new column called "year" containing only the year.
I'm not sure whether I should use re.search(), but I've tried this:
data['year'] = data['Album'].str.extract(r'\(\d*\)')
This pattern works when I test it with re.search() on a single string, and it also works in online regular-expression tools.
So what can I do?
thanks!
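You can still use re.search. Put a capture group around the digits and take group(1), so that you get only the year without the parentheses:
import re
data['year'] = data['Album'].map(lambda x: re.search(r'\((\d+)\)', x).group(1))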
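For what it's worth, the str.extract attempt in the question most likely fails because the pattern has no capture group: pandas raises "ValueError: pattern contains no capture groups" in that case, while re.search does not need one. Below is a minimal sketch of the vectorized version. It assumes every row ends with a four-digit year in parentheses, and the sample DataFrame is only there for illustration.

```python
import pandas as pd

# Sample data for illustration only; the column name follows the question.
data = pd.DataFrame({"Album": ['"The Dark Side Of The Moon" (1973)']})

# The inner (...) is the capture group that str.extract requires;
# the escaped \( and \) match the literal parentheses around the year.
# Assumption: every row has a year, otherwise astype(int) fails on NaN.
data["year"] = (
    data["Album"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(int)
)

# Remove the trailing " (1973)" part from the original column.
data["Album"] = data["Album"].str.replace(r"\s*\(\d{4}\)\s*$", "", regex=True)
```

If some rows might not contain a year, leaving out astype(int) keeps the missing values as NaN instead of raising.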
You can try using split:
data["year"] = data['Album'].apply(lambda x: int(x.split("(")[1][0:-1]))
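This should separate the years from the album names and also strip the extra characters:
import pandas as pd
data = pd.DataFrame({"album": ['"The Dark Side Of The Moon" (1973)']})
names = []
years = []
for i in range(len(data['album'])):
    # The last whitespace-separated token is the year in parentheses, e.g. "(1973)"
    year = data['album'][i].split()[-1]
    years.append(int(year.strip("()")))
    # Drop the year token, then strip the surrounding quotes and spaces from the name
    names.append(data['album'][i].replace(year, '').strip('" '))
# Build the result with explicit column names
data = pd.DataFrame({"album": names, "year": years})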
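The same split into name and year can probably be done without an explicit loop. This is only a sketch: it assumes each value looks like an optionally quoted name followed by a four-digit year in parentheses, and the group names "album" and "year" are chosen here so that they become the column names.

```python
import pandas as pd

# Sample data for illustration only.
data = pd.DataFrame({"album": ['"The Dark Side Of The Moon" (1973)']})

# Two named groups: str.extract returns one column per group,
# named after the group. The optional quotes around the name are
# matched outside the "album" group, so they are dropped.
pattern = r'^"?(?P<album>.*?)"?\s*\((?P<year>\d{4})\)\s*$'
data = data["album"].str.extract(pattern)

# Assumption: every row matched, otherwise the year column contains NaN
# and the conversion to int fails.
data["year"] = data["year"].astype(int)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed entries easy to spot before converting the year.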