I have a CSV file containing an awards column that lists various nominations and awards won. I want to extract the data from the awards column in this dataset
and split it into several columns. The awards text has details of wins and nominations in general, and also wins and nominations in certain categories (e.g. Oscar, BAFTA, etc.). A sample input of the awards column is shown below.
I want to split this data into several columns by analyzing it. Can this be achieved in Python? I am using pandas to access the DataFrame.
A sample of the expected output is shown below.
It seems your data are not particularly well structured. If the format were guaranteed to be of the form:
x wins & y nominations.
then the following code:
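testStrings = ['1 win & 1 nomination.', '2 wins.', '5 nominations.', '3 wins & 8 nominations.', '2 wins.', '9 wins.']
# Split each string into its "wins" piece and its "nominations" piece.
text = [i.split('&') for i in testStrings]
data = []
for row in text:
    # Reset the counts once per row (not once per piece), so a win count
    # is not overwritten with 0 when the nominations piece is processed.
    w, n = 0, 0
    for t in row:
        winIndex = t.find('win')
        nomIndex = t.find('nom')
        # The number is the text that precedes the keyword.
        if winIndex > 0:
            w = int(t[:winIndex-1])
        if nomIndex > 0:
            n = int(t[:nomIndex-1])
    data.append([w, n])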
will give you the list data, where each element is [numWins, numNoms] for the corresponding row.
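Since the question mentions pandas, the same idea can also be applied directly to a DataFrame column. The following is only a minimal sketch: the column name "awards" and the sample rows are assumptions, and it swaps the str.find approach for regular expressions via Series.str.extract.

```python
import pandas as pd

# Hypothetical sample data; the column name "awards" is an assumption.
df = pd.DataFrame({
    "awards": [
        "1 win & 1 nomination.",
        "2 wins.",
        "5 nominations.",
        "3 wins & 8 nominations.",
    ]
})

# Capture the number that precedes "win"/"wins" and "nomination"/"nominations".
wins = df["awards"].str.extract(r"(\d+)\s+wins?\b", expand=False)
noms = df["awards"].str.extract(r"(\d+)\s+nominations?\b", expand=False)

# Rows without a match come back as NaN; convert to numbers and treat NaN as 0.
df["wins"] = pd.to_numeric(wins).fillna(0).astype(int)
df["nominations"] = pd.to_numeric(noms).fillna(0).astype(int)

print(df)
```

Working on the whole column at once avoids the explicit loops, but it relies on the same assumption as the code above: the text must actually contain "N win(s)" and "N nomination(s)".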
You can probably extend this to cope with different formats (e.g. "Won 1 Primetime Emmy") by searching for those keywords, in the same way the code looks for the substrings "win" and "nom". Hope this provides some help.
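One possible way to do that extension is sketched here with the standard re module. The function name parse_awards is hypothetical, and the "Nominated for N ..." phrasing and the key names are assumptions; only "Won 1 Primetime Emmy" comes from the text above, so the patterns would need adjusting to the real data.

```python
import re

def parse_awards(text):
    """Parse one awards string into a dict of counts (hypothetical helper)."""
    counts = {"wins": 0, "nominations": 0}

    # General totals, e.g. "3 wins & 8 nominations."
    m = re.search(r"(\d+)\s+wins?\b", text)
    if m:
        counts["wins"] = int(m.group(1))
    m = re.search(r"(\d+)\s+nominations?\b", text)
    if m:
        counts["nominations"] = int(m.group(1))

    # Category-specific wins, e.g. "Won 1 Primetime Emmy."
    m = re.search(r"Won\s+(\d+)\s+([^.]+)", text)
    if m:
        counts[m.group(2).strip() + " won"] = int(m.group(1))

    # Assumed phrasing for category-specific nominations.
    m = re.search(r"Nominated for\s+(\d+)\s+([^.]+)", text)
    if m:
        counts[m.group(2).strip() + " nominated"] = int(m.group(1))

    return counts

print(parse_awards("Won 1 Primetime Emmy. Another 2 wins & 5 nominations."))
print(parse_awards("3 wins & 8 nominations."))
```

A list of these dicts can then be passed to pd.DataFrame to get one column per key, with missing categories filled in afterwards. Plural award names (such as "Oscars") are kept as written here and are not normalized.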