
How to extract information from a string in Python?

I have a CSV file containing an awards column that lists various nominations and awards won. I want to extract data from the awards column in this dataset and split it into several columns. The column holds overall wins and nominations, as well as wins and nominations in specific categories (e.g. Oscar, BAFTA). A sample of the awards column is shown below.

[Image: sample input of the awards column]

I want to split this data into several columns for analysis. Can this be done in Python? I am using pandas to access the DataFrame. A sample of the expected output is shown below.

[Image: sample expected output]

It seems your data are not particularly well structured. If the format were guaranteed to be of the form:

x wins & y nominations.

Then the following code:

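testStrings = ['1 win & 1 nomination.', '2 wins.', '5 nominations.', '3 wins & 8 nominations.', '2 wins.', '9 wins.']

# Split each string into its "wins" part and its "nominations" part
text = [i.split('&') for i in testStrings]

data = []
for row in text:
    # Start from 0 so a count found in one part is not overwritten by the next part
    w = 0
    n = 0
    for t in row:
        winIndex = t.find('win')
        nomIndex = t.find('nom')
        if winIndex > 0:
            # The number is everything before the keyword; int() ignores the spaces
            w = int(t[:winIndex])
        if nomIndex > 0:
            n = int(t[:nomIndex])
    data.append([w, n])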

will give you the list data, where each element is [numWins, numNoms] for the corresponding row.
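Since the question mentions pandas, the same idea can be sketched directly on a DataFrame column. This is only a minimal sketch: the DataFrame contents and the column name "awards" are assumptions for illustration, and the regular expressions assume the simple "x wins & y nominations." format described above.

```python
import pandas as pd

# Hypothetical data; the column name "awards" is an assumption.
df = pd.DataFrame({
    "awards": [
        "1 win & 1 nomination.",
        "2 wins.",
        "5 nominations.",
        "3 wins & 8 nominations.",
    ]
})

# Capture the number that precedes "win"/"wins" and "nomination"/"nominations".
wins = df["awards"].str.extract(r"(\d+)\s+wins?\b", expand=False)
noms = df["awards"].str.extract(r"(\d+)\s+nominations?\b", expand=False)

# Rows without a match come back as missing values; treat them as 0.
df["wins"] = pd.to_numeric(wins).fillna(0).astype(int)
df["nominations"] = pd.to_numeric(noms).fillna(0).astype(int)

print(df)
```

Each pattern has a single capture group, so str.extract with expand=False returns one Series per pattern, which can be assigned as a new column.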

You can probably extend this to cope with different formats (e.g. "Won 1 Primetime Emmy") by searching for those keywords, in the same way the code looks for the substrings "win" and "nom". Hope this helps.
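One way to handle such category-specific formats is with regular expressions. The sketch below is illustrative only: the function name parse_awards, the dictionary keys, and the assumed phrasings ("Won N <award>." or "Nominated for N <award>.", optionally followed by "Another x wins & y nominations.") are guesses, not taken from the actual dataset, so the patterns may need adjusting.

```python
import re

# Assumed phrasings; adjust the patterns to match the real data.
CATEGORY_RE = re.compile(r"(Won|Nominated for)\s+(\d+)\s+([^.]+)\.")
WINS_RE = re.compile(r"(\d+)\s+wins?\b")
NOMS_RE = re.compile(r"(\d+)\s+nominations?\b")


def parse_awards(text):
    """Return a dict of counts extracted from one awards string."""
    result = {
        "category": None,
        "category_wins": 0,
        "category_nominations": 0,
        "wins": 0,
        "nominations": 0,
    }

    # Category-specific part, e.g. "Won 1 Primetime Emmy."
    m = CATEGORY_RE.search(text)
    if m:
        verb, count, name = m.groups()
        result["category"] = name.strip()
        key = "category_wins" if verb == "Won" else "category_nominations"
        result[key] = int(count)

    # General part, e.g. "3 wins & 2 nominations."
    m = WINS_RE.search(text)
    if m:
        result["wins"] = int(m.group(1))
    m = NOMS_RE.search(text)
    if m:
        result["nominations"] = int(m.group(1))

    return result


example = parse_awards("Won 1 Primetime Emmy. Another 3 wins & 2 nominations.")
print(example)
```

With pandas, a function like this could be applied to each row via df["awards"].apply(parse_awards), and the resulting dictionaries expanded into separate columns.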
