
Getting a slice of a string based on conditions in a pandas column

I have a DataFrame df like the one below, with a single column tag that holds strings. I want to get everything after the last ':' and before the next '|', but only if the text contains 'Copy: Included: Yes: Include:', 'Copy:', or 'Copy: Included: Yes:'. How can I do this in pandas?

tag
'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you'
'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere'
'Subhead: Copy: Free video meetings with built-in team messaging.'
'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals'
'Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.'
'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free'

The output I want:

tag
'Choose the right weekly meal plan for you'
'Include For Brands and Individuals Everywhere'
'Free video meetings with built-in team messaging.'
'Farm-fresh delivered'
'Join the plant tribe'
'Get 3 months of Premium for free'
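To reproduce this, the sample data can be built as a DataFrame roughly like this. It is a sketch that assumes the single quotes in the listing above are only how the values were displayed, not part of the strings themselves.

```python
import pandas as pd

# Sample rows from the question; the surrounding quotes in the listing are assumed to be display-only
df = pd.DataFrame({'tag': [
    'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you',
    'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere',
    'Subhead: Copy: Free video meetings with built-in team messaging.',
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
]})

print(df)
```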

I made some assumptions that might not hold, but even if they are wrong this will hopefully still help. Specifically, I assumed that the text you want to pull out never contains a colon.

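def extract_text(lines):
    results = []
    for line in lines:
        line = line.split('|')[0]   # keep everything before the first '|'
        line = line.split(':')[-1]  # keep everything after the last ':'
        results.append(line.strip())
    return results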

If you pass your tag column into this as a list, it first takes everything before the first '|' (the first split), and then takes the last piece when splitting on ':' (the second split).
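Because this keeps only the first '|' segment, the last sample row comes out as '3 months free' rather than the Copy: text. A minimal variant, assuming the wanted text always sits in the first '|' segment that contains 'Copy:' (my reading of the expected output, not something the question states), picks that segment instead. The name extract_copy and its marker parameter are hypothetical.

```python
def extract_copy(lines, marker='Copy:'):
    """For each string, return the text after the last ':' in the first
    '|'-separated segment that contains `marker`, or None if no segment does."""
    results = []
    for line in lines:
        # First segment mentioning the marker, or None when there is none
        segment = next((s for s in line.split('|') if marker in s), None)
        if segment is None:
            results.append(None)
        else:
            results.append(segment.split(':')[-1].strip())
    return results


tags = [
    'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you',
    'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere',
    'Subhead: Copy: Free video meetings with built-in team messaging.',
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
]
print(extract_copy(tags))
```

Like the original, this still assumes the wanted text itself contains no ':'.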

I wrote this, which only extracts when one of the keys is present:

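import pandas as pd

# 'Copy:' is a substring of the other two keys, so it alone would match the same rows
accepted_keys = ['Copy: Included: Yes: Include:', 'Copy:', 'Copy: Included: Yes:']

pd.options.display.max_colwidth = 200  # show long strings untruncated when printing

def function(row):
    text = row['tag']  # read the cell itself instead of row.to_string()
    for k in accepted_keys:
        if k in text:
            r1 = text.split('|')[0]  # everything before the first '|'
            r2 = r1.split(':')[-1]   # everything after the last ':'
            return r2.strip()
    return None  # no accepted key in this row


df.apply(function, axis=1)  # df is the DataFrame from the question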
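A vectorized alternative uses Series.str.extract with a regular expression instead of a Python-level loop. This is a sketch: the pattern assumes, as the answers above do, that the wanted text contains no ':' or '|', and that it is the first 'Copy:' segment you want.

```python
import pandas as pd

df = pd.DataFrame({'tag': [
    'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you',
    'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere',
    'Subhead: Copy: Free video meetings with built-in team messaging.',
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
]})

# After 'Copy:', skip any further 'label:' chunks within the same '|' segment,
# then capture the remaining text up to the next '|' or ':' (or end of string).
pattern = r'Copy:(?:[^|:]*:)*\s*([^|:]*)'

# expand=False returns a Series for a single capture group; rows without 'Copy:' become NaN
result = df['tag'].str.extract(pattern, expand=False).str.strip()
print(result.tolist())
```

str.extract keeps only the first match per row, which is why the second 'Copy:' segment in the fifth row is ignored.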
