
Getting a slice of a string based on conditions in pandas column

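I have a dataframe df like this, with a single column tag that contains strings. If the text contains "Copy: Included: Yes: Include:" or "Copy:" or "Copy: Included: Yes:", I want to get everything after the last ":" and before the next "|". How can I do this in pandas?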

tag
'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you'
'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere'
'Subhead: Copy: Free video meetings with built-in team messaging.'
'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals'
'Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.'
'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free'

Expected output:

tag
'Choose the right weekly meal plan for you'
'Include For Brands and Individuals Everywhere'
'Free video meetings with built-in team messaging.'
'Farm-fresh delivered'
'Join the plant tribe'
'Get 3 months of Premium for free'

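I've made a few assumptions, which may not be great ones, but hopefully this still helps if they turn out to be wrong. Specifically, I'm assuming the text you want to extract never contains a colon.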

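def format_tags(lines):
    results = []
    for line in lines:
        line = line.split('|')[0]   # everything before the first '|'
        line = line.split(':')[-1]  # text after the last ':'
        results.append(line.strip())
    return results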

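If you pass your tags into this as a list, it first takes everything before the first | (the first split), then takes the last string you get when splitting on : (the second split).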
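To see the two splits at work, here is a small sketch that runs them on three of the sample strings from the question. The helper name `first_segment_tail` is made up for illustration, and the `.strip()` call is an addition that drops the space left after the colon. Note that the third string has its `Copy:` part in the second `|` segment, so looking only at the first segment returns the `Level:` text rather than the expected one.

```python
def first_segment_tail(line):
    # Hypothetical helper: apply the two splits to a single string.
    head = line.split('|')[0]           # everything before the first '|'
    return head.split(':')[-1].strip()  # text after the last ':' in that part

# Three sample strings copied from the question.
samples = [
    "Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you",
    "Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals",
    "Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free"
    " | Button: Call to action: Get 3 months free",
]

results = [first_segment_tail(s) for s in samples]
for r in results:
    print(r)
```

The first two strings come out as expected. The third yields `3 months free`, because its first segment holds no `Copy:` entry.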

I wrote this program:

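import pandas as pd

accepted_keys = ['Copy: Included: Yes: Include:', 'Copy:', 'Copy: Included: Yes:']

pd.options.display.max_colwidth = 200  # show longer strings in full when printing

def function(row):
    text = row['tag']  # read the string directly rather than via row.to_string()
    for k in accepted_keys:
        if k in text:
            r1 = text.split('|')[0]  # everything before the first '|'
            r2 = r1.split(':')[-1]   # text after the last ':'
            return r2.strip()
    return None  # none of the accepted keys was found


dfd.apply(function, axis=1)  # dfd is the name of the DataFrame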
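One possible variation, given that the last sample row in the question has its `Copy:` entry in the second `|` segment: scan the segments in order and take the first one that contains `Copy:`. This is only a sketch under the same assumption that the wanted text never contains a colon. The helper name `extract_copy` and the result column `copy_text` are made up for illustration, and the sample frame is rebuilt here from the strings in the question.

```python
import pandas as pd

def extract_copy(tag):
    # Hypothetical helper: return the text after the last ':' in the
    # first '|'-separated segment that contains 'Copy:'.
    for segment in tag.split('|'):
        if 'Copy:' in segment:
            return segment.split(':')[-1].strip()
    return None  # no 'Copy:' segment in this tag

# Sample data copied from the question.
df = pd.DataFrame({'tag': [
    "Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you",
    "Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere",
    "Subhead: Copy: Free video meetings with built-in team messaging.",
    "Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals",
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    "Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free"
    " | Button: Call to action: Get 3 months free",
]})

df['copy_text'] = df['tag'].apply(extract_copy)
print(df['copy_text'].tolist())
```

Since all three accepted keys start with `Copy:`, checking for that one substring covers them all in this sketch.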

