Get a string slice based on a condition in a pandas column

Question

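I have a dataframe df like this, with a single column tag that contains strings. If the text contains "Copy: Included: Yes: Include:" or "Copy:" or "Copy: Included: Yes:", I want to get everything after the last ":" and before the next "|". How can I do this in pandas?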

tag
'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you'
'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere'
'Subhead: Copy: Free video meetings with built-in team messaging.'
'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals'
'Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.'
'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free'

tag
'Choose the right weekly meal plan for you'
'Include For Brands and Individuals Everywhere'
'Free video meetings with built-in team messaging.'
'Farm-fresh delivered'
'Join the plant tribe'
'Get 3 months of Premium for free'

Answer 1

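I've made some assumptions, which may not be good ones, but even if they are wrong, hopefully this still helps. Specifically, I'm assuming that the text you want to extract never contains a colon.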

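def format_tags(lines):
    # Collect the extracted text for each tag string
    results = []
    for line in lines:
        line = line.split('|')[0]   # everything before the first '|'
        line = line.split(':')[-1]  # the text after the last ':'
        results.append(line.strip())
    return results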

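If you pass your tags into this function as a list, it first takes everything before the first | (the first split), then takes the last piece when that text is split on : (the second split).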
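The same two splits can also be written with the pandas string accessor, so they run directly on the column. This is only a sketch: the column name tag comes from the question, while copy_text and before_pipe are names made up here for illustration. Like the function above, it only looks at the text before the first |, so a row whose first segment is not a "Copy:" segment would not give the expected result.

```python
import pandas as pd

# Sample rows taken from the question; "copy_text" is a hypothetical output column.
df = pd.DataFrame({"tag": [
    "Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you",
    "Subhead: Copy: Free video meetings with built-in team messaging.",
    "Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals",
]})

# First split: keep everything before the first '|'.
before_pipe = df["tag"].str.split("|").str[0]

# Second split: keep the text after the last ':' and trim surrounding spaces.
df["copy_text"] = before_pipe.str.rsplit(":", n=1).str[-1].str.strip()

print(df["copy_text"].tolist())
```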

Answer 2

I wrote this program:

import pandas as pd

accepted_keys = ['Copy: Included: Yes: Include:', 'Copy:', 'Copy: Included: Yes:']

pd.options.display.max_colwidth = 200  # to keep longer strings

def function(row):
    row = row.to_string()
    for k in accepted_keys:
        if k in row:
            r1 = row.split('|')[0]  # text before the first '|'
            r2 = r1.split(':')[-1]  # text after the last ':'
            return r2
    return None  # no accepted key found in this row

dfd.apply(function, axis=1)  # dfd is the name of the DataFrame
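For rows where the "Copy:" part is not the first |-separated segment (the last sample row in the question), a variation is to check each segment in turn. The following is a sketch of that alternative rather than the method from either answer. The helper name extract_copy_text is hypothetical, and it assumes, as Answer 1 does, that the wanted text contains no colon.

```python
import pandas as pd

def extract_copy_text(tag):
    """Return the text after the last ':' in the first '|' segment containing 'Copy:'."""
    for segment in tag.split("|"):
        if "Copy:" in segment:
            return segment.rsplit(":", 1)[-1].strip()
    return None  # no "Copy:" segment in this tag

# Sample rows taken from the question.
df = pd.DataFrame({"tag": [
    "Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere",
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    "Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free",
]})

result = df["tag"].map(extract_copy_text)
print(result.tolist())
```

Because every accepted key in the question begins with "Copy:", checking for that one substring covers all three of them.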
