Getting a slice of a string based on conditions in pandas column
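I have a dataframe df with a single column, tag, containing a list of strings like the ones below. If the text contains "Copy: Included: Yes: Include:", "Copy:", or "Copy: Included: Yes:", I want to get everything after the last ":" and before the next "|". How can I do this in pandas?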
tag
'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you'
'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere'
'Subhead: Copy: Free video meetings with built-in team messaging.'
'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals'
'Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.'
'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free'
Expected output:
tag
'Choose the right weekly meal plan for you'
'Include For Brands and Individuals Everywhere'
'Free video meetings with built-in team messaging.'
'Farm-fresh delivered'
'Join the plant tribe'
'Get 3 months of Premium for free'
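The expected output suggests a slightly more specific rule than "split on the first |": in the last sample row the "Copy:" part is the second |-separated segment, not the first. Below is a minimal sketch using pandas' `Series.str.extract`. It assumes that rule (take the text after the last colon of the first segment that contains "Copy:") and assumes the wanted text itself never contains ":" or "|". The regular expression is a hypothetical illustration, not taken from either answer.

```python
import pandas as pd

df = pd.DataFrame({'tag': [
    'Headline: Copy: Included: Yes: Include: Choose the right weekly meal plan for you',
    'Subhead: Copy: Included: Yes: Include For Brands and Individuals Everywhere',
    'Subhead: Copy: Free video meetings with built-in team messaging.',
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    "Headline: Copy: Join the plant tribe | Subhead: Copy: There's always room at our table.",
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
]})

# Hypothetical pattern, read left to right:
#   Copy:          first occurrence of the key (the two longer keys also start with it)
#   (?:[^|]*:)?    optionally skip ahead to the last ':' before the next '|'
#   \s*            skip spaces after that colon
#   ([^|:]*?)      capture the wanted text (no '|' or ':' inside it)
#   \s*(?:\||$)    stop before trailing spaces and the next '|' or end of string
pattern = r'Copy:(?:[^|]*:)?\s*([^|:]*?)\s*(?:\||$)'

# With a single capture group and expand=False, str.extract returns a Series;
# rows with no match would become NaN.
df['tag'] = df['tag'].str.extract(pattern, expand=False)
print(df['tag'].tolist())
```

On the six sample rows this gives the six values listed in the expected output, including "Get 3 months of Premium for free" for the last row.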
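I made some assumptions, which may not be great ones, but hopefully this still helps even if they turn out to be wrong. Specifically, I'm assuming the text you want to extract never contains a colon.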
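```python
def extract_text(lines):
    results = []
    for line in lines:
        line = line.split('|')[0]   # keep everything before the first '|'
        line = line.split(':')[-1]  # keep everything after the last ':'
        results.append(line.strip())
    return results
```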
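If you pass your tags to this function as a list, it first takes everything before the first | (the first split), and then takes the last piece when you split that on : (the second split).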
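The two splits described above can also be written with pandas' vectorized `.str` accessor instead of a Python loop. This is a minimal sketch, assuming the column is named tag as in the question. Like the loop, it only looks at the text before the first |, so for the last sample row it gives "3 months free" rather than the expected "Get 3 months of Premium for free".

```python
import pandas as pd

df = pd.DataFrame({'tag': [
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    'Subhead: Copy: Free video meetings with built-in team messaging.',
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
]})

result = (
    df['tag']
    .str.split('|').str[0]   # first split: keep everything before the first '|'
    .str.split(':').str[-1]  # second split: keep the piece after the last ':'
    .str.strip()             # drop the spaces left around the pieces
)
print(result.tolist())
```

A single-character pattern such as '|' is treated as a literal string by `Series.str.split`, so it does not need escaping here.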
I wrote this function:
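```python
import pandas as pd

accepted_keys = ['Copy: Included: Yes: Include:', 'Copy:', 'Copy: Included: Yes:']
pd.options.display.max_colwidth = 200  # show long strings in full when printing

def function(row):
    text = row['tag']  # the cell itself, without the column label row.to_string() would add
    for k in accepted_keys:
        if k in text:
            r1 = text.split('|')[0]  # everything before the first '|'
            r2 = r1.split(':')[-1]   # everything after the last ':'
            return r2.strip()
    return None  # no accepted key found in this row

dfd.apply(function, axis=1)  # dfd is the name of the DataFrame
```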
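A possible variation on function(), offered as a sketch: test each |-separated segment for a key instead of the whole row, so that the last sample row yields "Get 3 months of Premium for free" instead of "3 months free". It works on the tag column directly with `Series.map` rather than on whole rows with `apply(axis=1)`. The helper name extract_with_keys is hypothetical.

```python
import pandas as pd

accepted_keys = ['Copy: Included: Yes: Include:', 'Copy:', 'Copy: Included: Yes:']

def extract_with_keys(text):
    """Hypothetical helper: return the text after the last ':' of the first
    '|'-separated segment that contains one of accepted_keys, else None."""
    for segment in text.split('|'):
        # The longer keys both contain 'Copy:', so this any() is equivalent to
        # checking for 'Copy:' alone; the full list is kept to mirror the answer.
        if any(k in segment for k in accepted_keys):
            return segment.split(':')[-1].strip()
    return None  # no segment contains an accepted key

dfd = pd.DataFrame({'tag': [
    'Headline: Copy: Farm-fresh delivered | Subhead: Focus: Focus on meals',
    'Price point: Level: 3 months free | Headline: Copy: Get 3 months of Premium for free | Button: Call to action: Get 3 months free',
    'Subhead: Focus: Focus on meals',
]})

dfd['result'] = dfd['tag'].map(extract_with_keys)
print(dfd['result'].tolist())
```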