Retrieve cell string values in a column between two unknown indexes, based on substring location

Question

I need to locate the first location where the word 'then' appears in the Words table. I'm trying to write code that consolidates all strings in the 'text' column from this location up to the first text containing the substring '666' or '999' (in this case a combination of their, stoma22, fe156, sligh334, pain666; the desired subtrings_output = 'theirfe156sligh334pain666'). I've tried:

their_loc = np.where(words['text'].str.contains(r'their', na =True))[0][0]
666_999_loc = np.where(words['text'].str.contains(r'666', na =True))[0][0]
subtrings_output = Words['text'].loc[Words.index[their_loc:666_999_loc]]

As you can see, I'm not sure how to extend the condition for 666_999_loc to match the substring 666 or 999. Also, slicing the index between the two variables raises an error. Many thanks

Words table:

page no	text	font
1	they	0
1	ate	0
1	apples	0
2	and	0
2	then	1
2	their	0
2	stoma22	0
2	fe156	1
2	sligh334	0
2	pain666	1
2	given	0
2	the	1
3	fruit	0

Answer 1

You just need to add one to the end of the slice (slice ends are exclusive), and add an "or" condition to the np.where call for contains_666_or_999_loc using the | operator.

text_col = words['text']

their_loc = np.where(text_col.str.contains(r'their', na=True))[0][0]

contains_666_or_999_loc = np.where(text_col.str.contains('666', na=True) |
                                   text_col.str.contains('999', na=True))[0][0]

subtrings_output = ''.join(text_col.loc[words.index[their_loc:contains_666_or_999_loc + 1]])

print(subtrings_output)

Output:

theirstoma22fe156sligh334pain666
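
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample table from the question is retyped by hand into a DataFrame named `words` with a default integer index. It also swaps the two `contains` calls joined by `|` for a single regular expression `666|999`, and uses `na=False` so that missing values are not counted as matches; both are substitutions on my part, not what the code above does.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question's table (assumption: default RangeIndex).
words = pd.DataFrame({
    "page no": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3],
    "text": ["they", "ate", "apples", "and", "then", "their", "stoma22",
             "fe156", "sligh334", "pain666", "given", "the", "fruit"],
    "font": [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
})

text_col = words["text"]

# Position of the first row whose text contains 'their'.
start = np.where(text_col.str.contains("their", na=False))[0][0]

# Position of the first row containing '666' or '999' (regex alternation).
end = np.where(text_col.str.contains("666|999", na=False))[0][0]

# iloc slices by position and excludes the stop, hence the + 1.
subtrings_output = "".join(text_col.iloc[start:end + 1])
print(subtrings_output)
```

Slicing with `iloc` works on positions directly, so it does not depend on what the index labels happen to be.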

Answer 2

IIUC, use pandas.Series.idxmax with "".join().

Series.idxmax(axis=0, skipna=True, *args, **kwargs)
Return the row label of the maximum value. If multiple values equal the maximum, the first row label with that value is returned.

So, assuming (Words) is your dataframe, try this:

their_loc = Words["text"].str.contains("their").idxmax()

# "666|999" is a regex alternation, so this matches either substring
_666_999_loc = Words["text"].str.contains("666|999").idxmax()

subtrings_output = "".join(Words["text"].loc[Words.index[their_loc:_666_999_loc+1]])

Output:

print(subtrings_output)
#theirstoma22fe156sligh334pain666

#their stoma22 fe156 sligh334 pain666 # <- with " ".join()
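
One caveat with idxmax on a boolean mask: if nothing matches, every value is False and idxmax still returns the first row label, so a missing pattern silently points at the top of the table. The sketch below is one way to guard against that. The helper name `join_between` and its parameters are hypothetical, made up for this illustration; it also searches for the end pattern only from the start row onward, which is an assumption about what is wanted.

```python
import pandas as pd

def join_between(texts, start_pat, end_pat, sep=""):
    """Join values from the first match of start_pat through the first
    match of end_pat at or after it (inclusive).
    Returns None if either pattern is not found."""
    texts = texts.reset_index(drop=True)  # work with positions 0..n-1
    is_start = texts.str.contains(start_pat, na=False)
    if not is_start.any():
        return None
    # argmax on a boolean array gives the position of the first True.
    start = int(is_start.to_numpy().argmax())
    is_end = texts.iloc[start:].str.contains(end_pat, na=False)
    if not is_end.any():
        return None
    end = start + int(is_end.to_numpy().argmax())
    # dropna() skips missing values so that join only sees strings.
    return sep.join(texts.iloc[start:end + 1].dropna())

# 'text' column retyped from the question's table.
texts = pd.Series(["they", "ate", "apples", "and", "then", "their", "stoma22",
                   "fe156", "sligh334", "pain666", "given", "the", "fruit"])

result = join_between(texts, "their", "666|999")
print(result)
```

Passing `sep=" "` gives the space-separated variant, and a pattern that never occurs yields None instead of a misleading slice.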
