Finding the nth character in a list of text
import re
text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
I have a text that is repetitive in nature. Each repetition starts with '~SR' and ends with '~end'. I want to find the index of the 1st, 2nd, and 3rd '*' (asterisk) in each repetition.
def start_point(p1):
    # Collect the start index of every match of p1 in text.
    segment_start_array = []
    for match in re.finditer(p1, text):
        index = match.start()
        segment_start_array.append(index)
    return segment_start_array

def point_a(p1):
    a = start_point(p1)
    return a

def point_b(p2):
    b = start_point(p2)
    return b

def get_var_section(p1, p2):
    # Slice text from each p1 match up to the corresponding p2 match.
    var_list = []
    for each in range(len(start_point(p1))):
        segment = text[point_a(p1)[each]:point_b(p2)[each]]
        var_list.append(segment)
    return var_list

print(get_var_section('~SR', '~end'))
==> Result: ['~SR1*abcde*1234*', '~SR*abcdef*123*', '~SR11*abc*12345*']
What I did first was put the repetitions into a list, which gave three elements. I thought this would make it easier to find the position of each asterisk, but when I tried to find the index of the 1st and 2nd asterisk, the results were the same.
def test(p1, p2, occurrence):
    var_list4 = []
    for i in get_var_section(p1, p2):
        x = i.find('*', occurrence)
        var_list4.append(x)
    return var_list4

print(test('~SR', '~end', 1))
print(test('~SR', '~end', 2))
==> Result: [4, 3, 5]
==> Result: [4, 3, 5]
I don't understand why the result didn't change after I switched to looking for the position of the 2nd occurrence.
As you mentioned that each repetition starts with ~SR and ends with ~end, I split the string on ~end and then looped through the resulting list, using item as the loop variable, to collect the indexes of every '*' in each item.
import numpy as np
text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
text_list = text.split('~end')
index = []
for item in text_list:
    #print(item)
    if len(item) > 0:  # skip the empty string left after the final '~end'
        ind = [i for i, val in enumerate(item) if val == '*']
        #print(ind)
        index.append(ind)
# Transpose the list of lists (requires the same number of '*' in every item)
index_new = np.array(index).T.tolist()
Result
print(index)
[[4, 10, 15], [3, 10, 14], [5, 9, 15]]
print(index_new)
[[4, 3, 5], [10, 10, 9], [15, 14, 15]]
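On why the original test() printed the same list twice: the second argument of str.find is the index at which the search starts, not an occurrence count. The first '*' in every segment sits at index 3 or later, so starting the search at index 1 or at index 2 finds the same asterisk. Below is a minimal sketch of an nth-occurrence lookup that avoids NumPy. The helper name find_nth is hypothetical (it appears in neither the question nor the answer above), and the sketch assumes occurrences are counted from 1.

```python
text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"


def find_nth(haystack, needle, n):
    """Return the index of the n-th (1-based) occurrence of needle, or -1."""
    pos = -1
    for _ in range(n):
        # Resume the search one character past the previous hit.
        pos = haystack.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


# Drop the empty string that split() leaves after the final '~end'.
segments = [s for s in text.split('~end') if s]

first = [find_nth(s, '*', 1) for s in segments]
second = [find_nth(s, '*', 2) for s in segments]
third = [find_nth(s, '*', 3) for s in segments]

print(first)   # [4, 3, 5]
print(second)  # [10, 10, 9]
print(third)   # [15, 14, 15]
```

Unlike the transpose approach, this returns -1 for a segment that has fewer than n asterisks, so it should also cope with segments of uneven length.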