Finding the nth character in a list of text

    import re

    text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"

I have a text that is repetitive in nature. Each repetition starts with '~SR' and ends with '~end'. I want to find the index of the 1st, 2nd, and 3rd '*' (asterisk) in each repetition.

    def start_point(p1):
        segment_start_array = []
        for match in re.finditer(p1, text):
            index = match.start()
            segment_start_array.append(index)
        return segment_start_array


    def point_a(p1):
        a = start_point(p1)
        return a


    def point_b(p2):
        b = start_point(p2)
        return b


    def get_var_section(p1, p2):
        var_list = []
        for each in range(len(start_point(p1))):
            # avoid shadowing the built-in name `list`
            section = text[point_a(p1)[each]:point_b(p2)[each]]
            var_list.append(section)
        return var_list


    print(get_var_section('~SR', '~end'))

==> Result: ['~SR1*abcde*1234*', '~SR*abcdef*123*', '~SR11*abc*12345*']

What I did first was put the repetitions into a list, which gave three elements. I thought this would make it easier to find the position of each asterisk, but when I tried to find the index of the 1st and 2nd asterisk, the results were the same.

    def test(p1, p2, occurrence):
        var_list4 = []
        for i in get_var_section(p1, p2):
            x = i.find('*', occurrence)
            var_list4.append(x)
        return var_list4


    print(test('~SR', '~end', 1))
    print(test('~SR', '~end', 2))

==> Result: [4, 3, 5]
==> Result: [4, 3, 5]

I don't understand why the result didn't change when I switched to looking for the position of the 2nd occurrence.

As you mentioned, each repetition starts with ~SR and ends with ~end, so I split the string on ~end and then loop over the resulting list, collecting the indexes of every asterisk in each item.

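    import numpy as np

    text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
    # One piece per repetition; the final '~end' leaves an empty string at the end.
    text_list = text.split('~end')
    index = []
    for item in text_list:
        if len(item) > 0:  # skip the empty trailing piece
            # positions of every '*' within this repetition
            ind = [i for i, val in enumerate(item) if val == '*']
            index.append(ind)
    # Transpose the list of lists so that row k holds the (k+1)-th asterisk
    # of every repetition (assumes each repetition has the same number of asterisks).
    index_new = np.array(index).T.tolist()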

Result

    print(index)

[[4, 10, 15], [3, 10, 14], [5, 9, 15]]

    print(index_new)

[[4, 3, 5], [10, 10, 9], [15, 14, 15]]
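
On why both calls in the question return [4, 3, 5]: the second argument of `str.find` is the index at which the search starts, not an occurrence count. `i.find('*', 1)` and `i.find('*', 2)` both begin before the first asterisk, so both return its position. Below is a minimal sketch of an nth-occurrence lookup that needs no numpy; the names `nth_index` and `segments` are hypothetical and not part of the original post.

```python
def nth_index(segment, char, n):
    """Return the index of the n-th (1-based) occurrence of char, or -1 if absent."""
    pos = -1
    for _ in range(n):
        # Resume the search just past the previous hit.
        pos = segment.find(char, pos + 1)
        if pos == -1:
            return -1
    return pos


text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
# Drop the empty string that split() leaves after the final '~end'.
segments = [s for s in text.split('~end') if s]

first = [nth_index(s, '*', 1) for s in segments]
second = [nth_index(s, '*', 2) for s in segments]
third = [nth_index(s, '*', 3) for s in segments]

print(first)   # [4, 3, 5]
print(second)  # [10, 10, 9]
print(third)   # [15, 14, 15]
```

Unlike the transpose approach, this sketch also copes with repetitions that contain different numbers of asterisks: a missing occurrence simply yields -1.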
