junctions = [2,9,15,20]
seq_1 = 'sauron'
seq_2 = 'corrupted'
seq_3 = 'numenor'
combined = 'sauroncorruptednumenor' #seq_1 + seq_2 + seq_3
count_1 = 1
count_2 = 1
count_3 = 2
I have 3 strings (seq_1-3) that I concatenate into 1 long string (combined), and a list of indices (junctions). I also have 3 counters, one per string (count_1-3), each starting at zero (the values shown above are the result I expect).
What I am trying to do is find the position of each junction [2,9,15,20] in the combined sequence and work out which original string it came from: if it is from seq_1 --> count_1 += 1, if it is from seq_2 --> count_2 += 1, if it is from seq_3 --> count_3 += 1
Example:
junctions = [2,9,15,20]
count_1 = 0
count_2 = 0
count_3 = 0
combined = 'sauroncorruptednumenor'
seq_1 = 'sauron' #index 2 would be on 'u' in combined but originally from seq_1 so count_1 = count_1 + 1
seq_2 = 'corrupted' #index 9 would be on 'r' in combined so count_2 += 1
seq_3 = 'numenor' #index 15 would be 'n' in combined so count_3 += 1, and 20 would be 'o' so count_3 += 1
Let me know if I need to clarify anything.
You can use collections.Counter and bisect.bisect_left here:
>>> from collections import Counter
>>> import bisect
>>> junctions = [2,9,15,20]
>>> seq_1 = 'sauron'
>>> seq_2 = 'corrupted'
>>> seq_3 = 'numenor'
>>> lis = [seq_1, seq_2, seq_3]
Create a list containing the index in the combined string at which each seq_ ends:
>>> start = -1
>>> break_points = []
>>> for item in lis:
...     start += len(item)
...     break_points.append(start)
...
>>> break_points
[5, 14, 21]
Now we can simply loop over junctions and find each junction's position in the break_points list using the bisect.bisect_left function.
>>> Counter(bisect.bisect_left(break_points, jun)+1 for jun in junctions)
Counter({3: 2, 1: 1, 2: 1})
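The one-liner above assumes every junction lies inside the combined string: a negative index would be counted as seq_1, and an index past the end would get a label one beyond the last sequence. A minimal sketch of the same bisect_left idea with a range check added; the function name count_junctions is hypothetical, not part of the original answer:

```python
import bisect
from collections import Counter

def count_junctions(seqs, junctions):
    """Count junction indices per sequence, labelling sequences 1, 2, 3, ..."""
    # Index of the last character of each sequence in the concatenated string.
    break_points = []
    last = -1
    for seq in seqs:
        last += len(seq)
        break_points.append(last)

    counts = Counter()
    for junc in junctions:
        # Reject indices outside the combined string instead of miscounting them.
        if not 0 <= junc <= last:
            raise IndexError("junction %d is outside the combined string" % junc)
        counts[bisect.bisect_left(break_points, junc) + 1] += 1
    return counts

result = count_junctions(['sauron', 'corrupted', 'numenor'], [2, 9, 15, 20])
print(result)
```

Each lookup is a binary search over the break points, so this should scale well when there are many sequences.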
Better output using collections.defaultdict:
>>> from collections import defaultdict
>>> dic = defaultdict(int)
>>> for junc in junctions:
...     ind = bisect.bisect_left(break_points, junc) + 1
...     dic['count_' + str(ind)] += 1
...
>>> dic
defaultdict(<type 'int'>,
{'count_3': 2,
'count_2': 1,
'count_1': 1})
#accessing these counts
>>> dic['count_3']
2
You could try something basic like:
L_1 = len(seq_1)
L_2 = len(seq_2)
L_3 = len(seq_3)
junctions = [2, 9, 15, 20]
c_1, c_2, c_3 = (0, 0, 0)
for j in junctions:
    if j < L_1:
        c_1 += 1
    elif j < L_1 + L_2:
        c_2 += 1
    elif j < L_1 + L_2 + L_3:
        c_3 += 1
    else:
        # junction lies past the end of the combined string
        raise IndexError("junction %d is out of range" % j)
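The same chain of comparisons can be written as a loop over a running offset, so it does not need one branch per sequence. This is only a sketch of that generalisation; the function name count_by_sequence and the explicit rejection of negative indices are assumptions added for illustration:

```python
def count_by_sequence(seqs, junctions):
    """Return a list of counts, one per sequence, in the same order as seqs."""
    counts = [0] * len(seqs)
    for j in junctions:
        if j < 0:
            raise IndexError("junction %d is negative" % j)
        offset = 0  # length of the combined string up to and including seq k
        for k, seq in enumerate(seqs):
            offset += len(seq)
            if j < offset:
                counts[k] += 1
                break
        else:
            # No sequence contained j: it lies past the end of the combined string.
            raise IndexError("junction %d is out of range" % j)
    return counts

c_1, c_2, c_3 = count_by_sequence(['sauron', 'corrupted', 'numenor'], [2, 9, 15, 20])
print(c_1, c_2, c_3)
```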
You could use collections.Counter, along with repeat and chain from itertools, e.g.:
from itertools import chain, repeat
from operator import itemgetter
from collections import Counter
junctions = [2,9,15,20]
seq_1 = 'sauron'
seq_2 = 'corrupted'
seq_3 = 'numenor'
indices = list(chain.from_iterable(repeat(i, len(j)) for i, j in enumerate([seq_1, seq_2, seq_3], start=1)))
print(Counter(itemgetter(*junctions)(indices)))
# Counter({3: 2, 1: 1, 2: 1})
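One caveat with the itemgetter form: when junctions holds a single index, itemgetter returns a bare value rather than a tuple, so Counter receives an int and raises TypeError, and an empty junctions list makes the itemgetter call itself fail. A sketch of the same lookup-table idea that indexes the table directly and so handles both cases; the helper name count_labels is hypothetical:

```python
from collections import Counter

seqs = ['sauron', 'corrupted', 'numenor']

# One label per character of the combined string: 1 for seq_1, 2 for seq_2, ...
indices = [label for label, seq in enumerate(seqs, start=1) for _ in seq]

def count_labels(junctions):
    """Count how many junctions fall on each sequence label."""
    return Counter(indices[j] for j in junctions)

print(count_labels([2, 9, 15, 20]))
print(count_labels([9]))
print(count_labels([]))
```

The table costs one list entry per character of the combined string, which is fine for short sequences but may be wasteful for very long ones.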