I have the following nested list:
original = [['B_S', 'O', 'O', 'O'],
            ['O', 'O', 'O', 'O', 'O', 'O', 'B_S', 'B_S', 'O', 'O'],
            ['O', 'B_S', 'O', 'O', 'B_S', 'B_S', 'B_S', 'O']]
There are only three kinds of elements in the original list, i.e., B_S, I_S, and O. I want to change the elements based on a specific condition: if an element starts with the B prefix (i.e., B_S), the following element should be changed to start with the I prefix if it also has the B prefix. The desired output in this case is:
desired = [['B_S', 'O', 'O', 'O'],
           ['O', 'O', 'O', 'O', 'O', 'O', 'B_S', 'I_S', 'O', 'O'],
           ['O', 'B_S', 'O', 'O', 'B_S', 'I_S', 'B_S', 'O']]
It worked with this solution:
for ls in original:
    # Stop at len(ls) - 1 so that ls[i+1] never runs past the end of the list.
    for i in range(len(ls) - 1):
        if ls[i] == 'B_S' and ls[i+1] == 'B_S':
            ls[i+1] = 'I_S'
But it takes a long time on a large dataset. Is there any way to improve the performance of this code?
You might want to look into multiprocessing:
from multiprocessing import Pool
import os

original = [['B_S', 'O', 'O', 'O'],
            ['O', 'O', 'O', 'O', 'O', 'O', 'B_S', 'B_S', 'O', 'O'],
            ['O', 'B_S', 'O', 'O', 'B_S', 'B_S', 'B_S', 'O']]

def change(sub_list):
    len_ = len(sub_list)
    cnt = 0
    # Stop one short of the end so sub_list[cnt+1] is always a valid index.
    while cnt < len_ - 1:
        if sub_list[cnt] == 'B_S' and sub_list[cnt+1] == 'B_S':
            sub_list[cnt+1] = 'I_S'
        cnt += 1
    return sub_list

if __name__ == '__main__':
    # Each sublist is handled by a worker process; map() keeps the input order.
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(change, original)
    print(results)
This will just split your original list into its sublists and process each one individually before combining the results back together.
This can surely be improved further, as other comments have already suggested.
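As one possible refinement, here is a minimal single-process sketch that avoids index lookups by walking each sublist once and remembering whether the previous output tag was B_S. The function name relabel is hypothetical, and the sketch assumes the intended rule is the one implied by the desired output above (in a run of B_S tags, every second one becomes I_S). Whether it beats the multiprocessing version depends on the data: for many short sublists, the cost of starting processes and pickling data may outweigh the parallel gain, so it is worth timing both on a real dataset.

```python
def relabel(tags):
    """Return a new list in which a 'B_S' right after a kept 'B_S' becomes 'I_S'."""
    out = []
    prev_is_b = False  # True when the tag just written to out is 'B_S'
    for tag in tags:
        if tag == 'B_S' and prev_is_b:
            out.append('I_S')
            prev_is_b = False  # the tag just written is 'I_S', not 'B_S'
        else:
            out.append(tag)
            prev_is_b = (tag == 'B_S')
    return out


original = [['B_S', 'O', 'O', 'O'],
            ['O', 'O', 'O', 'O', 'O', 'O', 'B_S', 'B_S', 'O', 'O'],
            ['O', 'B_S', 'O', 'O', 'B_S', 'B_S', 'B_S', 'O']]

# Builds new lists instead of modifying the input in place.
results = [relabel(sub) for sub in original]
print(results)
```

Because relabel returns new lists, original is left untouched, and a trailing B_S at the end of a sublist cannot cause an IndexError.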