Convert list elements into list of tuples
header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
I want to convert the above list elements into a list of tuples, like:
sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
               ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
I am thinking a lambda or a list comprehension could be used to do this in a short and readable way, something like:
sample_list = [lambda (x,y): x = a if '_PI' in a for a in header ..]
or,
[(x, y) if '_PI' and '_PG_al' in a for a in header]
Any suggestions?
You can filter the list and remove all elements that do not match the desired grouping pattern:
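import re

header = ['chr', 'pos', 'ms01e', 'ms01e_PG_al', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
# Keep only names that look like sample columns; 'chr' and 'pos' match neither alternative.
# A raw string keeps \d from being treated as a string escape.
new_headers = list(filter(lambda x: re.findall(r'^[a-zA-Z]+_[a-zA-Z]+|[a-zA-Z]+\d+[a-zA-Z]+', x), header))
# Pair each name with its neighbour (assumes an even number of remaining names).
final_data = [(new_headers[i], new_headers[i+1]) for i in range(0, len(new_headers), 2)]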
Output:
[('ms01e', 'ms01e_PG_al'), ('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
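The pairing step above works because the filtered list alternates between the two columns of each sample. As a minimal sketch of the same filter-then-pair idea without a regular expression, assuming the wanted columns are exactly those ending in `_PI` or `_PG_al` and using the header from the question (the helper name `pair_columns` is hypothetical):

```python
def pair_columns(header, suffixes=("_PI", "_PG_al")):
    """Keep the columns ending in one of `suffixes` and pair neighbours."""
    # str.endswith accepts a tuple of suffixes.
    kept = [name for name in header if name.endswith(suffixes)]
    # Zip the even-indexed names with the odd-indexed names.
    return list(zip(kept[::2], kept[1::2]))


header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
pairs = pair_columns(header)
print(pairs)
```

Like the regex version, this only gives sensible pairs when the two columns of each sample are adjacent in the input.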
Try this:
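header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

def l_tuple(names):
    # Keep only the PI / PG columns (avoid shadowing the built-in name `list`).
    names = filter(lambda x: "PI" in x or "PG" in x, names)
    # Sort by the first four characters so the columns of a sample sit together;
    # sorted() is stable, so PI stays ahead of PG within each group.
    l = sorted(names, key=lambda x: x[:4])
    return [(l[i], l[i + 1]) for i in range(0, len(l), 2)]

print(l_tuple(header))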
Output:
[('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
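One caveat with sorting on the first four characters: sample names that share that prefix (say `ms01e` and `ms01f`) end up in the same group and can be paired with each other. A possible variation, sketched here under the assumption that the sample name is everything before the first underscore, sorts on that full name instead (the function name `pair_by_prefix` and the example header are made up for illustration):

```python
def pair_by_prefix(header):
    # Keep only the PI / PG_al columns.
    cols = [c for c in header if c.endswith(("_PI", "_PG_al"))]
    # Stable sort on the full sample name (text before the first underscore),
    # so each sample's columns become adjacent and keep their input order.
    cols.sort(key=lambda c: c.split("_")[0])
    return [(cols[i], cols[i + 1]) for i in range(0, len(cols) - 1, 2)]


header = ['chr', 'pos', 'ms01e_PI', 'ms01f_PI', 'ms01e_PG_al', 'ms01f_PG_al']
print(pair_by_prefix(header))
# [('ms01e_PI', 'ms01e_PG_al'), ('ms01f_PI', 'ms01f_PG_al')]
```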
This is one way:
# first, filter and sort
header = sorted(i for i in header if any(k in i for k in ('_PI', '_PG_al')))
# second, zip and order by suffix
header = [(x, y) if '_PI' in x else (y, x) for x, y in zip(header[::2], header[1::2])]
# [('ms01e_PI', 'ms01e_PG_al'),
# ('ms02g_PI', 'ms02g_PG_al'),
# ('ms03g_PI', 'ms03g_PG_al'),
# ('ms04h_PI', 'ms04h_PG_al')]
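Because the list is sorted before it is zipped, this approach should not depend on the order of the input columns, as long as every sample has both a `_PI` and a `_PG_al` column. A small check of that claim, assuming a shuffled copy of the question's header (the wrapper name `pair_sorted` is hypothetical):

```python
import random


def pair_sorted(header):
    # Same two steps as above: filter and sort, then zip and order by suffix.
    cols = sorted(c for c in header if any(k in c for k in ('_PI', '_PG_al')))
    return [(x, y) if '_PI' in x else (y, x) for x, y in zip(cols[::2], cols[1::2])]


header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
shuffled = header[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

print(pair_sorted(shuffled) == pair_sorted(header))
```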
I had a concern that the input header may not have the samples (PI and PG values) ordered/organized. I think it would be better to mine the sample names first and then create the list of tuples in the following manner.
header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
# Keep the names of all the samples: drop chr and pos (no underscore)
# and strip the suffix after the first underscore (_).
samples = [x.split('_')[0] for x in header if '_' in x]
# Now reduce the list to unique names (basically a set), preserving
# the original order in case it is of interest.
seen = set()
sample_set = [x for x in samples if not (x in seen or seen.add(x))]
# Now create the list of tuples.
sample_list = [(x + '_PI', x + '_PG_al') for x in sample_set]
print('sample list:', sample_list)
sample list: [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
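Since the tuples above are rebuilt from the sample names, a pair is produced even when one of the two columns is missing from the header. A possible variation, sketched under the assumption that incomplete samples should simply be skipped (the function name `pair_by_sample` and the example header are made up for illustration):

```python
def pair_by_sample(header, first="_PI", second="_PG_al"):
    columns = set(header)  # for fast membership checks
    seen = set()
    pairs = []
    for name in header:
        if "_" not in name:  # skips 'chr', 'pos'
            continue
        sample = name.split("_")[0]
        if sample in seen:  # each sample is handled once, in first-seen order
            continue
        seen.add(sample)
        pair = (sample + first, sample + second)
        # Emit the pair only if both columns really exist in the header.
        if pair[0] in columns and pair[1] in columns:
            pairs.append(pair)
    return pairs


# Unordered header in which ms03g has no PG_al column.
header = ['chr', 'pos', 'ms02g_PG_al', 'ms01e_PI', 'ms02g_PI', 'ms01e_PG_al', 'ms03g_PI']
print(pair_by_sample(header))
# [('ms02g_PI', 'ms02g_PG_al'), ('ms01e_PI', 'ms01e_PG_al')]
```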