How to make all lists in a list of lists the same length by adding to them
How do you make lists the same length by adding in N/A into all of them?
I have a dictionary:
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "Start_1": [100, 200, 300, 400, 500],
    "Function_1": ["Bruh", "", "Dude", "", "Seriously"],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
    "Function_2": ["Geez", "", "Deez"]
}
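I want to convert it to a DataFrame using pd.DataFrame(gene_table_comparison).
That requires every list to have the same length, and I want N/A at the end of each shorter list. How do I do that? And what if the lists have different or random lengths?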
This isn't the most efficient code, but I think it shows the basics of how to solve the problem.
First, find the maximum list length:
max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)
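The same maximum can also be computed with the built-in max over the list lengths. A minimal sketch; the default=0 argument is an addition here so that an empty dictionary yields 0 instead of raising ValueError:

```python
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_2": [1, 2, 3],
}

# Length of the longest list; default=0 covers an empty dict
max_length = max((len(col) for col in gene_table_comparison.values()), default=0)
print(max_length)  # 5
```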
Next, append NaNs to each list until they all have the same length:
import numpy as np
for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)
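Because multiplying a list by zero or a negative integer gives an empty list, the inner loop can be replaced by a single extend call. A sketch that uses math.nan instead of np.nan so it needs only the standard library (an assumption for this example; np.nan behaves the same way here):

```python
import math

gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_2": [1, 2, 3],
}
max_length = max(map(len, gene_table_comparison.values()))

for col in gene_table_comparison.values():
    # [x] * n is [] when n <= 0, so full-length lists are left unchanged
    col.extend([math.nan] * (max_length - len(col)))

print(gene_table_comparison["GeneID_2"])  # [1, 2, 3, nan, nan]
```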
Putting it all together:
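import numpy as np

# Step 1: find the length of the longest list
max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)

# Step 2: pad every shorter list with NaN (modifies the lists in place)
for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)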
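The code above modifies the lists in gene_table_comparison in place. If the original data should stay untouched, the same two steps can be wrapped in a function that builds new lists. pad_to_longest is a hypothetical name used only for this sketch:

```python
import math


def pad_to_longest(table, fill=math.nan):
    """Return a new dict whose lists are all padded with `fill` to the longest length.

    Hypothetical helper: the input dict and its lists are not modified.
    """
    max_length = max((len(col) for col in table.values()), default=0)
    return {key: col + [fill] * (max_length - len(col)) for key, col in table.items()}


gene_table_comparison = {
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "GeneID_2": [1, 2, 3],
}
padded = pad_to_longest(gene_table_comparison, fill=None)
print(padded["GeneID_2"])                 # [1, 2, 3, None, None]
print(gene_table_comparison["GeneID_2"])  # [1, 2, 3] (unchanged)
```

The returned dictionary can then be passed to pd.DataFrame as in the question.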
Here is an option using a custom function that appends a configurable placeholder:
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "Start_1": [100, 200, 300, 400, 500],
    "Function_1": ["Bruh", "", "Dude", "", "Seriously"],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
    "Function_2": ["Geez", "", "Deez"]
}

def fill_with_placeholder(d, placeholder):
    max_length = max(map(len, d.values()))
    fill_list = lambda l: l + [placeholder for _ in range(max_length - len(l))]
    return {
        key: fill_list(sublist) if len(sublist) < max_length else sublist for key, sublist in d.items()
    }

# Pad with the string "NA" and show the new dictionary
print(fill_with_placeholder(gene_table_comparison, "NA"))
Result:
{
    'index': [1, 2, 3, 4, 5],
    'GeneID_1': ['a', 'b', 'c', 'd', 'e'],
    'Start_1': [100, 200, 300, 400, 500],
    'Function_1': ['Bruh', '', 'Dude', '', 'Seriously'],
    'GeneID_2': [1, 2, 3, 'NA', 'NA'],
    'Start_2': ['x', 'y', 'z', 'NA', 'NA'],
    'Function_2': ['Geez', '', 'Deez', 'NA', 'NA']
}
Here is a one-liner that works:
import pandas as pd

gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "Start_1": [100, 200, 300, 400, 500],
    "Function_1": ["Bruh", "", "Dude", "", "Seriously"],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
    "Function_2": ["Geez", "", "Deez"]
}
# Each list becomes a Series; shorter Series are filled with NaN
dict_df = pd.DataFrame({key: pd.Series(value) for key, value in gene_table_comparison.items()})
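The one-liner works because pandas aligns the Series on their index and fills positions that are missing from the shorter ones with NaN. A standard-library alternative is itertools.zip_longest, which pads shorter iterables with fillvalue while building one tuple per row. This is a sketch; handing the rows to pandas is shown only as a comment:

```python
from itertools import zip_longest

gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "GeneID_2": [1, 2, 3],
}

# One tuple per row; shorter columns are padded with fillvalue
rows = list(zip_longest(*gene_table_comparison.values(), fillvalue=None))
columns = list(gene_table_comparison)

# With pandas installed: pd.DataFrame(rows, columns=columns)
print(columns)  # ['index', 'GeneID_1', 'GeneID_2']
print(rows[3])  # (4, 'd', None)
```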
Not sure what you mean by N/A, so I'll assume None.
First, compute the length of the longest list (value):
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "Start_1": [100, 200, 300, 400, 500],
    "Function_1": ["Bruh", "", "Dude", "", "Seriously"],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
    "Function_2": ["Geez", "", "Deez"]
}
max_ = max(map(len, gene_table_comparison.values()))
Then iterate over the values and pad them as needed (the := operator requires Python 3.8 or later):
for v in gene_table_comparison.values():
    if (a := max_ - len(v)) > 0:
        v.extend([None] * a)
print(gene_table_comparison)
Output:
{'index': [1, 2, 3, 4, 5], 'GeneID_1': ['a', 'b', 'c', 'd', 'e'], 'Start_1': [100, 200, 300, 400, 500], 'Function_1': ['Bruh', '', 'Dude', '', 'Seriously'], 'GeneID_2': [1, 2, 3, None, None], 'Start_2': ['x', 'y', 'z', None, None], 'Function_2': ['Geez', '', 'Deez', None, None]}
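The answers here pad with np.nan, the string 'NA', or None. That choice matters once the data reaches pandas: None and NaN are recognised as missing by isna(), whereas a string such as "NA" or "N/A" is kept as an ordinary value when the DataFrame is built from Python lists. A stdlib-only sketch of a check that treats all three as missing; is_missing and its na_strings set are hypothetical:

```python
import math


def is_missing(value, na_strings=frozenset({"NA", "N/A"})):
    """Treat None, float NaN, and the given placeholder strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value in na_strings


column = [1, 2, 3, None, math.nan, "NA", ""]
print([is_missing(v) for v in column])
# [False, False, False, True, True, True, False]
```

Note that the empty strings already present in Function_1 are not treated as missing by this check.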