
How do you make lists the same length by adding in N/A into all of them?

I have a dictionary:

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

I want to convert it to a DataFrame with pd.DataFrame(gene_table_comparison).

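For that, every list must be the same length, and I want N/A at the end of each shorter list. How do I do this? And what if the lengths are different or random?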

This isn't the most efficient code, but I think it shows the basics of how to solve the problem.

First, find the maximum list length:

max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)
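The loop above computes the same value as the built-in max over the list lengths. A minimal sketch, using a trimmed copy of the dict for brevity; default=0 mirrors the loop's behaviour on an empty dict, where a bare max(...) would raise ValueError instead:

```python
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "GeneID_2": [1, 2, 3],
}

# Longest list length; default=0 plays the role of the loop's initial max_length = 0.
max_length = max((len(col) for col in gene_table_comparison.values()), default=0)
print(max_length)  # 5

# With an empty dict this returns 0 rather than raising ValueError.
print(max((len(col) for col in {}.values()), default=0))  # 0
```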

Next, append NaNs to each list until they are all the same length:

import numpy as np

for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)

Putting it all together:

import numpy as np

max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)

for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)
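Once every list has the same length, pd.DataFrame accepts the dict directly. One side effect worth knowing, shown here as a sketch on a trimmed copy of the dict: np.nan is a float, so an integer column such as GeneID_2 is stored as float64 after padding.

```python
import numpy as np
import pandas as pd

gene_table_comparison = {
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
}

# Pad every list in place with NaN up to the longest length (same idea as the loop above).
max_length = max(len(col) for col in gene_table_comparison.values())
for col in gene_table_comparison.values():
    col.extend([np.nan] * (max_length - len(col)))

df = pd.DataFrame(gene_table_comparison)
# GeneID_2 becomes float64 because NaN is a float value.
print(df.dtypes)
# Two padded cells in each of the shorter columns.
print(df.isna().sum())
```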

Here is an option that uses a custom function to append a configurable placeholder:

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1": ["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

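def fill_with_placeholder(d, placeholder):
    # Length of the longest list in the dict
    max_length = max(map(len, d.values()))
    # Pad a list with the placeholder up to max_length
    fill_list = lambda l: l + [placeholder for _ in range(max_length - len(l))]
    # Build a new dict; lists that are already full length are kept as-is
    return {
        key: fill_list(sublist) if len(sublist) < max_length else sublist for key, sublist in d.items()
    }

result = fill_with_placeholder(gene_table_comparison, "NA")
print(result)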

Result:

{
 'index': [1, 2, 3, 4, 5],
 'GeneID_1': ['a', 'b', 'c', 'd', 'e'],
 'Start_1': [100, 200, 300, 400, 500],
 'Function_1': ['Bruh', '', 'Dude', '', 'Seriously'],
 'GeneID_2': [1, 2, 3, 'NA', 'NA'],
 'Start_2': ['x', 'y', 'z', 'NA', 'NA'],
 'Function_2': ['Geez', '', 'Deez', 'NA', 'NA']
}

Here is a one-liner that works:

import pandas as pd

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in gene_table_comparison.items() })
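This works because pandas aligns the Series on their index: each Series gets a default 0-based index, the DataFrame uses the union of those labels, and the shorter Series have no labels 3 and 4, so those cells become NaN. If a literal "N/A" string is wanted, as in the question, fillna can replace the padding afterwards. A sketch on a trimmed dict; note that fillna only touches missing values, so the existing empty strings in Function_2 are kept, and GeneID_2 is converted to float during alignment, so its surviving values show as 1.0, 2.0, 3.0.

```python
import pandas as pd

gene_table_comparison = {
    "GeneID_1": ["a", "b", "c", "d", "e"],
    "GeneID_2": [1, 2, 3],
    "Function_2": ["Geez", "", "Deez"],
}

# Index labels missing from a shorter Series become NaN in the DataFrame.
dict_df = pd.DataFrame({key: pd.Series(value) for key, value in gene_table_comparison.items()})

# Replace only the NaN padding with the string "N/A"; "" values are not missing, so they stay.
filled = dict_df.fillna("N/A")
print(filled)
```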

I'm not sure what N/A means here, so I'll assume None.

First, compute the length of the longest list (value):

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

max_ = max(map(len, gene_table_comparison.values()))

Then iterate over the values and pad each one as needed:

for v in gene_table_comparison.values():
    if (a := max_ - len(v)) > 0:
        v.extend([None]*a)

print(gene_table_comparison)

Output:

{'index': [1, 2, 3, 4, 5], 'GeneID_1': ['a', 'b', 'c', 'd', 'e'], 'Start_1': [100, 200, 300, 400, 500], 'Function_1': ['Bruh', '', 'Dude', '', 'Seriously'], 'GeneID_2': [1, 2, 3, None, None], 'Start_2': ['x', 'y', 'z', None, None], 'Function_2': ['Geez', '', 'Deez', None, None]}
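The if guard is optional: multiplying a list by zero (or a negative number) yields an empty list, so v + [None] * (max_ - len(v)) is safe for every column. A sketch of a non-mutating variant on a trimmed dict, which builds a new dict instead of extending the original lists and also avoids the walrus operator (Python 3.8+):

```python
gene_table_comparison = {
    "index": [1, 2, 3, 4, 5],
    "GeneID_2": [1, 2, 3],
    "Start_2": ["x", "y", "z"],
}

max_ = max(map(len, gene_table_comparison.values()))

# [None] * 0 == [], so full-length lists pass through unchanged; the input lists are not modified.
padded = {k: v + [None] * (max_ - len(v)) for k, v in gene_table_comparison.items()}
print(padded)
# {'index': [1, 2, 3, 4, 5], 'GeneID_2': [1, 2, 3, None, None], 'Start_2': ['x', 'y', 'z', None, None]}
print(gene_table_comparison["GeneID_2"])  # [1, 2, 3]
```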
