How do you make lists the same length by adding N/A to all of them?

Question

I have a dictionary:

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

I want to convert it to a DataFrame using pd.DataFrame(gene_table_comparison).

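That requires every list to be the same length, and I want the N/A values at the end of each list, but how do I do that? What if the lengths are different/random?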

Answer 1

This isn't the most efficient code, but I think it shows the basics of how to solve the problem well.

First, find the maximum list length:

max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)

Next, append NaNs to the lists until they are all the same length:

import numpy as np

for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)

Putting it all together:

import numpy as np

max_length = 0
for col in gene_table_comparison.values():
    if len(col) > max_length:
        max_length = len(col)

for col in gene_table_comparison.values():
    for _ in range(max_length - len(col)):
        col.append(np.nan)
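
One thing to watch when padding with NaN is that NaN never compares equal to itself, so checking the result with == will not find the padding. The sketch below reuses the two loops above on a made-up table; pad_in_place and the sample data are hypothetical names, and math.nan stands in for np.nan only so that the example needs no third-party package.

```python
import math

def pad_in_place(table, fill):
    """Pad every list in `table` to the longest length, modifying the lists in place."""
    max_length = 0
    for col in table.values():
        if len(col) > max_length:
            max_length = len(col)
    for col in table.values():
        for _ in range(max_length - len(col)):
            col.append(fill)

# Hypothetical sample data, not the table from the question
table = {"long": [1, 2, 3, 4], "short": ["x", "y"]}
pad_in_place(table, math.nan)

# Every list now has the same length
print({key: len(col) for key, col in table.items()})

# NaN != NaN, so detect the padding with math.isnan rather than ==
print(math.isnan(table["short"][-1]))
```

Because the lists are modified in place, the original dictionary itself ends up padded; copy the lists first if the unpadded data is still needed.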

Answer 2

Here is one option, using a custom function that appends a placeholder of your choice:

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1": ["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

def fill_with_placeholder(d, placeholder):
    max_length = max(map(len, d.values()))
    fill_list = lambda l: l + [placeholder for _ in range(max_length - len(l))]
    return {
        key: fill_list(sublist) if len(sublist) < max_length else sublist for key, sublist in d.items()
    }

print(fill_with_placeholder(gene_table_comparison, "NA"))

Result:

{
'index': [1, 2, 3, 4, 5], 'GeneID_1': ['a', 'b', 'c', 'd', 'e'],
'Start_1': [100, 200, 300, 400, 500], 'Function_1': ['Bruh', '',
'Dude', '', 'Seriously'], 'GeneID_2': [1, 2, 3, 'NA', 'NA'],
'Start_2': ['x', 'y', 'z', 'NA', 'NA'], 'Function_2': ['Geez', '',
'Deez', 'NA', 'NA']
}
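
The same padding can probably also be done with itertools.zip_longest from the standard library. This is a sketch of that alternative, not part of the answer above; the name pad_with_zip_longest and the sample dictionary are made up for illustration.

```python
from itertools import zip_longest

def pad_with_zip_longest(table, fill="NA"):
    """Return a new dict whose lists are all padded to the longest length."""
    if not any(table.values()):
        # No data at all: zip(*rows) would yield nothing, so return empty lists
        return {key: [] for key in table}
    # zip_longest walks the columns row by row, substituting `fill`
    # wherever a shorter column has run out
    rows = zip_longest(*table.values(), fillvalue=fill)
    # zip(*rows) transposes the rows back into columns
    columns = zip(*rows)
    return {key: list(col) for key, col in zip(table, columns)}

# Hypothetical sample data
sample = {"long": [1, 2, 3], "short": ["x"]}
padded = pad_with_zip_longest(sample)
print(padded)
```

Like the function above, this builds new lists and leaves the input dictionary untouched.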

Answer 3

Here is a working one-liner:

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

import pandas as pd

dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in gene_table_comparison.items() })
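
The one-liner works because each list becomes a Series with its own 0-based index, and the DataFrame constructor aligns those Series on the union of their indexes, leaving NaN where a shorter Series has no entry. A small sketch on made-up data (the table below is hypothetical) shows one side effect: a padded integer column is upcast to float so that it can hold NaN.

```python
import pandas as pd

# Hypothetical two-column table with unequal lengths
table = {"long": [10, 20, 30, 40], "short": [1, 2]}

# Each list becomes a Series indexed 0..len-1; the DataFrame constructor
# aligns them on the union of those indexes and fills the gaps with NaN
df = pd.DataFrame({key: pd.Series(value) for key, value in table.items()})

print(df)
# "short" is upcast from int to float so it can hold NaN; "long" stays int
print(df.dtypes)
```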

Answer 4

I'm not sure what you mean by N/A, so I'm assuming None.

First, compute the length of the longest list (value):

gene_table_comparison = {
    "index":[1,2,3,4,5],
    "GeneID_1":["a","b","c","d","e"],
    "Start_1":[100,200,300,400,500],
    "Function_1":["Bruh","","Dude","","Seriously"],
    "GeneID_2":[1,2,3],
    "Start_2":["x","y","z"],
    "Function_2":["Geez","","Deez"]
}

max_ = max(map(len, gene_table_comparison.values()))

Then iterate over the values and pad them as needed:

for v in gene_table_comparison.values():
    if (a := max_ - len(v)) > 0:
        v.extend([None]*a)

print(gene_table_comparison)

Output:

{'index': [1, 2, 3, 4, 5], 'GeneID_1': ['a', 'b', 'c', 'd', 'e'], 'Start_1': [100, 200, 300, 400, 500], 'Function_1': ['Bruh', '', 'Dude', '', 'Seriously'], 'GeneID_2': [1, 2, 3, None, None], 'Start_2': ['x', 'y', 'z', None, None], 'Function_2': ['Geez', '', 'Deez', None, None]}
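
For the "different/random lengths" part of the question, the same approach should work unchanged, since the target length is computed from the data. The sketch below checks this on randomly sized lists; the column names and the fixed seed are assumptions made only for the demonstration.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

# Hypothetical columns with arbitrary lengths between 0 and 10
table = {f"col_{i}": list(range(random.randint(0, 10))) for i in range(5)}

max_ = max(map(len, table.values()))
for v in table.values():
    # [None] * 0 is [], so lists that are already full length are left untouched
    v.extend([None] * (max_ - len(v)))

# After padding, every list has the same length
lengths = {len(v) for v in table.values()}
print(lengths)
```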

4 answers

Answer 1
0 2022-09-04 17:11:40

Answer 2
0 2022-09-04 17:13:27

Answer 3
0 2022-09-04 17:13:38

Answer 4
0 2022-09-04 17:15:21