
How can I create a Pandas DataFrame from a dict of lists with specific indexes and columns in Python 3?

Assume I have a dict of lists:

dic = { "protein1": ["func1", "func2"],
        "protein2": ["func2", "func3", "func5"],
        "protein3": ["func3", "func5"]}

a list of row labels (the index):

rows = ["protein1", "protein2", "protein3", "protein4"]

and a list of column labels:

columns = ["func1", "func2", "func3", "func4", "func5", "func6"]

I want to convert dic into a Pandas DataFrame like this:

          func1  func2  func3  func4  func5  func6
protein1      1      1      0      0      0      0
protein2      0      1      1      0      1      0
protein3      0      0      1      0      1      0
protein4      0      0      0      0      0      0
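Put another way, cell (protein, func) is 1 exactly when func appears in dic[protein], and a protein that is missing from dic (protein4) gets an all-zero row. A minimal plain-Python sketch of that rule, assuming small inputs where a nested comprehension is fast enough, can serve as a reference to check the answers below against:

```python
import pandas as pd

dic = {"protein1": ["func1", "func2"],
       "protein2": ["func2", "func3", "func5"],
       "protein3": ["func3", "func5"]}
rows = ["protein1", "protein2", "protein3", "protein4"]
columns = ["func1", "func2", "func3", "func4", "func5", "func6"]

# Cell (r, c) is 1 when function c is listed for protein r, else 0.
# dic.get(r, ()) treats a protein absent from dic (protein4) as having no functions.
df = pd.DataFrame(
    [[int(c in dic.get(r, ())) for c in columns] for r in rows],
    index=rows,
    columns=columns,
)
print(df)
```

Membership tests on short lists are cheap, but this scales with len(rows) × len(columns), so it is meant as a readable specification rather than the fastest option.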

What's the Pythonic way to code this? Thank you!

Use MultiLabelBinarizer with DataFrame.reindex:

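import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# one 0/1 column per label occurring in dic (mlb.classes_, sorted), one row per key;
# reindex then adds the missing rows/columns (protein4, func4, func6) filled with 0
df = (pd.DataFrame(mlb.fit_transform(list(dic.values())), columns=mlb.classes_, index=list(dic.keys()))
        .reindex(columns=columns, index=rows, fill_value=0))
print(df)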
          func1  func2  func3  func4  func5  func6
protein1      1      1      0      0      0      0
protein2      0      1      1      0      1      0
protein3      0      0      1      0      1      0
protein4      0      0      0      0      0      0
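The reindex call is what produces the func4, func6 and protein4 entries: MultiLabelBinarizer only creates columns for labels that actually occur in dic, and the frame only has rows for the keys of dic. A sketch of that step in isolation, using a hand-written frame as an assumed stand-in for the binarizer's output (so it runs without scikit-learn):

```python
import pandas as pd

rows = ["protein1", "protein2", "protein3", "protein4"]
columns = ["func1", "func2", "func3", "func4", "func5", "func6"]

# Hand-written stand-in for the MultiLabelBinarizer result: only the labels that
# occur in dic become columns (sorted), and only the keys of dic become rows.
partial = pd.DataFrame(
    [[1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1]],
    index=["protein1", "protein2", "protein3"],
    columns=["func1", "func2", "func3", "func5"],
)

# reindex puts rows and columns in the requested order and fills every
# label that was absent (func4, func6, protein4) with 0.
full = partial.reindex(columns=columns, index=rows, fill_value=0)
print(full)
```

Without fill_value=0 the new cells would be NaN and the columns would become floats, which is why the answer passes it explicitly.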

A pandas-only solution is also possible, but slower: use Series.str.get_dummies:

df = (pd.Series({k:'|'.join(v) for k, v in dic.items()}).str.get_dummies()
        .reindex(columns=columns, index=rows, fill_value=0))
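Here each list is first joined into a single '|'-separated string, and str.get_dummies splits on '|' (its default separator) to build one indicator column per distinct label. A sketch of the two intermediate steps before the reindex:

```python
import pandas as pd

dic = {"protein1": ["func1", "func2"],
       "protein2": ["func2", "func3", "func5"],
       "protein3": ["func3", "func5"]}

# Step 1: collapse each list into one '|'-separated string,
# e.g. "func2|func3|func5" for protein2.
joined = pd.Series({k: '|'.join(v) for k, v in dic.items()})

# Step 2: split on '|' and create one 0/1 column per distinct label (sorted).
dummies = joined.str.get_dummies()
print(dummies)
```

This relies on no label containing '|' itself; if one could, a different separator would have to be used in both the join and the sep argument of get_dummies.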

Another solution, whose output is a DataFrame of boolean values (which can be treated as integers):

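import numpy as np
import pandas as pd

dic = { "protein1": ["func1", "func2"],
        "protein2": ["func2", "func3", "func5"],
        "protein3": ["func3", "func5"]}

rows = ["protein1", "protein2", "protein3", "protein4"]
columns = ["func1", "func2", "func3", "func4", "func5", "func6"]
n = len(columns)

# build one boolean row per protein, setting True at the positions of its functions
bool_rows = {}
for key, value in dic.items():
    newRow = np.zeros(n, dtype=bool)   # all False to start (np.empty would leave arbitrary values)
    np.put(newRow, [columns.index(i) for i in value], True)
    bool_rows[key] = newRow

df = (pd.DataFrame.from_dict(bool_rows, orient='index', columns=columns)
        .reindex(index=rows, fill_value=False))   # add protein4 as an all-False row
print(df)
# Out:
#           func1  func2  func3  func4  func5  func6
# protein1   True   True  False  False  False  False
# protein2  False   True   True  False   True  False
# protein3  False  False   True  False   True  False
# protein4  False  False  False  False  False  False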
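If the 0/1 table from the question is needed rather than booleans, astype(int) converts True to 1 and False to 0; booleans also count as 1 and 0 in arithmetic such as sums. A small sketch on a hand-written boolean frame shaped like the output above (the frame is an illustrative assumption, not part of the original answer):

```python
import pandas as pd

columns = ["func1", "func2", "func3", "func4", "func5", "func6"]

# A small boolean frame shaped like the output above.
bool_df = pd.DataFrame(
    [[True, True, False, False, False, False],
     [False, True, True, False, True, False]],
    index=["protein1", "protein2"],
    columns=columns,
)

# astype(int) maps True -> 1 and False -> 0.
int_df = bool_df.astype(int)
print(int_df)

# Booleans also act as integers in arithmetic, e.g. counting functions per protein.
print(bool_df.sum(axis=1))
```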
