How do I create a MultiIndex (hierarchical index) DataFrame object from the unique values of another df's columns?

Question

I'm trying to create a pandas MultiIndexed dataframe that is a summary of the unique values in each column.

Is there an easier way to have this information summarized besides creating this dataframe?

Either way, it would be nice to know how to complete this code challenge. Thanks for your help. Here is the toy dataframe and the solution I attempted using a for loop with a dictionary and a value_counts dataframe. Not sure if it's possible to incorporate MultiIndex.from_frame or .from_product here somehow...

Original Dataframe:

import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005], 
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})


    A       B       C   D   E   F
0   case    2001    F   0   1   1
1   case    2002    M   0   0   1
2   case    2003    F   0   1   0
3   case    2004    F   1   0   0
4   case    2005    M   0   1   0

Desired outcome:

     unique  percent
A    case    100 
B    2001    20
     2002    20
     2003    20
     2004    20
     2005    20
C    F       60
     M       40
D    0       80
     1       20
E    0       40
     1       60
F    0       60
     1       40

My failed for loop attempt:

def unique_values(df):
    values = {}
    columns = []
    df = pd.DataFrame(values, columns=columns)
    for col in data:
        df2 = data[col].value_counts(normalize=True)*100
        values = values.update(df2.to_dict)
        columns = columns.append(col*len(df2))
    return df

unique_values(data)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-a341284fb859> in <module>
     11 
     12 
---> 13 unique_values(data)

<ipython-input-84-a341284fb859> in unique_values(df)
      5     for col in data:
      6         df2 = data[col].value_counts(normalize=True)*100
----> 7         values = values.update(df2.to_dict)
      8         columns = columns.append(col*len(df2))
      9     return df

TypeError: 'method' object is not iterable

Let me know if there's something obvious I'm missing. I'm still relatively new to EDA and pandas, so any pointers are appreciated.

Answer 1

This is a fairly straightforward application of .melt:

data.melt().reset_index().groupby(['variable', 'value']).count()/len(data)

Output:

                index
variable value  
A        case   1.0
B        2001   0.2
         2002   0.2
         2003   0.2
         2004   0.2
         2005   0.2
C        F      0.6
         M      0.4
D        0      0.8
         1      0.2
E        0      0.4
         1      0.6
F        0      0.6
         1      0.4
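As a side note on the traceback in the question: `df2.to_dict` is passed without parentheses, so `update` receives the method object itself, and both `dict.update` and `list.append` return `None`, so reassigning their results would break the next iteration as well. If percentages and the `unique` / `percent` labels from the desired outcome are wanted, one possible alternative is to build a normalized `value_counts` per column and stack them with `pd.concat`, which uses the dict keys as the outer index level. This is only a sketch; the names `summary`, `column`, `unique` and `percent` are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})

# One Series of percentages per column, sorted by the unique value.
per_column = {
    col: data[col].value_counts(normalize=True).sort_index() * 100
    for col in data.columns
}

# Concatenating a dict of Series makes the dict keys the outer index level.
summary = (
    pd.concat(per_column)
    .rename("percent")                  # name the value column
    .rename_axis(["column", "unique"])  # illustrative names for the two index levels
    .to_frame()
)

print(summary)
```

The result has one row per (column, unique value) pair, which is the same shape as the `.melt` output above but expressed in percent.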

Answer 2

I'm sorry, I've written an answer, but it's in JavaScript. I came here thinking I had clicked on the javascript tag and started coding, but on posting I saw that you're coding in Python.

I will post it anyway, maybe it will help you. Python is not that much different from JavaScript ;-)

const data = {
    A: ["case", "case", "case", "case", "case"],
    B: [2001, 2002, 2003, 2004, 2005],
    C: ["F", "M", "F", "F", "M"],
    D: [0, 0, 0, 1, 0],
    E: [1, 0, 1, 0, 1],
    F: [1, 1, 0, 0, 0]
};

const getUniqueStats = (_data) => {
    const results = [];
    for (const col in _data) {
        const column = _data[col];
        // create list of unique values
        const uniques = [...new Set(column)];
        // count each unique value and convert to a percentage of the column length
        const values = uniques.map((x) => ({
            unique: x,
            percentage: (column.filter((y) => y === x).length / column.length) * 100
        }));
        results.push({ index: col, values });
    }
    return results;
};

const results = getUniqueStats(data);

results.forEach((row) =>
    row.values.forEach((value) =>
        console.log(`${row.index}\t${value.unique}\t${value.percentage}%`)
    )
);
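
For readers working in Python, the same idea can be sketched without pandas. This is a rough port of the JavaScript above, not part of the original answer; the function name `get_unique_stats` is made up for illustration, and it relies on `collections.Counter` keeping first-seen order (Python 3.7+).

```python
from collections import Counter

# Same toy data as in the question, as a plain dict of lists.
data = {
    "A": ["case", "case", "case", "case", "case"],
    "B": [2001, 2002, 2003, 2004, 2005],
    "C": ["F", "M", "F", "F", "M"],
    "D": [0, 0, 0, 1, 0],
    "E": [1, 0, 1, 0, 1],
    "F": [1, 1, 0, 0, 0],
}


def get_unique_stats(columns):
    """Return (column, unique value, percentage) tuples for every column."""
    rows = []
    for name, values in columns.items():
        counts = Counter(values)  # unique values with their occurrence counts
        for value, count in counts.items():
            rows.append((name, value, 100 * count / len(values)))
    return rows


for name, value, pct in get_unique_stats(data):
    print(f"{name}\t{value}\t{pct:g}%")
```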
