
How to create a MultiIndex (hierarchical index) dataframe object from another df's column's unique values?

I'm trying to create a pandas MultiIndexed dataframe that is a summary of the unique values in each column.

Is there an easier way to have this information summarized besides creating this dataframe?

Either way, it would be nice to know how to complete this code challenge. Thanks for your help. Here is the toy dataframe and the solution I attempted using a for loop with a dictionary and a value_counts dataframe. Not sure if it's possible to incorporate MultiIndex.from_frame or .from_product here somehow...

Original DataFrame:

import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005], 
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})


    A       B       C   D   E   F
0   case    2001    F   0   1   1
1   case    2002    M   0   0   1
2   case    2003    F   0   1   0
3   case    2004    F   1   0   0
4   case    2005    M   0   1   0

Desired outcome:

     unique  percent
A    case    100 
B    2001    20
     2002    20
     2003    20
     2004    20
     2005    20
C    F       60
     M       40
D    0       80
     1       20
E    0       40
     1       60
F    0       60
     1       40

My failed for loop attempt:

def unique_values(df):
    values = {}
    columns = []
    df = pd.DataFrame(values, columns=columns)
    for col in data:
        df2 = data[col].value_counts(normalize=True)*100
        values = values.update(df2.to_dict)
        columns = columns.append(col*len(df2))
    return df

unique_values(data)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-a341284fb859> in <module>
     11 
     12 
---> 13 unique_values(data)

<ipython-input-84-a341284fb859> in unique_values(df)
      5     for col in data:
      6         df2 = data[col].value_counts(normalize=True)*100
----> 7         values = values.update(df2.to_dict)
      8         columns = columns.append(col*len(df2))
      9     return df

TypeError: 'method' object is not iterable

Let me know if there's something obvious I'm missing. I'm still relatively new to EDA and pandas, so any pointers are appreciated.
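On the TypeError itself: `df2.to_dict` is passed without parentheses, so `dict.update` receives the method object rather than a dict and cannot iterate over it. Beyond that, `dict.update` and `list.append` both return `None`, the empty frame is built before the loop runs, and the loop reads the global `data` instead of the `df` argument. A minimal sketch of a repaired loop follows. The level names `column` and `unique` and the use of `pd.MultiIndex.from_tuples` are my own choices for illustration, not something given in the question.

```python
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})


def unique_values(df):
    index_tuples = []  # (column name, unique value) pairs for the MultiIndex
    percents = []      # share of rows holding that value, in percent
    for col in df.columns:
        # normalize=True gives fractions; sort_index orders by value, not by count
        counts = df[col].value_counts(normalize=True).sort_index() * 100
        for value, pct in counts.items():
            index_tuples.append((col, value))
            percents.append(pct)
    # 'column' and 'unique' are illustrative level names
    index = pd.MultiIndex.from_tuples(index_tuples, names=['column', 'unique'])
    return pd.DataFrame({'percent': percents}, index=index)


summary = unique_values(data)
print(summary)
```

The frame is only assembled after the loop, once every (column, value) pair is known, which avoids growing a DataFrame row by row.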

This is a fairly straightforward application of .melt. Note that the result is expressed as a fraction of the row count, so multiply by 100 if you want percentages:

data.melt().reset_index().groupby(['variable', 'value']).count()/len(data)

Output:

                index
variable value  
A        case   1.0
B        2001   0.2
         2002   0.2
         2003   0.2
         2004   0.2
         2005   0.2
C        F      0.6
         M      0.4
D        0      0.8
         1      0.2
E        0      0.4
         1      0.6
F        0      0.6
         1      0.4
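If the goal is to match the layout in the question more closely (percentages, and a column called `percent` rather than `index`), the same melt idea can be adjusted roughly as below. This is a sketch under a couple of assumptions: the level name `column` is made up for illustration, and the frame has no missing values (groupby drops NaN keys by default, while `len(data)` counts every row).

```python
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})

summary = (
    data.melt(var_name='column', value_name='unique')  # long format: one row per cell
        .groupby(['column', 'unique'])
        .size()              # occurrences of each (column, value) pair
        .div(len(data))      # fraction of rows
        .mul(100)            # fraction -> percent
        .rename('percent')
        .to_frame()
)
print(summary)
```

Using `.size()` instead of `.reset_index()` followed by `.count()` means no helper column is needed just to have something to count.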

I'm sorry, I wrote an answer, but it's in JavaScript. I thought I had clicked on a JavaScript question and started coding, and only when posting did I see that you're working in Python.

I'll post it anyway; maybe it will help you. Python is not that much different from JavaScript ;-)

const data = {
    A: ["case", "case", "case", "case", "case"],
    B: [2001, 2002, 2003, 2004, 2005],
    C: ["F", "M", "F", "F", "M"],
    D: [0, 0, 0, 1, 0],
    E: [1, 0, 1, 0, 1],
    F: [1, 1, 0, 0, 0]
};

const getUniqueStats = (_data) => {
    const results = [];
    for (let row in _data) {
        // create list of unique values
        const s = [...new Set(_data[row])];
        // count occurrences of each unique value, convert to a percentage, then push
        results.push({
            index: row,
            values: s.map((x) => ({
                unique: x,
                percentage: (_data[row].filter((y) => y === x).length / _data[row].length) * 100
            }))
        });
    }
    return results;
};

const results = getUniqueStats(data);

results.forEach((row) =>
    row.values.forEach((value) =>
        console.log(`${row.index}\t${value.unique}\t${value.percentage}%`)
    )
);
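For anyone who would rather read this in Python, here is a rough translation of the JavaScript above using only the standard library. It is a sketch, not a pandas solution: `get_unique_stats` is simply the JavaScript function name rendered in snake_case, and `collections.Counter` stands in for the `Set` plus `filter` combination.

```python
from collections import Counter

data = {
    'A': ['case', 'case', 'case', 'case', 'case'],
    'B': [2001, 2002, 2003, 2004, 2005],
    'C': ['F', 'M', 'F', 'F', 'M'],
    'D': [0, 0, 0, 1, 0],
    'E': [1, 0, 1, 0, 1],
    'F': [1, 1, 0, 0, 0],
}


def get_unique_stats(columns):
    results = []
    for name, values in columns.items():
        # Counter tallies each distinct value, keeping first-seen order
        counts = Counter(values)
        results.append({
            'index': name,
            'values': [
                {'unique': value, 'percentage': 100 * n / len(values)}
                for value, n in counts.items()
            ],
        })
    return results


results = get_unique_stats(data)

for row in results:
    for value in row['values']:
        print(f"{row['index']}\t{value['unique']}\t{value['percentage']}%")
```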


