How to create a MultiIndex (hierarchical index) DataFrame from the unique values of another DataFrame's columns?
I'm trying to create a pandas MultiIndexed DataFrame that summarizes the unique values in each column.
Is there an easier way to summarize this information than building such a DataFrame?
Either way, it would be nice to know how to complete this code challenge. Thanks for your help. Below are the toy DataFrame and the solution I attempted, using a for loop with a dictionary and value_counts. I'm not sure whether MultiIndex.from_frame or .from_product could be incorporated here somehow...
Original DataFrame:
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})
      A     B  C  D  E  F
0  case  2001  F  0  1  1
1  case  2002  M  0  0  1
2  case  2003  F  0  1  0
3  case  2004  F  1  0  0
4  case  2005  M  0  1  0
Desired outcome:
     unique  percent
A    case        100
B    2001         20
     2002         20
     2003         20
     2004         20
     2005         20
C    F            60
     M            40
D    0            80
     1            20
E    0            40
     1            60
F    0            60
     1            40
My failed for loop attempt:
def unique_values(df):
    values = {}
    columns = []
    df = pd.DataFrame(values, columns=columns)
    for col in data:
        df2 = data[col].value_counts(normalize=True)*100
        values = values.update(df2.to_dict)
        columns = columns.append(col*len(df2))
    return df

unique_values(data)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-a341284fb859> in <module>
     11
     12
---> 13 unique_values(data)

<ipython-input-84-a341284fb859> in unique_values(df)
      5     for col in data:
      6         df2 = data[col].value_counts(normalize=True)*100
----> 7         values = values.update(df2.to_dict)
      8         columns = columns.append(col*len(df2))
      9     return df

TypeError: 'method' object is not iterable
Let me know if there's something obvious I'm missing. I'm still relatively new to EDA and pandas, so any pointers are appreciated.
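On the loop itself: the TypeError arises because `df2.to_dict` is passed without being called, so `dict.update` receives a method object rather than a mapping. In addition, `dict.update` and `list.append` both return `None`, so assigning their results back would discard `values` and `columns`. The following is a minimal sketch of one possible repair, not the only one. It assumes the same toy `data`, keeps the loop and `value_counts`, and collects the per-column Series for `pd.concat` instead of merging dictionaries. The level names `column` and `unique` and the column name `percent` are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})


def unique_values(df):
    """Percentage of rows taken by each unique value, per column."""
    pieces = {}
    for col in df.columns:
        # value_counts(normalize=True) returns fractions; scale to percent.
        pieces[col] = df[col].value_counts(normalize=True).sort_index() * 100
    # The dict keys become the outer level of the resulting MultiIndex.
    return pd.concat(pieces, names=['column', 'unique']).to_frame('percent')


summary = unique_values(data)
print(summary)
```

Keeping each `value_counts` result as a Series avoids the key collisions a single flat dictionary would have, since values such as 0 and 1 occur in several columns.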
This is a fairly straightforward application of .melt:
data.melt().reset_index().groupby(['variable', 'value']).count()/len(data)
Output:
                index
variable value
A        case     1.0
B        2001     0.2
         2002     0.2
         2003     0.2
         2004     0.2
         2005     0.2
C        F        0.6
         M        0.4
D        0        0.8
         1        0.2
E        0        0.4
         1        0.6
F        0        0.6
         1        0.4
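The result above holds fractions under a column named `index`, whereas the question asks for percentages. A small variation on the same melt approach, offered as a sketch: it assumes the toy `data` from the question and uses `groupby(...).size()` in place of `reset_index().count()`. The names `column`, `unique`, and `percent` are illustrative choices, not anything pandas requires.

```python
import pandas as pd

data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
                     'B': [2001, 2002, 2003, 2004, 2005],
                     'C': ['F', 'M', 'F', 'F', 'M'],
                     'D': [0, 0, 0, 1, 0],
                     'E': [1, 0, 1, 0, 1],
                     'F': [1, 1, 0, 0, 0]})

summary = (
    data.melt(var_name='column', value_name='unique')  # long format: one row per cell
        .groupby(['column', 'unique'])
        .size()                # number of rows per (column, value) pair
        .div(len(data))        # fraction of the original rows
        .mul(100)              # fraction -> percent
        .to_frame('percent')
)
print(summary)
```

Dividing by `len(data)` treats every column as having the same number of entries, which holds here because the toy frame has no missing values.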
I'm sorry, I've written an answer, but it's in JavaScript. I came here thinking I had clicked on a JavaScript question and started coding, and only noticed when posting that you're working in Python.
I'll post it anyway; maybe it will help you. Python is not that different from JavaScript ;-)
const data = {
  A: ["case", "case", "case", "case", "case"],
  B: [2001, 2002, 2003, 2004, 2005],
  C: ["F", "M", "F", "F", "M"],
  D: [0, 0, 0, 1, 0],
  E: [1, 0, 1, 0, 1],
  F: [1, 1, 0, 0, 0]
};

const getUniqueStats = (_data) => {
  const results = [];
  for (const row in _data) {
    // create list of unique values
    const s = [...new Set(_data[row])];
    // count each unique value, convert to a percentage, then push
    results.push({
      index: row,
      values: s.map((x) => ({
        unique: x,
        // use the _data parameter here, not the outer data variable
        percentage: (_data[row].filter((y) => y === x).length / _data[row].length) * 100
      }))
    });
  }
  return results;
};

const results = getUniqueStats(data);
results.forEach((row) =>
  row.values.forEach((value) =>
    console.log(`${row.index}\t${value.unique}\t${value.percentage}%`)
  )
);
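For readers working in Python, the same idea can be sketched without pandas. This is a rough port, not the answerer's code: it assumes the data is a plain dict of lists, and `get_unique_stats` is a hypothetical name chosen to mirror the JavaScript function. `collections.Counter` replaces the `Set` plus `filter` combination.

```python
from collections import Counter

data = {
    "A": ["case", "case", "case", "case", "case"],
    "B": [2001, 2002, 2003, 2004, 2005],
    "C": ["F", "M", "F", "F", "M"],
    "D": [0, 0, 0, 1, 0],
    "E": [1, 0, 1, 0, 1],
    "F": [1, 1, 0, 0, 0],
}


def get_unique_stats(columns):
    """Return (column, value, percentage) tuples for every unique value."""
    rows = []
    for name, values in columns.items():
        # Counter tallies each value in one pass, in first-seen order.
        for value, count in Counter(values).items():
            rows.append((name, value, count / len(values) * 100))
    return rows


stats = get_unique_stats(data)
for name, value, pct in stats:
    print(f"{name}\t{value}\t{pct:g}%")
```

Counting once per column avoids rescanning the list for every unique value, as the `filter` call does.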