Making a pandas dataframe from a dict
I am working on an assignment where I have made a dict with political parties as keys and the genders of the parties' members as values.
The dict is named genderlist. The code for my dict is as follows:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open(loadKandidatenlijst()).read(), features="xml")
genderlist = {}
for affiliation in soup.findAll('Affiliation'):
    genders = []
    party = affiliation.RegisteredName.text
    genderlist[party] = 0
    for name in affiliation.findAll('Candidate'):
        gender = name.Gender.text
        genders.append(gender)
    genderlist[party] = genders
genderlist['Partij van de Arbeid (P.v.d.A.)'][:6], len(genderlist), len(genderlist['CDA'])
My output is:
(['male', 'female', 'male', 'female', 'male', 'female'], 24, 50)
So when I look up a party name, I get the genders of all members of that party.
Now I need to make a dataframe that counts the genders separately and includes the percentage of women.
I've tried this:
pd.DataFrame(genderlist.items(),columns=['male', 'female'])
How can I make the expected dataframe, where only the first 30 candidates of each party are counted, giving separate male and female columns and a female percentage?
Can you please help me out? What should I do with my code from here?
Thank you in advance.
You can use the list.count(element) method together with a Python dictionary comprehension to first create a dictionary, gender_counts, that holds the data you need, and then use pd.DataFrame.from_dict to convert it into a dataframe:
import pandas as pd

# each list holds the genders of the members of that party
party_A = ['female', 'female', 'male', 'female', 'male', 'male', 'female', 'female',
           'female', 'female']
# party_B, party_C and party_D are built the same way
gender_dict = {'Party_A': party_A, 'Party_B': party_B,
'Party_C': party_C, 'Party_D': party_D}
gender_counts = {k: [v.count('male'), v.count('female')] for k, v in gender_dict.items()}
gender_counts
{'Party_A': [3, 7],
'Party_B': [5, 9],
'Party_C': [13, 7],
'Party_D': [9, 6]}
df = pd.DataFrame.from_dict(gender_counts, orient='index', columns=['male', 'female'])
df
male female
Party_A 3 7
Party_B 5 9
Party_C 13 7
Party_D 9 6
df['Women_Percentage'] = df.female / (df.male + df.female)
df.round(2)
male female Women_Percentage
Party_A 3 7 0.70
Party_B 5 9 0.64
Party_C 13 7 0.35
Party_D 9 6 0.40
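The question also asks to count only the first 30 candidates of each party. Below is a minimal sketch of the same list.count approach applied to a dict shaped like genderlist: each list is sliced with [:30] before counting, and the share of women is expressed as a percentage. The sample data and the n_first variable are illustrative assumptions, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample shaped like `genderlist`: party name -> list of candidate genders
genderlist = {
    'Party_A': ['female', 'male'] * 20,           # 40 candidates, alternating
    'Party_B': ['male'] * 25 + ['female'] * 10,   # 35 candidates
}

n_first = 30  # assumed cut-off: only the first 30 candidates of each party are counted

# Count males and females among the first n_first candidates of every party
gender_counts = {
    party: [genders[:n_first].count('male'), genders[:n_first].count('female')]
    for party, genders in genderlist.items()
}

df = pd.DataFrame.from_dict(gender_counts, orient='index', columns=['male', 'female'])
# Share of women among the counted candidates, as a percentage rounded to one decimal
df['female_percentage'] = (100 * df['female'] / (df['male'] + df['female'])).round(1)
print(df)
```

Slicing before counting keeps the rest of the answer unchanged; parties with fewer than 30 candidates are simply counted in full.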
Let df be your current output (I changed the column names):
df = pd.DataFrame(list(genderlist.items()), columns=['party_name', 'gender_list'])
gender_list is now a column of lists in this format:
['male', 'female', 'male', 'female', 'male', 'female']
Now you can count the occurrences of each unique element with Counter, which returns a dictionary, and then use apply(pd.Series) to split the column of dictionaries into separate columns:
from collections import Counter
df['gender_list'].apply(Counter).apply(pd.Series)
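If a party has no candidates of one gender, its Counter has no key for that gender and the resulting cell is NaN. Below is a minimal sketch that rounds out this approach with the first-30 restriction from the question, fillna(0) and a female percentage column. It builds the frame from the list of Counters with pd.DataFrame, which yields the same columns as apply(pd.Series); the sample genderlist is made up for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data in the same shape as `genderlist`
genderlist = {
    'Party_A': ['male', 'female', 'male', 'female'],
    'Party_B': ['male', 'male', 'male'],  # no women, so its Counter has no 'female' key
}

df = pd.DataFrame(list(genderlist.items()), columns=['party_name', 'gender_list'])

# Keep only the first 30 candidates of each party, then count each gender
first_30 = df['gender_list'].apply(lambda genders: genders[:30])
counters = first_30.apply(Counter)

# A list of Counters (dict subclasses) becomes one column per gender;
# a gender missing from a party shows up as NaN, so fill it with 0
counts = (
    pd.DataFrame(counters.tolist(), index=df['party_name'])
    .reindex(columns=['male', 'female'])
    .fillna(0)
    .astype(int)
)
counts['female_percentage'] = 100 * counts['female'] / (counts['male'] + counts['female'])
print(counts)
```

The reindex call fixes the column order and also adds a column if no party at all has candidates of that gender.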