
Making a pandas DataFrame from a dict

I am working on an assignment where I have made a dict with political parties as keys and the genders of each party's members as values.

The dict is named genderlist. The code that builds it is as follows:

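from bs4 import BeautifulSoup

# loadKandidatenlijst() is my own helper; it returns the path of the candidate-list XML file
with open(loadKandidatenlijst()) as f:
    soup = BeautifulSoup(f.read(), features="xml")

# Map each party name to the list of its candidates' genders
genderlist = {}

for affiliation in soup.find_all('Affiliation'):
    party = affiliation.RegisteredName.text
    # A party without candidates gets an empty list rather than 0
    genderlist[party] = [
        candidate.Gender.text for candidate in affiliation.find_all('Candidate')
    ]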

genderlist['Partij van de Arbeid (P.v.d.A.)'][:6], len(genderlist), len(genderlist['CDA'])

The output is: (['male', 'female', 'male', 'female', 'male', 'female'], 24, 50)

So, when I look up a party name, I get the genders of all members of that party.

Now I need to make a dataframe like this: (image not included)

It should count the genders separately and include the female percentage as a column.

I've now tried this:

pd.DataFrame(genderlist.items(),columns=['male', 'female'])

It results in: (image not included)

How can I make the expected dataframe, where only the first 30 candidates of each party are counted, with separate male and female columns and a percentage?

Can you please help me out? What can I do with my code from here?

Thank you in advance.

You can use list.count(element) together with a dictionary comprehension to first build a dictionary, gender_counts, that holds the counts you need, and then use pd.DataFrame.from_dict to convert it into a dataframe.

# each list holds the genders of that party's members
party_A
['female', 'female', 'male', 'female', 'male', 'male', 'female', 'female',
 'female', 'female']

gender_dict = {'Party_A': party_A, 'Party_B': party_B, 
               'Party_C': party_C, 'Party_D': party_D}

gender_counts = {k: [v.count('male'), v.count('female')] for k, v in gender_dict.items()}

gender_counts
{'Party_A': [3, 7],
 'Party_B': [5, 9],
 'Party_C': [13, 7],
 'Party_D': [9, 6]}

df = pd.DataFrame.from_dict(gender_counts, orient='index', columns=['male', 'female'])

df
         male  female
Party_A     3       7
Party_B     5       9
Party_C    13       7
Party_D     9       6


df['Women_Percentage'] = df.female / (df.male + df.female)

df.round(2)

         male  female  Women_Percentage
Party_A     3       7              0.70
Party_B     5       9              0.64
Party_C    13       7              0.35
Party_D     9       6              0.40
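
The question also asks to count only the first 30 candidates of each party, which the steps above do not cover. A minimal sketch of how that could be added, assuming genderlist has the shape described in the question; the party names and gender lists below are made-up placeholder data, and TOP_N and female_percentage are names chosen only for illustration:

```python
import pandas as pd

# Placeholder data standing in for the asker's genderlist (not real results).
genderlist = {
    "Party_A": ["female", "male", "female", "female"],
    "Party_B": ["male", "male", "female"],
    "Party_C": ["male"] * 30 + ["female"] * 5,  # more than 30 candidates
}

TOP_N = 30  # count only the first 30 candidates per party

# Slicing with [:TOP_N] is safe even when a party has fewer than TOP_N candidates.
gender_counts = {
    party: [genders[:TOP_N].count("male"), genders[:TOP_N].count("female")]
    for party, genders in genderlist.items()
}

df = pd.DataFrame.from_dict(gender_counts, orient="index", columns=["male", "female"])

# Share of women among the counted candidates, as a percentage.
df["female_percentage"] = (100 * df["female"] / (df["male"] + df["female"])).round(1)

print(df)
```

Party_C shows the effect of the slice: its five female candidates sit after position 30, so they are not counted.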

Let df be your current output (I changed the column names):

df = pd.DataFrame(genderlist.items(), columns=['party_name', 'gender_list'])

gender_list is now a column of lists in this format:

['male', 'female', 'male', 'female', 'male', 'female']

Now you can count the elements of each list using Counter, which returns a dictionary-like object, and then use apply(pd.Series) to split the column of counters into separate columns.

from collections import Counter
df['gender_list'].apply(Counter).apply(pd.Series)
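
For completeness, here is a sketch of how the Counter approach could be carried through to the percentage column. It assumes the same genderlist shape as in the question; the data below is made up, and the 30-candidate slice, the fillna step, and the female_percentage column name are additions for illustration rather than part of the original answer:

```python
from collections import Counter

import pandas as pd

# Placeholder data standing in for the asker's genderlist (not real results).
genderlist = {
    "Party_A": ["female", "male", "female", "female"],
    "Party_B": ["male", "male"],  # no female candidates at all
}

df = pd.DataFrame(list(genderlist.items()), columns=["party_name", "gender_list"])

# Count genders among the first 30 candidates, then expand the counters into columns.
counts = df["gender_list"].apply(lambda genders: Counter(genders[:30])).apply(pd.Series)

# A gender missing from a party shows up as NaN, so fix the column order and fill with 0.
counts = counts.reindex(columns=["male", "female"]).fillna(0).astype(int)

counts["female_percentage"] = (
    100 * counts["female"] / (counts["male"] + counts["female"])
).round(1)
counts.insert(0, "party_name", df["party_name"])

print(counts)
```

The reindex and fillna steps matter when a party has candidates of only one gender; without them the missing count would stay NaN and so would the percentage.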
