
Creating tuples from columns of a DataFrame


I have a data set and I want to create a list of tuples of the form:

(Name_of_State, Literacy_rate)
(JAMMU&KASHMIR, 89.78) #example

I had to do a bit of cleaning up, removing the districts and keeping only the states.

data=data[data['Name']!='India']    # remove the all-India row
data=data[data['TRU']=='Total']     # keep only the Total rows, excluding Rural and Urban
states_group=data[data['Level']=='State']
states_group

After that, here is the main code I want to focus on:

literacy_rate=[]
total_state_pop=0
total_literate_pop=0
for key,group in states_group.iterrows():
    total_state_pop+=states_group['TOT_P']
    
    total_literate_pop+=states_group['P_LIT']
    total_literate_pop+=states_group['F_LIT']
    rate=(total_literate_pop/total_state_pop)*100
    literacy_rate.append((states_group['Name'],rate))
    
print(literacy_rate) 

But the output I get is:

(3            JAMMU & KASHMIR
72          HIMACHAL PRADESH
111                   PUNJAB
174               CHANDIGARH
180              UTTARAKHAND
222                  HARYANA
288             NCT OF DELHI
318                RAJASTHAN
420            UTTAR PRADESH
636                    BIHAR
753                   SIKKIM
768                  MANIPUR
798                  MIZORAM
825                  TRIPURA
840                MEGHALAYA
864                    ASSAM
948              WEST BENGAL
1008               JHARKHAND
1083                  ODISHA
1176            CHHATTISGARH
1233          MADHYA PRADESH
1386                 GUJARAT
1467             DAMAN & DIU
1476    DADRA & NAGAR HAVELI
1482             MAHARASHTRA
1590          ANDHRA PRADESH
1662               KARNATAKA
1755                     GOA
1764                  KERALA
1809              TAMIL NADU
1908              PUDUCHERRY
Name: Name, dtype: object, 3        85.484832
72       99.946393
111      80.810862
174      93.793637
180      89.689123
222      79.608418
288      97.531743
318      67.745833
420      69.971651
636      52.937273
753      98.691424
768      96.236438
798     109.113300
825     116.065370
840      84.108326
864      96.451609
948      87.437511
1008     63.211190
1083     85.260257
1176     85.104889
1233     78.055310
1386     99.236215
1467    121.848465
1476    112.301972
1482    100.968386
1590     79.671587
1662     81.400129
1755    110.110417
1764    120.140132
1809     94.529868
1908    101.165414
dtype: float64), ...

The output continues like this for much longer. Here is the link to the whole data set. Where am I going wrong? Thanks in advance.

Avoid iteration if possible, as it is an anti-pattern in pandas (good read).

import pandas as pd
data = pd.read_excel('state_dist_sc.xls')
data=data[data['Name']!='India']
data=data[data['TRU']=='Total']
states_group=data[data['Level']=='State']

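# Create a copy of the data on which to calculate the literacy rate
# (this avoids a SettingWithCopyWarning when adding a column to a filtered slice).
states_group = states_group.copy()

# Calculate the literacy rate with a vectorized expression, which is faster and more concise than a loop.
states_group['literacy_rate'] = 100*(states_group['P_LIT'] + states_group['F_LIT'])/states_group['TOT_P']

# to_records returns a NumPy record array of (Name, literacy_rate) rows;
# call .tolist() on it if a plain list of tuples is needed.
ans = states_group[['Name','literacy_rate']].to_records(index=False)
ans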

Output:

rec.array([('JAMMU & KASHMIR',  85.48483174),
           ('HIMACHAL PRADESH',  99.94639301), ('PUNJAB',  80.81086172),
           ('CHANDIGARH',  93.79363692), ('UTTARAKHAND',  89.68912284),
           ('HARYANA',  79.60841792), ('NCT OF DELHI',  97.53174349),
           ('RAJASTHAN',  67.74583313), ('UTTAR PRADESH',  69.97165068),
           ('BIHAR',  52.93727261), ('SIKKIM',  98.69142352),
           ('MANIPUR',  96.23643761), ('MIZORAM', 109.11330049),
           ('TRIPURA', 116.06537002), ('MEGHALAYA',  84.10832613),
           ('ASSAM',  96.45160871), ('WEST BENGAL',  87.43751069),
           ('JHARKHAND',  63.21118996), ('ODISHA',  85.26025661),
           ('CHHATTISGARH',  85.10488906),
           ('MADHYA PRADESH',  78.05530967), ('GUJARAT',  99.23621537),
           ('DAMAN & DIU', 121.84846506),
           ('DADRA & NAGAR HAVELI', 112.3019722 ),
           ('MAHARASHTRA', 100.96838647),
           ('ANDHRA PRADESH',  79.67158709), ('KARNATAKA',  81.40012899),
           ('GOA', 110.11041691), ('KERALA', 120.14013153),
           ('TAMIL NADU',  94.529868  ), ('PUDUCHERRY', 101.16541449)],
          dtype=[('Name', 'O'), ('literacy_rate', '<f8')])
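
Since the question asks for a list of tuples rather than a record array, the following is a minimal sketch of two ways to get one. The two-row frame and its numbers are made up for illustration; only the column names (Name, TOT_P, P_LIT, F_LIT) and the formula come from the code above.

```python
import pandas as pd

# Hypothetical miniature of states_group; the values are invented for illustration.
states_group = pd.DataFrame({
    "Name": ["STATE A", "STATE B"],
    "TOT_P": [1000, 2000],
    "P_LIT": [600, 1500],
    "F_LIT": [250, 300],
})

# Same vectorized formula as in the answer above.
states_group["literacy_rate"] = (
    100 * (states_group["P_LIT"] + states_group["F_LIT"]) / states_group["TOT_P"]
)

# Option 1: turn the record array into a plain Python list of tuples.
pairs_from_records = (
    states_group[["Name", "literacy_rate"]].to_records(index=False).tolist()
)

# Option 2: zip the two columns together directly.
pairs_from_zip = list(zip(states_group["Name"], states_group["literacy_rate"]))

print(pairs_from_records)
```

Either option yields one (name, rate) tuple per state. Separately, several rates in the output above exceed 100, which hints that P_LIT may already include the people counted in F_LIT. That is only a guess about the column meanings; if it holds, 100*P_LIT/TOT_P alone would be the rate to use.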

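In the for loop, change every states_group to group; otherwise there is no point in looping with .iterrows(). Inside the loop, states_group['TOT_P'] is still the whole column, while group['TOT_P'] is the value for the current row only.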

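literacy_rate=[]
for key,group in states_group.iterrows():
    # group is a single row (a Series), so these are scalars for one state.
    # They are recomputed on every iteration: accumulating them with += across
    # rows would give a cumulative rate instead of a per-state rate.
    total_state_pop=group['TOT_P']
    total_literate_pop=group['P_LIT']+group['F_LIT']
    rate=(total_literate_pop/total_state_pop)*100
    literacy_rate.append((group['Name'],rate))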
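
To make the difference concrete, here is a small sketch on a made-up two-row frame (the names and numbers are hypothetical; only the column names follow the question). It records what each kind of indexing returns inside the loop, then builds the same tuples with itertuples, which is usually a lighter way to loop over rows when a loop is wanted at all.

```python
import pandas as pd

# Hypothetical two-row frame standing in for states_group.
df = pd.DataFrame({
    "Name": ["STATE A", "STATE B"],
    "TOT_P": [1000, 2000],
    "P_LIT": [600, 1500],
    "F_LIT": [250, 300],
})

seen = []
for index, row in df.iterrows():
    whole_column = df["TOT_P"]   # always the full Series, whatever the current row is
    one_value = row["TOT_P"]     # scalar belonging to the current row
    seen.append((len(whole_column), int(one_value)))

# Row-wise tuples via itertuples; columns are read as attributes of each row.
pairs = [
    (r.Name, 100 * (r.P_LIT + r.F_LIT) / r.TOT_P)
    for r in df.itertuples(index=False)
]

print(seen)
print(pairs)
```

In the question's loop, every appended tuple held the full Name column and a full Series of rates because the frame, not the row, was being indexed.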
