
Working with columns and rows in Python pandas

I am trying to collect data from pandas DataFrames. The screenshot shows part of how the database is structured.

[Screenshot: database]

Now I want to analyze other columns for rows that share the same hhid. For each hhid I want to compute the away time: select the first "from home" row and read its start value, which should then not be overwritten again for the same hhid. After that I want the end value of the last "to home" entry, and then compute the difference between the two. I tried to implement that, but most of the time the start value read from the "from home" row gets overwritten, and the differences do not come out as expected.

Here is my routine:

wid = 1
for i in range(0, len(dataframe)):

    # Stop before the last row, because row i+1 is read below
    if i + 1 >= len(dataframe):
        break

    # Row has the same hhid as one of its neighbours
    if (
        dataframe['hhid'].values[i] == dataframe['hhid'].values[i+1] or
        dataframe['hhid'].values[i] == dataframe['hhid'].values[i-1]
    ):

        # Departure from home: remember the start value
        if (
            dataframe['w01'].values[i] == 'from home' and
            wid >= dataframe['wid'].values[i]
        ):
            start = dataframe['st_std'].values[i]
            print('start', start)
            wid = dataframe['wid'].values[i]

        # Way back home: compute the away time
        if dataframe['w04'].values[i] == 'to home':
            end = dataframe['en_std'].values[i]
            print('end', end)
            dataframe['awaytime'].values[i] = (end - start)

            if end - start < 0:
                dataframe['awaytime'].values[i] = (start - end) + 1
        else:
            continue

    # The next row starts a new hhid: take over its wid
    if dataframe['hhid'].values[i] != dataframe['hhid'].values[i+1]:
        wid = dataframe['wid'].values[i+1]

return dataframe

Any ideas how to do it correctly?

EDIT

Here is a sample of the data in Excel format. Unfortunately I am not allowed to upload the full dataset: https://www.dropbox.com/s/af3wb7fcsqhukvz/Export_db_awaytime.xlsx?dl=0

I think I solved the problem. I added a counter so that only the first "from home" value is kept for each hhid. The values I get now look good.

For reference, here is the code:

counter = 0
test_counter = 0
from_home = 0
for i in range(0, len(dataframe)):

    # Stop before the last row, because row i+1 is read below
    if i + 1 >= len(dataframe):
        break

    # Check for same hhid
    if (
        dataframe['hhid'].values[i] == dataframe['hhid'].values[i+1] or
        dataframe['hhid'].values[i] == dataframe['hhid'].values[i-1]
    ):

        # Check for first departure
        if (
            dataframe['w01'].values[i] == 'from home' and
            counter <= test_counter
        ):
            start = dataframe['st_std'].values[i]
            #print('start',start)
            from_home = 1
            counter += 1

        # Check way home
        if (
            dataframe['w04'].values[i] == 'to home' and
            from_home == 1
        ):
            end = dataframe['en_std'].values[i]

            dataframe['awaytime'].values[i] = (end - start)

            if end - start < 0:
                dataframe['awaytime'].values[i] = (start - end) + 1

    # Check when another hhid is the next entry
    if dataframe['hhid'].values[i] != dataframe['hhid'].values[i+1]:
        counter = 0
        from_home = 0
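
As a possible alternative to the row-by-row loop, the same idea (first "from home" start, last "to home" end, difference per hhid) could probably be expressed with groupby. This is only a minimal sketch under assumptions: the function name away_time_per_household and the sample rows are made up for illustration, the rows are assumed to be in chronological order within each hhid, and st_std / en_std are assumed to be plain numbers. It does not check that the "to home" row comes after the "from home" row, and it does not reproduce the (start-end)+1 correction for negative differences.

```python
import pandas as pd


def away_time_per_household(df: pd.DataFrame) -> pd.Series:
    """Last 'to home' en_std minus first 'from home' st_std, per hhid."""
    # First departure per household (rows keep their original order in each group)
    first_start = df.loc[df["w01"] == "from home"].groupby("hhid")["st_std"].first()
    # Last return home per household
    last_end = df.loc[df["w04"] == "to home"].groupby("hhid")["en_std"].last()
    # Both Series are indexed by hhid; households missing either event give NaN
    away = last_end - first_start
    return away.rename("awaytime")


# Hypothetical sample rows, only to show the assumed column layout
sample = pd.DataFrame(
    {
        "hhid": [1, 1, 1, 2, 2],
        "w01": ["from home", "other", "other", "from home", "other"],
        "w04": ["work", "shopping", "to home", "work", "to home"],
        "st_std": [8.0, 12.0, 17.0, 7.5, 16.0],
        "en_std": [8.5, 12.5, 17.5, 8.0, 16.5],
    }
)

result = away_time_per_household(sample)
print(result)
```

The result is one value per hhid rather than a value written into every "to home" row, so if the awaytime column is needed on the original rows it would have to be mapped back, for example with dataframe['hhid'].map(result).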
