
Transferring values from multiple columns in a dataframe to a new column in another dataframe, based on time-criterion

I'm pretty new to Python and am starting to realize its potential for serious number-crunching.

I want to create a new column in a pandas dataframe (df1) and populate it with the numeric value from one of 24 columns (named 0.0 - 23.0 ) in another dataframe (df2). Each of the columns 0.0 - 23.0 represents one hour of the day ( 00.00-00.59 , 01.00-01.59 and so on). Which column the value is taken from depends on a time criterion.

There is a column time in df1, with a datetime value in the format YYYY-mm-dd HH:MM:SS . This column is not the index column of df1, so several rows may have the same value of 'time'. df1 contains a total of 300,000 rows.

The index of df2 is a column date which contains values in the form YYYY-mm-dd . df2 covers 3 years and hence contains a total of about 1,200 rows.

For example, if the value of time in df1 is 2011-01-01 12:01:20 , I want to populate the new column in df1 with the numeric value from column 12.0 in df2, corresponding to the row with index 2011-01-01 .

I have tried to merge the two dataframes and obtained a new dataframe containing df1 and the columns 0.0 - 23.0 matched to the correct date. I did this by converting 'time' to the YYYY-mm-dd format and applying .merge. However, this dataframe is a bit too messy.
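For illustration, the merge route described above could be tidied up roughly as follows. This is only a sketch built on the sample rows shown further down; the names `merged` and `tidy` and the use of `how="left"` are assumptions, and only the hour columns 0.0 - 5.0 visible in the sample are included.

```python
import pandas as pd

# Sample rows from the question; df2 is cut down to the hour columns shown there.
df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674, 189497, 189498],
    "time": pd.to_datetime([
        "2011-01-01 04:20:00", "2011-01-02 01:18:00", "2011-01-03 03:08:00",
        "2011-01-04 00:17:00", "2011-01-05 05:31:00",
    ]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
     [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
     [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03",
                          "2011-01-04", "2011-01-05"]),
    columns=[float(h) for h in range(6)],
)
df2.index.name = "date"

# Match each df1 row to its df2 row via a midnight-normalized date column.
merged = df1.assign(date=df1["time"].dt.normalize()).merge(
    df2.reset_index(), on="date", how="left"
)

# For every row, read the hour column that matches the timestamp's hour.
merged["value"] = [
    merged.at[i, float(h)] for i, h in zip(merged.index, merged["time"].dt.hour)
]

# Drop the 24 hour columns again so only the tidy result remains.
tidy = merged[["KEY", "time", "value"]]
print(tidy)
```

The list comprehension is simple but runs in Python for every row, so on 300,000 rows it may be noticeably slower than a vectorized lookup.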

Furthermore, I would like to write a function that evaluates the new column in df1, so that I can check afterwards that the values imported from df2 are correct.
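A check of this kind might be sketched as below. The function name `check_values`, its parameters, and the column name `value` are hypothetical, and the sketch assumes that every date in df1 exists in df2's index (otherwise the lookup raises a `KeyError`). One value in the demo data is deliberately wrong so that the check has something to report.

```python
import pandas as pd

def check_values(df1, df2, time_col="time", value_col="value"):
    """Return the rows of df1 whose value differs from the df2 lookup.

    Hypothetical helper: it looks each row up again, one at a time,
    independently of however the column was filled in the first place.
    """
    bad = []
    for idx, ts, val in zip(df1.index, df1[time_col], df1[value_col]):
        expected = df2.at[ts.normalize(), float(ts.hour)]
        if val != expected:
            bad.append(idx)
    return df1.loc[bad]

# First three sample rows from the question, hour columns 0.0 - 5.0 only.
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    columns=[float(h) for h in range(6)],
)
df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674],
    "time": pd.to_datetime([
        "2011-01-01 04:20:00", "2011-01-02 01:18:00", "2011-01-03 03:08:00",
    ]),
    # The last value is deliberately wrong (the correct one is 1.034810).
    "value": [0.938710, 1.019608, 9.999999],
})

mismatches = check_values(df1, df2)
print(mismatches)
```

An empty result means every value in the new column agrees with df2. The comparison is exact, which is fine when the values were copied over unchanged.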

df1

KEY    time
252752 2011-01-01 04:20:00   
281789 2011-01-02 01:18:00   
242674 2011-01-03 03:08:00   
189497 2011-01-04 00:17:00   
189498 2011-01-05 05:31:00   
...    ...

df2

date         0.0         1.0         2.0         3.0         4.0         5.0        ...   23.0
2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947   ...   1.037975
2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948   ...   0.919355
2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304   ...   1.018987
2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633   ...   0.932258
2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975   ...   0.953947
...          ...         ...         ...         ...         ...         ...        ...   ...

desired result

KEY    time                  value
252752 2011-01-01 04:20:00   0.938710
281789 2011-01-02 01:18:00   1.019608
242674 2011-01-03 03:08:00   1.034810
189497 2011-01-04 00:17:00   1.018987
189498 2011-01-05 05:31:00   1.037975
...    ...                   ...

I'm not sure if this helps, but this is how I would write it (with plain dicts instead of dataframes):

    ### just to have your test data
    df1_val = ("252752 2011-01-01 04:20:00",
               "281789 2011-01-02 01:18:00",
               "242674 2011-01-03 03:08:00",
               "189497 2011-01-04 00:17:00",
               "189498 2011-01-05 05:31:00")
    df1 = {}
    for row in df1_val:
        # split on whitespace; fixed slices cut the 6-digit key short,
        # which made 189497 and 189498 collide
        key, date, time = row.split()
        df1[key] = (date, time)

    df2_val = ("2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947",
               "2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948",
               "2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304",
               "2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633",
               "2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975")

    df2 = {}
    for row in df2_val:
        # split() copes with the varying number of spaces between the columns
        date, zero, one, two, three, four, five = row.split()
        df2[date] = (zero, one, two, three, four, five)

    #### build the result dict
    result = {}

    for key in df1:
        date, time = df1[key]
        hour = int(time[:2])  # "04:20:00" -> 4, used as position in the df2 tuple
        result[key] = (date + "   " + time, df2[date][hour])
        print(key)
        print(result[key])
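The same date-plus-hour lookup can also be done on the dataframes directly. The following is a rough sketch rather than a tested solution for the full data: it assumes df2 has a `DatetimeIndex` and float column labels as in the question, and that every date and hour in df1 is present in df2. The names `row_pos` and `col_pos` are just illustrative.

```python
import pandas as pd

# Sample rows from the question; df2 is cut down to the hour columns shown there.
df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674, 189497, 189498],
    "time": pd.to_datetime([
        "2011-01-01 04:20:00", "2011-01-02 01:18:00", "2011-01-03 03:08:00",
        "2011-01-04 00:17:00", "2011-01-05 05:31:00",
    ]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
     [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
     [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03",
                          "2011-01-04", "2011-01-05"]),
    columns=[float(h) for h in range(6)],
)
df2.index.name = "date"

# Row position in df2 for each timestamp's date, column position for its hour.
row_pos = df2.index.get_indexer(df1["time"].dt.normalize())
col_pos = df2.columns.get_indexer(df1["time"].dt.hour.astype(float))

# get_indexer returns -1 for labels it cannot find, so fail loudly in that case.
assert (row_pos >= 0).all() and (col_pos >= 0).all()

# Pick one cell per df1 row from the underlying array in a single step.
df1["value"] = df2.to_numpy()[row_pos, col_pos]
print(df1)
```

Because the lookup is a single array indexing operation, it avoids a Python-level loop over the 300,000 rows of df1.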
