Transferring values from multiple columns in one dataframe to a new column in another dataframe based on a time criterion

Question

I'm pretty new to Python and am starting to realize its potential for serious number-crunching.

Presently, I wish to create a new column in a pandas dataframe (df1) and populate it with the numeric value from one of 24 columns (named 0.0 - 23.0) in another dataframe (df2). Each of the columns 0.0 - 23.0 represents one hour (00.00-00.59, 01.00-01.59 and so on). I want to perform this operation based on a time criterion.

There is a column time in df1, with datetime values in the format YYYY-mm-dd HH:MM:SS. This column is not the index of df1, so several rows may share the same value of time. df1 contains a total of 300,000 rows.

The index of df2 is a column date, which contains values in the form YYYY-mm-dd. df2 covers 3 years and hence contains a total of about 1,200 rows.
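
Since the lookup depends on matching dates and hours, it may help to normalize the types first. This is a minimal sketch, assuming the raw data arrives as strings (the question does not say which dtypes df1 and df2 actually have); the two-row frames below are a cut-down copy of the sample data, with column labels given as strings on purpose.

```python
import pandas as pd

# Hypothetical raw inputs: everything is still text, as after a plain file load.
df1 = pd.DataFrame(
    {
        "KEY": [252752, 281789],
        "time": ["2011-01-01 04:20:00", "2011-01-02 01:18:00"],
    }
)
df2 = pd.DataFrame(
    [[0.919355, 0.925806], [1.026144, 1.019608]],
    index=["2011-01-01", "2011-01-02"],
    columns=["0.0", "1.0"],
)
df2.index.name = "date"

# Parse the timestamps so that .dt.hour and .dt.normalize() become available.
df1["time"] = pd.to_datetime(df1["time"], format="%Y-%m-%d %H:%M:%S")

# Parse the date index so it can be matched against normalized timestamps.
df2.index = pd.to_datetime(df2.index, format="%Y-%m-%d")

# Make the hour labels real floats (0.0, 1.0, ...) instead of strings.
df2.columns = [float(label) for label in df2.columns]

print(df1.dtypes)
print(df2.columns.tolist())
```

If the columns are already datetimes and floats, these conversions should simply leave them unchanged.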

For example, if the value of time in df1 is 2011-01-01 12:01:20, I want to populate the new column in df1 with the numeric value from column 12.0 in df2, taken from the row with index 2011-01-01.

I have tried merging the two dataframes, which gave me a new dataframe containing df1 plus the columns 0.0 - 23.0 matched to the correct date. I did this by converting time to the YYYY-mm-dd format and applying .merge. However, the resulting dataframe is a bit too messy.
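
One way the merged frame could be tidied up is to pick, per row, the single hour column that matches the timestamp and then drop the rest. The sketch below is only an illustration under some assumptions: time is a datetime column, the df2 index is a DatetimeIndex, the hour labels are floats, and only the six sample columns 0.0 - 5.0 are present. The names merged, col_pos and result are made up for the example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "KEY": [252752, 281789, 242674, 189497, 189498],
        "time": pd.to_datetime(
            [
                "2011-01-01 04:20:00",
                "2011-01-02 01:18:00",
                "2011-01-03 03:08:00",
                "2011-01-04 00:17:00",
                "2011-01-05 05:31:00",
            ]
        ),
    }
)
hour_cols = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # the real df2 would go up to 23.0
df2 = pd.DataFrame(
    [
        [0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
        [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
        [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
        [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
        [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975],
    ],
    index=pd.to_datetime(
        ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"]
    ),
    columns=hour_cols,
)

# Step 1: the merge from the question, keyed on the calendar date.
merged = df1.assign(date=df1["time"].dt.normalize()).merge(
    df2, left_on="date", right_index=True, how="left"
)

# Step 2: for each row, find the position of the column matching its hour.
hours = merged["time"].dt.hour.astype(float)
col_pos = pd.Index(hour_cols).get_indexer(hours)
if (col_pos < 0).any():
    raise KeyError("some hours have no matching column in df2")

# Step 3: pick one value per row and keep only the tidy columns.
block = merged[hour_cols].to_numpy()
result = merged[["KEY", "time"]].assign(
    value=block[np.arange(len(merged)), col_pos]
)
print(result)
```

The hour columns only exist in the intermediate frame, so the final result has just KEY, time and value.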

Furthermore, I would like to write a function that evaluates the new column in df1, as a backward check that the values imported from df2 are correct.
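
Such a check could look roughly like the following. It deliberately repeats the lookup one row at a time with .at, so it does not share code with whatever filled the column. The function name find_mismatches and the tolerance atol are made up for this sketch, and it assumes a datetime time column, a DatetimeIndex on df2 and float hour labels. The value column is copied from the desired result, with only hours 0.0 - 5.0 present.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [
        [0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
        [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
        [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
        [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
        [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975],
    ],
    index=pd.to_datetime(
        ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"]
    ),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)
df1 = pd.DataFrame(
    {
        "KEY": [252752, 281789, 242674, 189497, 189498],
        "time": pd.to_datetime(
            [
                "2011-01-01 04:20:00",
                "2011-01-02 01:18:00",
                "2011-01-03 03:08:00",
                "2011-01-04 00:17:00",
                "2011-01-05 05:31:00",
            ]
        ),
        "value": [0.938710, 1.019608, 1.034810, 1.018987, 1.037975],
    }
)


def find_mismatches(df1, df2, value_col="value", atol=1e-9):
    """Return the rows of df1 whose value disagrees with a fresh df2 lookup."""
    # One scalar lookup per row: date selects the row, hour selects the column.
    expected = np.array(
        [df2.at[ts.normalize(), float(ts.hour)] for ts in df1["time"]]
    )
    # isclose treats NaN as "not close", so missing values are flagged too.
    ok = np.isclose(df1[value_col].to_numpy(), expected, rtol=0.0, atol=atol)
    return df1[~ok]


# All imported values are correct, so nothing is reported.
print(len(find_mismatches(df1, df2)))

# Corrupt one value on a copy to see the check catch it.
broken = df1.copy()
broken.loc[0, "value"] = 0.0
print(find_mismatches(broken, df2)["KEY"].tolist())
```

Note that .at raises a KeyError when a date or hour is missing from df2, which is itself a useful signal. On 300,000 rows the loop will be slower than a vectorized lookup, so running it on a random sample of df1 may be enough.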

df1

KEY    time
252752 2011-01-01 04:20:00   
281789 2011-01-02 01:18:00   
242674 2011-01-03 03:08:00   
189497 2011-01-04 00:17:00   
189498 2011-01-05 05:31:00   
...    ...

df2

date         0.0         1.0         2.0         3.0         4.0         5.0        ...   23.0
2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947   ...   1.037975
2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948   ...   0.919355
2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304   ...   1.018987
2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633   ...   0.932258
2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975   ...   0.953947
...          ...         ...         ...         ...         ...         ...        ...   ...

Desired result

KEY    time                  value
252752 2011-01-01 04:20:00   0.938710
281789 2011-01-02 01:18:00   1.019608
242674 2011-01-03 03:08:00   1.034810
189497 2011-01-04 00:17:00   1.018987
189498 2011-01-05 05:31:00   1.037975
...    ...                   ...

Answer 1

I'm not sure if this helps, but this is how I would write it:

    # Test data copied from the question
    df1_val = ("252752 2011-01-01 04:20:00",
               "281789 2011-01-02 01:18:00",
               "242674 2011-01-03 03:08:00",
               "189497 2011-01-04 00:17:00",
               "189498 2011-01-05 05:31:00")

    # Map KEY -> (date, time); split() avoids fixed-width slicing, which
    # truncated the six-digit keys and made 189497 and 189498 collide
    df1 = {}
    for row in df1_val:
        key, date, time = row.split()
        df1[key] = (date, time)

    df2_val = ("2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947",
               "2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948",
               "2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304",
               "2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633",
               "2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975")

    # Map date -> tuple of hourly values (only hours 0-5 in this sample)
    df2 = {}
    for row in df2_val:
        date, *hourly = row.split()
        df2[date] = tuple(float(v) for v in hourly)

    # Build the result dict: KEY -> (timestamp, value for that date and hour)
    result = {}
    for key, (date, time) in df1.items():
        hour = int(time[:2])
        result[key] = (date + " " + time, df2[date][hour])
        print(key, result[key])
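
The same (date, hour) dictionary idea can probably be expressed in pandas itself, which should scale better to 300,000 rows. This is only a sketch, not a tested solution for the full data: it assumes time is a datetime column, the df2 index is a DatetimeIndex and the hour labels are floats, and it uses just the sample columns 0.0 - 5.0. The names table and keys are made up here.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "KEY": [252752, 281789, 242674, 189497, 189498],
        "time": pd.to_datetime(
            [
                "2011-01-01 04:20:00",
                "2011-01-02 01:18:00",
                "2011-01-03 03:08:00",
                "2011-01-04 00:17:00",
                "2011-01-05 05:31:00",
            ]
        ),
    }
)
df2 = pd.DataFrame(
    [
        [0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
        [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
        [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
        [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
        [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975],
    ],
    index=pd.to_datetime(
        ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"]
    ),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

# unstack() on a frame with a single-level index returns a Series keyed by
# (column label, index label), which is (hour, date) here.
table = df2.unstack()

# Build the matching (hour, date) key for every row of df1.
keys = pd.MultiIndex.from_arrays(
    [df1["time"].dt.hour.astype(float), df1["time"].dt.normalize()],
    names=["hour", "date"],
)

# reindex yields NaN for any (hour, date) pair that is missing from df2.
df1["value"] = table.reindex(keys).to_numpy()
print(df1)
```

Because the key is built from the time column rather than the index, repeated timestamps in df1 are fine: each row simply gets the same looked-up value.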

Answer 1
0 2015-01-10 23:58:04
