
Transferring values from multiple columns in a dataframe to a new column in another dataframe, based on a time criterion

I'm pretty new to Python and am starting to realize its potential for serious number-crunching.

Presently, I wish to create a new column in a pandas dataframe (df1) and populate it with the numeric value from one of 24 columns (named 0.0 - 23.0) in another dataframe (df2). Each of the columns 0.0 - 23.0 represents one hour of the day (00.00-00.59, 01.00-01.59 and so on). I want to perform this operation based on a time criterion.

There is a column time in df1, with datetime values in the format YYYY-mm-dd HH:MM:SS. This column is not the index of df1, so several rows may have the same value of time. df1 contains a total of 300,000 rows.

The index of df2 is a column date, which contains values in the form YYYY-mm-dd. df2 covers 3 years and hence contains a total of about 1,200 rows.

For example, if the value of time in df1 is 2011-01-01 12:01:20, I want to populate the new column in df1 with the numeric value from column 12.0 in df2, taken from the row with index 2011-01-01.

I have tried merging the two dataframes and obtained a new dataframe containing df1 and the columns 0.0 - 23.0 matched to the correct date. I did this by converting time to the YYYY-mm-dd format and applying .merge. However, the resulting dataframe is a bit too messy, since it still carries all 24 hourly columns on every row.
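One way to keep a merge-based approach tidy is to reshape df2 to long form first, so that the merge brings in a single value column instead of 24 hourly ones. The sketch below is only an illustration under some assumptions: the index of df2 has been parsed to datetimes, the column labels are floats, and the names long_df2, keys and result are made up for this example. Only hours 0.0 - 5.0 are included in the sample data.

```python
import pandas as pd

# Small stand-ins for the real frames (sample values from the question).
df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674],
    "time": pd.to_datetime(["2011-01-01 04:20:00",
                            "2011-01-02 01:18:00",
                            "2011-01-03 03:08:00"]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304]],
    index=pd.Index(pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
                   name="date"),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

# Wide -> long: one row per (date, hour) pair with a single value column.
long_df2 = df2.stack().rename("value").reset_index()
long_df2.columns = ["date", "hour", "value"]

# Derive the same two keys from df1["time"], then merge on both of them.
keys = df1.assign(date=df1["time"].dt.normalize(),
                  hour=df1["time"].dt.hour.astype(float))
merged = keys.merge(long_df2, on=["date", "hour"], how="left")

# Drop the helper columns so only the original columns plus value remain.
result = merged[["KEY", "time", "value"]]
print(result)
```

With how="left", rows of df1 whose date or hour has no match in df2 are kept and get NaN in value, which makes gaps easy to spot.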

Furthermore, I would like to write a function that evaluates the new column in df1, as a backward check that the values imported from df2 are correct.
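Such a check could look roughly like the sketch below. It looks each value up again, one row at a time, directly in df2 and returns the rows that disagree. The function name find_mismatches, the tolerance tol and the column name value are assumptions made for this example, and df2 is again assumed to have a datetime index and float column labels. The sample df1 contains one deliberately wrong value so the check has something to report.

```python
import pandas as pd

def find_mismatches(df1, df2, tol=1e-9):
    """Return the rows of df1 whose value differs from a fresh df2 lookup."""
    # Independent, deliberately simple lookup: one scalar access per row.
    expected = [df2.at[t.normalize(), float(t.hour)] for t in df1["time"]]
    diff = (df1["value"] - pd.Series(expected, index=df1.index)).abs()
    # "not (diff <= tol)" also flags rows where value is NaN.
    return df1[~(diff <= tol)]

df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948]],
    index=pd.Index(pd.to_datetime(["2011-01-01", "2011-01-02"]), name="date"),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)
df1 = pd.DataFrame({
    "KEY": [252752, 281789],
    "time": pd.to_datetime(["2011-01-01 04:20:00", "2011-01-02 01:18:00"]),
    "value": [0.938710, 9.999999],  # second value is wrong on purpose
})

bad_rows = find_mismatches(df1, df2)
print(bad_rows)
```

Because this loops in Python, it is probably best run on a random sample of df1 (for example df1.sample(1000)) rather than on all 300,000 rows.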

df1

KEY    time
252752 2011-01-01 04:20:00   
281789 2011-01-02 01:18:00   
242674 2011-01-03 03:08:00   
189497 2011-01-04 00:17:00   
189498 2011-01-05 05:31:00   
...    ...

df2

date         0.0         1.0         2.0         3.0         4.0         5.0        ...   23.0
2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947   ...   1.037975
2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948   ...   0.919355
2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304   ...   1.018987
2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633   ...   0.932258
2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975   ...   0.953947
...          ...         ...         ...         ...         ...         ...        ...   ...

Desired result

KEY    time                  value
252752 2011-01-01 04:20:00   0.938710
281789 2011-01-02 01:18:00   1.019608
242674 2011-01-03 03:08:00   1.034810
189497 2011-01-04 00:17:00   1.018987
189498 2011-01-05 05:31:00   1.037975
...    ...                   ...

I'm not sure if this helps, but this is how I would write it:

    ### just to have your test data (plain dicts stand in for the dataframes)
    df1_val = ("252752 2011-01-01 04:20:00",
               "281789 2011-01-02 01:18:00",
               "242674 2011-01-03 03:08:00",
               "189497 2011-01-04 00:17:00",
               "189498 2011-01-05 05:31:00")

    # key -> (date string, time string)
    df1 = {}
    for row in df1_val:
        key, date, time = row.split()
        df1[key] = (date, time)

    df2_val = ("2011-01-01   0.919355    0.925806    0.929032    0.932258    0.938710    0.953947",
               "2011-01-02   1.026144    1.019608    1.022876    1.032680    1.035948    1.035948",
               "2011-01-03   1.025316    1.034810    1.037975    1.034810    1.044304    1.044304",
               "2011-01-04   1.018987    1.025316    1.031646    1.044304    1.047468    1.050633",
               "2011-01-05   1.018987    1.018987    1.018987    1.022152    1.031646    1.037975")

    # date string -> tuple of hourly values (only hours 0-5 in this sample)
    df2 = {}
    for row in df2_val:
        date, *hours = row.split()
        df2[date] = tuple(float(v) for v in hours)

    #### build the result dict
    result = {}

    for key in df1:
        date, time = df1[key]
        hour = int(time[:2])  # "04:20:00" -> 4
        # the hour doubles as the position of the value within the day's tuple
        result[key] = (date + "   " + time, df2[date][hour])
        print(key)
        print(result[key])
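The same date-plus-hour lookup can also be done on the dataframes themselves without a Python loop. The following is a sketch rather than a tested solution for the full data: it assumes the index of df2 is a DatetimeIndex and that the column labels are the floats 0.0 - 23.0 (only 0.0 - 5.0 appear in this sample). It translates each time into a row position and a column position in df2 and then picks the values with NumPy indexing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674, 189497, 189498],
    "time": pd.to_datetime(["2011-01-01 04:20:00", "2011-01-02 01:18:00",
                            "2011-01-03 03:08:00", "2011-01-04 00:17:00",
                            "2011-01-05 05:31:00"]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
     [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
     [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975]],
    index=pd.Index(pd.date_range("2011-01-01", periods=5), name="date"),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

# Position of each row's date in df2.index and of its hour in df2.columns.
# get_indexer returns -1 where no matching label exists.
rows = df2.index.get_indexer(df1["time"].dt.normalize())
cols = df2.columns.get_indexer(df1["time"].dt.hour.astype(float))

# Pick one cell per df1 row; mask positions that had no match in df2.
picked = df2.to_numpy()[rows, cols]
df1["value"] = np.where((rows >= 0) & (cols >= 0), picked, np.nan)
print(df1)
```

Since the lookup works on positions rather than on a merged frame, df1 keeps its original columns and only gains value.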

