I'm pretty new to Python and am starting to realize its potential for serious number-crunching.
Presently, I wish to create a new column in a pandas dataframe (df1) and populate it with the numeric value from one of 24 columns (named 0.0 - 23.0) in another dataframe (df2). Each of the columns 0.0 - 23.0 represents one hour (00.00-00.59, 01.00-01.59 and so on). I want to perform my operation based on a time criterion.
There is a column time in df1, with a datetime value in the format YYYY-mm-dd HH:MM:SS. This column is not the index column of df1, so several rows may have the same value of 'time'. df1 contains a total of 300,000 rows.
The index of df2 is a column date, which contains values in the form YYYY-mm-dd. df2 covers 3 years and hence contains a total of about 1,200 rows.
For example, if the value of time in df1 is 2011-01-01 12:01:20, I want to populate the new column in df1 with the numeric value from column 12.0 in df2, corresponding to the row with index 2011-01-01.
I have tried to merge the two dataframes and obtained a new dataframe containing df1 and the columns 0.0 - 23.0 matched to the correct date. I did this by converting 'time' to the YYYY-mm-dd format and applying .merge. However, this dataframe is a bit too messy.
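For reference, the merge route described above can be tidied up by picking one hour column per row and then dropping the 24 helper columns. This is only a sketch on toy data: the helper column name day and the variable names are made up for illustration, and only hours 0.0 - 5.0 are included because that is all the sample shows.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df1 and df2 (hours 0.0-5.0 only).
df1 = pd.DataFrame({
    "KEY": [252752, 281789],
    "time": pd.to_datetime(["2011-01-01 04:20:00", "2011-01-02 01:18:00"]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02"]),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)
hour_cols = list(df2.columns)

# Match every df1 row to its calendar day in df2 (hypothetical helper column "day").
merged = df1.assign(day=df1["time"].dt.normalize()).merge(
    df2, left_on="day", right_index=True, how="left")

# Position of each row's hour among the hour columns.
# Note: get_indexer returns -1 for an hour that has no column, which would
# silently pick the last column, so real data should be checked for that.
col_pos = df2.columns.get_indexer(merged["time"].dt.hour.to_numpy().astype(float))

# Pick exactly one cell per row, then drop the helper columns.
merged["value"] = merged[hour_cols].to_numpy()[np.arange(len(merged)), col_pos]
result = merged.drop(columns=hour_cols + ["day"])
print(result)
```

The result keeps only KEY, time and value, so the messy wide frame never has to be stored.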
Furthermore, I would like to write a function that evaluates the new column in df1, so that I can check afterwards that the values imported from df2 are correct.
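Such a check could look roughly like the sketch below. The function name check_values, the column name value and the tolerance are assumptions, not anything from the original data. It repeats the lookup row by row in a deliberately simple way, so it is slow and is probably best run on a sample of the 300,000 rows (for example df1.sample(1000)).

```python
import pandas as pd

def check_values(df1, df2, value_col="value", tol=1e-9):
    """Return the rows of df1 whose value disagrees with a direct lookup in df2."""
    bad = []
    for idx, row in df1.iterrows():
        day = row["time"].normalize()      # date part, matches the index of df2
        hour = float(row["time"].hour)     # 4 -> column 4.0
        expected = df2.at[day, hour]
        # "not <=" also flags rows where the stored value is NaN
        if not abs(row[value_col] - expected) <= tol:
            bad.append(idx)
    return df1.loc[bad]

# Toy data: the second value is wrong on purpose.
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02"]),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)
df1 = pd.DataFrame({
    "KEY": [252752, 281789],
    "time": pd.to_datetime(["2011-01-01 04:20:00", "2011-01-02 01:18:00"]),
    "value": [0.938710, 9.9],
})

mismatches = check_values(df1, df2)
print(mismatches)
```

An empty result means every checked row agrees with df2.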
df1
KEY time
252752 2011-01-01 04:20:00
281789 2011-01-02 01:18:00
242674 2011-01-03 03:08:00
189497 2011-01-04 00:17:00
189498 2011-01-05 05:31:00
... ...
df2
date 0.0 1.0 2.0 3.0 4.0 5.0 ... 23.0
2011-01-01 0.919355 0.925806 0.929032 0.932258 0.938710 0.953947 ... 1.037975
2011-01-02 1.026144 1.019608 1.022876 1.032680 1.035948 1.035948 ... 0.919355
2011-01-03 1.025316 1.034810 1.037975 1.034810 1.044304 1.044304 ... 1.018987
2011-01-04 1.018987 1.025316 1.031646 1.044304 1.047468 1.050633 ... 0.932258
2011-01-05 1.018987 1.018987 1.018987 1.022152 1.031646 1.037975 ... 0.953947
... ... ... ... ... ... ... ... ...
desired result
KEY time value
252752 2011-01-01 04:20:00 0.938710
281789 2011-01-02 01:18:00 1.019608
242674 2011-01-03 03:08:00 1.034810
189497 2011-01-04 00:17:00 1.018987
189498 2011-01-05 05:31:00 1.037975
... ... ...
I'm not sure if this helps, but this is how I would write it (plain Python dicts, no pandas):
### just to have your test data
df1_val = ("252752 2011-01-01 04:20:00",
           "281789 2011-01-02 01:18:00",
           "242674 2011-01-03 03:08:00",
           "189497 2011-01-04 00:17:00",
           "189498 2011-01-05 05:31:00")
df1 = {}
for row in df1_val:
    # split on spaces instead of slicing: row[0:5] would cut the 6-digit key
    # short and make 189497 and 189498 collide
    key, date, time = row.split(" ")
    df1[key] = (date, time)
df2_val = ("2011-01-01 0.919355 0.925806 0.929032 0.932258 0.938710 0.953947",
           "2011-01-02 1.026144 1.019608 1.022876 1.032680 1.035948 1.035948",
           "2011-01-03 1.025316 1.034810 1.037975 1.034810 1.044304 1.044304",
           "2011-01-04 1.018987 1.025316 1.031646 1.044304 1.047468 1.050633",
           "2011-01-05 1.018987 1.018987 1.018987 1.022152 1.031646 1.037975")
df2 = {}
for row in df2_val:
    # first field is the date, the rest are the hourly values (hours 0-5 here)
    date, *hours = row.split(" ")
    df2[date] = tuple(float(v) for v in hours)
#### build the result dict
result = {}
for key in df1:
    date, time = df1[key]
    hour = int(time[:2])  # "04:20:00" -> 4
    result[key] = (date + " " + time, df2[date][hour])
    print(key)
    print(result[key])
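The same lookup can probably be done directly in pandas without a Python loop, which should matter with 300,000 rows. A minimal sketch, assuming df1["time"] is already a datetime column, df2 has a DatetimeIndex and float column labels, and each (date, hour) pair occurs only once in df2:

```python
import pandas as pd

# Toy versions of the frames from the question (hours 0.0-5.0 only).
df1 = pd.DataFrame({
    "KEY": [252752, 281789, 242674, 189497, 189498],
    "time": pd.to_datetime(["2011-01-01 04:20:00", "2011-01-02 01:18:00",
                            "2011-01-03 03:08:00", "2011-01-04 00:17:00",
                            "2011-01-05 05:31:00"]),
})
df2 = pd.DataFrame(
    [[0.919355, 0.925806, 0.929032, 0.932258, 0.938710, 0.953947],
     [1.026144, 1.019608, 1.022876, 1.032680, 1.035948, 1.035948],
     [1.025316, 1.034810, 1.037975, 1.034810, 1.044304, 1.044304],
     [1.018987, 1.025316, 1.031646, 1.044304, 1.047468, 1.050633],
     [1.018987, 1.018987, 1.018987, 1.022152, 1.031646, 1.037975]],
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03",
                          "2011-01-04", "2011-01-05"]),
    columns=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

# Long form of df2: one value per (date, hour) pair.
lookup = df2.stack()

# One (date, hour) key per row of df1; repeated times are fine here.
keys = pd.MultiIndex.from_arrays(
    [df1["time"].dt.normalize(), df1["time"].dt.hour.astype(float)])

# Pairs that are missing from df2 come back as NaN.
df1["value"] = lookup.reindex(keys).to_numpy()
print(df1)
```

On the sample rows this gives the values listed under the desired result.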