
Pandas data frame and SQL query

I'm trying to translate the following SQL query to pandas. However, after many attempts my head is now in a knot...

SELECT
 ID, Date1, Date2, Value
FROM
 data t1
WHERE
 t1.ID = 100 AND Date2 BETWEEN '2010-01-01 00:00:00.0' AND '2010-01-31 23:59:59.0' AND t1.Date1 =
 (
  SELECT
   max(t2.Date1)
  FROM
   data t2
  WHERE
   t2.Date1 <= '2010-02-01 00:00:00.0' AND t2.ID = t1.ID AND t2.Date2 = t1.Date2
 ) 
ORDER BY
 t1.Date2

Does anyone have a clever idea?

Many thanks

You can load the query result directly into a DataFrame with pandas' read_sql_query function:

import pandas as pd


df = pd.read_sql_query(your_sql_statement, your_db_connection)
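A minimal, runnable sketch of this suggestion, assuming an in-memory SQLite database and a table named data filled with a few made-up rows (the table contents are hypothetical). Dates are stored as ISO-formatted text so that SQLite's string comparison matches chronological order, and the ID and date bounds are passed through the params argument (SQLite uses ? placeholders) instead of being pasted into the SQL string.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table; ISO date strings compare correctly as text in SQLite
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (ID INTEGER, Date1 TEXT, Date2 TEXT, Value REAL)")
con.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (100, "2010-01-01 00:00:00", "2010-01-10 00:00:00", 1.0),
        (100, "2010-01-05 00:00:00", "2010-01-10 00:00:00", 2.0),  # later Date1, same Date2
        (100, "2010-02-05 00:00:00", "2010-01-10 00:00:00", 3.0),  # Date1 after the cutoff
        (100, "2010-01-02 00:00:00", "2010-01-20 00:00:00", 4.0),
        (200, "2010-01-03 00:00:00", "2010-01-10 00:00:00", 5.0),  # different ID
    ],
)

# The query from the question, with the literals replaced by placeholders
sql = """
SELECT ID, Date1, Date2, Value
FROM data t1
WHERE t1.ID = ? AND Date2 BETWEEN ? AND ? AND t1.Date1 = (
    SELECT max(t2.Date1) FROM data t2
    WHERE t2.Date1 <= ? AND t2.ID = t1.ID AND t2.Date2 = t1.Date2
)
ORDER BY t1.Date2
"""
params = (100, "2010-01-01 00:00:00", "2010-01-31 23:59:59", "2010-02-01 00:00:00")
df = pd.read_sql_query(sql, con, params=params)
print(df)
```

On this toy data, only the rows with Value 2.0 and 4.0 come back: for each (ID, Date2) pair, the row with the latest Date1 on or before the cutoff.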

Thanks, but that is not what I'm looking for, because querying this way takes too long. I'm looking for something like:

df[(df['Date2'] >= '2010-01-01 00:00:00.0') & (df['Date2'] <= '2010-01-31 23:59:59.0') & ??????????? & (df['Date1'] <= '2010-02-01 00:00:00.0') ?????????????

You don't show any example data, so an answer cannot easily be checked.

The inner part of your query

 SELECT
   max(t2.Date1)
  FROM
   data t2
  WHERE
   t2.Date1 <= '2010-02-01 00:00:00.0' AND t2.ID = t1.ID AND t2.Date2 = t1.Date2

becomes

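# inner query: latest Date1 on or before the cutoff for each (Date2, ID) pair (Date1 must be datetime64)
mask = df.Date1 <= '2010-02-01'
inner = df.loc[mask, :].groupby(['Date2', 'ID'], as_index=False)['Date1'].agg('max')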

This DataFrame can now be merged with your original df:

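mask = (df.ID == 100) & (df.Date2 >= '2010-01-01 00:00:00.0') & (df.Date2 <= '2010-01-31 23:59:59.0')  # BETWEEN is inclusive
# merging on Date1 as well keeps only the rows whose Date1 equals the inner max; then ORDER BY Date2
df.loc[mask, ['ID', 'Date1', 'Date2', 'Value']].merge(inner, on=['ID', 'Date2', 'Date1']).sort_values('Date2')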
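A self-contained check of this translation on a few hypothetical rows, together with an alternative that is not from the answer: groupby(...).transform('max') broadcasts each group's maximum back onto its rows, so the t1.Date1 = (SELECT max ...) condition becomes a plain boolean mask with no merge. Filtering everything up front is safe here because the outer conditions depend only on (ID, Date2) and therefore drop whole groups, and the subquery's maximum is itself on or before the cutoff. Like the SQL, both versions keep all rows that tie on the maximum Date1.

```python
import pandas as pd

# Hypothetical rows; Date1/Date2 are already datetime64
df = pd.DataFrame({
    "ID": [100, 100, 100, 100, 200],
    "Date1": pd.to_datetime(["2010-01-01", "2010-01-05", "2010-02-05", "2010-01-02", "2010-01-03"]),
    "Date2": pd.to_datetime(["2010-01-10", "2010-01-10", "2010-01-10", "2010-01-20", "2010-01-10"]),
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
# Outer WHERE: ID = 100 and Date2 BETWEEN ... (between() is inclusive on both ends)
outer = (df.ID == 100) & df.Date2.between("2010-01-01 00:00:00", "2010-01-31 23:59:59")

# Merge version: per-(Date2, ID) max Date1, joined on Date1 too so only matching rows survive
inner = df.loc[df.Date1 <= "2010-02-01"].groupby(["Date2", "ID"], as_index=False)["Date1"].agg("max")
merged = (df.loc[outer, ["ID", "Date1", "Date2", "Value"]]
          .merge(inner, on=["ID", "Date2", "Date1"])
          .sort_values("Date2"))

# transform version: apply all filters, then compare each row with its group's max Date1
sub = df.loc[outer & (df.Date1 <= "2010-02-01")]
latest = sub.groupby(["ID", "Date2"])["Date1"].transform("max")
result = sub[sub.Date1 == latest].sort_values("Date2")
print(result)
```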

I'm trying to use smart logic to filter data from a DataFrame.

The DataFrame looks like this:

Date1               Date2               Value
01.03.2019 01:00    02.03.2019 00:00    0,824778017
01.03.2019 01:00    03.03.2019 00:00    0,235332219
01.03.2019 01:00    04.03.2019 00:00    0,0545149
01.03.2019 01:00    05.03.2019 00:00    0,088324545
01.03.2019 01:00    06.03.2019 00:00    0,011294991
01.03.2019 19:00    02.03.2019 00:00    0,184424959
01.03.2019 19:00    03.03.2019 00:00    0,610644963
01.03.2019 19:00    04.03.2019 00:00    0,777668521
01.03.2019 19:00    05.03.2019 00:00    0,922268093
01.03.2019 19:00    06.03.2019 00:00    0,654392958
02.03.2019 01:00    03.03.2019 00:00    0,388756252
02.03.2019 01:00    04.03.2019 00:00    0,561393704
02.03.2019 01:00    05.03.2019 00:00    0,761488545
02.03.2019 01:00    06.03.2019 00:00    0,831463861
02.03.2019 01:00    07.03.2019 00:00    0,981502269
02.03.2019 19:00    03.03.2019 00:00    0,277360792
02.03.2019 19:00    04.03.2019 00:00    0,502428364
02.03.2019 19:00    05.03.2019 00:00    0,241836513
02.03.2019 19:00    06.03.2019 00:00    0,118992825
02.03.2019 19:00    07.03.2019 00:00    0,584641587
03.03.2019 01:00    04.03.2019 00:00    0,236813627
03.03.2019 01:00    05.03.2019 00:00    0,53616114
03.03.2019 01:00    06.03.2019 00:00    0,959270138
03.03.2019 01:00    07.03.2019 00:00    0,856270711
03.03.2019 01:00    08.03.2019 00:00    0,537138196
03.03.2019 19:00    04.03.2019 00:00    0,298802098
03.03.2019 19:00    05.03.2019 00:00    0,850840681
03.03.2019 19:00    06.03.2019 00:00    0,268404466
03.03.2019 19:00    07.03.2019 00:00    0,472132954
03.03.2019 19:00    08.03.2019 00:00    0,189761554

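To reproduce this, a minimal sketch for loading the table into pandas, assuming it is pasted as plain text with columns separated by runs of two or more spaces (only a few of its rows are copied here). It also assumes the dates are day.month.year, which the decimal commas suggest. Both the date format and the comma decimal separator are handled explicitly, so that Date1 and Date2 compare chronologically rather than as strings.

```python
import io

import pandas as pd

# A few rows of the table above, pasted as text (columns separated by 2+ spaces)
text = """Date1               Date2               Value
01.03.2019 01:00    02.03.2019 00:00    0,824778017
01.03.2019 01:00    03.03.2019 00:00    0,235332219
01.03.2019 19:00    02.03.2019 00:00    0,184424959
"""

# A regex separator requires the python engine; read every column as text first
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python", dtype=str)

# Assumed day.month.year format, given explicitly instead of letting pandas guess
for col in ["Date1", "Date2"]:
    df[col] = pd.to_datetime(df[col], format="%d.%m.%Y %H:%M")

# Values use a comma as the decimal separator
df["Value"] = df["Value"].str.replace(",", ".", regex=False).astype(float)
print(df.dtypes)
```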
My objective is the following:

Date2 is restricted to the range from 02.03.2019 00:00:00 to 07.03.2019 00:00:00.

First: for each Date2, return the Value whose Date1 is the latest:

Date1               Date2               Value
01.03.2019 19:00    02.03.2019 00:00    0,184424959
02.03.2019 19:00    03.03.2019 00:00    0,277360792
03.03.2019 19:00    04.03.2019 00:00    0,298802098
03.03.2019 19:00    05.03.2019 00:00    0,850840681
03.03.2019 19:00    06.03.2019 00:00    0,268404466
03.03.2019 19:00    07.03.2019 00:00    0,472132954
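The first objective is a per-group "latest row" selection. A minimal sketch, rebuilding the 30 rows of the table (decimal commas converted to dots, dates assumed day.month.year and stored as datetime64): restrict Date2 to the window, then groupby('Date2')['Date1'].idxmax() returns the row label of the latest Date1 for each Date2. If two rows shared the same latest Date1, idxmax would keep only the first of them.

```python
import pandas as pd

# The 30 rows of the table: each Date1 covers the five calendar days after its own day as Date2
date1 = pd.to_datetime(["2019-03-01 01:00", "2019-03-01 19:00", "2019-03-02 01:00",
                        "2019-03-02 19:00", "2019-03-03 01:00", "2019-03-03 19:00"])
values = [0.824778017, 0.235332219, 0.0545149, 0.088324545, 0.011294991,
          0.184424959, 0.610644963, 0.777668521, 0.922268093, 0.654392958,
          0.388756252, 0.561393704, 0.761488545, 0.831463861, 0.981502269,
          0.277360792, 0.502428364, 0.241836513, 0.118992825, 0.584641587,
          0.236813627, 0.53616114, 0.959270138, 0.856270711, 0.537138196,
          0.298802098, 0.850840681, 0.268404466, 0.472132954, 0.189761554]
df = pd.DataFrame({
    "Date1": date1.repeat(5),
    "Date2": [d.normalize() + pd.Timedelta(days=k) for d in date1 for k in range(1, 6)],
    "Value": values,
})

# Restrict Date2 to the requested window (inclusive on both ends)
window = df[df["Date2"].between("2019-03-02", "2019-03-07")]

# For each Date2, take the row label of the maximum Date1 and select those rows
latest = window.loc[window.groupby("Date2")["Date1"].idxmax()]
print(latest)
```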

Second: for each Date2, return the Value with the latest Date1 that does not go beyond a specific date:

Date1               Date2               Value
01.03.2019 19:00    02.03.2019 00:00    0,184424959
02.03.2019 01:00    03.03.2019 00:00    0,388756252
02.03.2019 01:00    04.03.2019 00:00    0,561393704
02.03.2019 01:00    05.03.2019 00:00    0,761488545
02.03.2019 01:00    06.03.2019 00:00    0,831463861
02.03.2019 01:00    07.03.2019 00:00    0,981502269
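For the second objective, Date1 is capped at a cutoff before picking the latest row per Date2. The question does not state the cutoff; this sketch assumes 02.03.2019 01:00, inferred from the expected output (any cutoff from 02.03.2019 01:00 up to, but not including, 02.03.2019 19:00 gives that table). It filters Date2 first and then Date1, and picks the latest row by sorting on Date2 and Date1 and keeping the last row per Date2 with drop_duplicates, an alternative to idxmax.

```python
import pandas as pd

# The 30 rows of the table (day.month.year assumed, decimal commas converted to dots)
date1 = pd.to_datetime(["2019-03-01 01:00", "2019-03-01 19:00", "2019-03-02 01:00",
                        "2019-03-02 19:00", "2019-03-03 01:00", "2019-03-03 19:00"])
values = [0.824778017, 0.235332219, 0.0545149, 0.088324545, 0.011294991,
          0.184424959, 0.610644963, 0.777668521, 0.922268093, 0.654392958,
          0.388756252, 0.561393704, 0.761488545, 0.831463861, 0.981502269,
          0.277360792, 0.502428364, 0.241836513, 0.118992825, 0.584641587,
          0.236813627, 0.53616114, 0.959270138, 0.856270711, 0.537138196,
          0.298802098, 0.850840681, 0.268404466, 0.472132954, 0.189761554]
df = pd.DataFrame({
    "Date1": date1.repeat(5),
    "Date2": [d.normalize() + pd.Timedelta(days=k) for d in date1 for k in range(1, 6)],
    "Value": values,
})

# Assumed cutoff, inferred from the expected output (not stated in the question)
cutoff = pd.Timestamp("2019-03-02 01:00")

# Filter Date2 first, then Date1 (only forecasts made on or before the cutoff)
sub = df[df["Date2"].between("2019-03-02", "2019-03-07") & (df["Date1"] <= cutoff)]

# Sort so the latest Date1 is last within each Date2, then keep that last row
as_of = sub.sort_values(["Date2", "Date1"]).drop_duplicates("Date2", keep="last")
print(as_of)
```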

My approach is to filter on Date2 first and then on Date1:

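# assumes Date1/Date2 are datetime64 (day-first text parsed with pd.to_datetime(..., dayfirst=True));
# ISO strings avoid '02.03.2019' being read month-first (3 February)
is_date2 = (df['Date2'] >= '2019-03-02 00:00:00') & (df['Date2'] < '2019-03-07 23:59:59')
df = df.loc[is_date2]

is_date1 = (df['Date1'] <= '2019-03-07 19:00:00') & ... ???
df = df.loc[is_date1]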

The answer from JoergVanAken is helpful, but it doesn't get me all the way to my goal yet.

You can also interpret Date1 as a forecast date and Date2 as a value date.

Thanks in advance

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
