简体   繁体   中英

Self-Join with Subquery in Google BigQuery

I am a SQL noob in need of some help with a specific query using the NYC 2013 Taxi Trips Dataset located here .

I want to analyze dropoffs at JFK Airport, but want to build my query so that I can include the next subsequent pickup that a taxi does after dropping off someone at the airport.

This query gets me all the trips at the airport for a given day:

SELECT * FROM [833682135931:nyctaxi.trip_data] 
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694

I want to join the dataset with itself to add next_pickup_time, next_pickup_lat, and next_pickup_lon values for each row.

To do this, I assume I need a correlated subquery, but don't know where to start building it out because the subquery is based on the outer query.

It needs to search for trips with the same medallion, on the same day, and with a pickup time later than the current airport dropoff, then limit 1... Any help is much appreciated!

考虑使用LAG窗口功能而不是自连接

I think that NN has the right idea, except that you want LEAD instead of LAG to get the next pickup. For example, this query will produce the next pickup time, lat and long after a pickup at JFK.

SELECT
    medallion,
    pickup_datetime,
    pickup_longitude,
    pickup_latitude,
    LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_datetime,
    LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_longitude,
    LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_latitude
FROM [833682135931:nyctaxi.trip_data]
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694;

Any time you can avoid a self-join, it's good to do so.

This should give you all the dropoffs with next pickups

SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE date(next_datetime)=date(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694

This is what finally worked, modified from Pentium10's answer:

SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE date(next_datetime)=date(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM