Skip to main content

Overview

In this guide, you’ll learn how to use the Oxla database with Urban Mobility datasets through various sample queries. These datasets include time-series data from New York City Taxi and For-Hire Vehicle Data. It contains several tables, such as trips, taxi zones, and weather observations. Data Source: https://github.com/toddwschneider/nyc-taxi-data .
Prerequisite
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.

Datasets Structure

Let’s explore the tables and their structures to better understand and fully utilize the Urban Mobility dataset.
Attribute NameType
idBigint
typeText
Attribute NameType
station_idText
station_nameText
dateDate
precipitationDouble precision
snow_depthDouble precision
snowfallDouble precision
max_temperatureDouble precision
min_temperatureDouble precision
average_wind_speedDouble precision
Attribute NameType
base_numberText
base_nameText
dbaText
dba_categoryText
Attribute NameType
idBigint
hvfhs_license_numText
dispatching_base_numText
originating_base_numText
request_datetimeTimestamp without time zone
on_scene_datetimeTimestamp without time zone
pickup_datetimeTimestamp without time zone
dropoff_datetimeTimestamp without time zone
pickup_location_idBigint
dropoff_location_idBigint
trip_milesDouble precision
trip_timeDouble precision
base_passenger_fareDouble precision
tollsDouble precision
black_car_fundDouble precision
sales_taxDouble precision
congestion_surchargeDouble precision
airport_feeDouble precision
tipsDouble precision
driver_payDouble precision
shared_requestBoolean
shared_matchBoolean
access_a_rideBoolean
wav_requestBoolean
wav_matchBoolean
legacy_shared_rideBigint
affiliated_base_numText
Attribute NameType
hvfhs_license_numText
dispatching_base_numText
originating_base_numText
request_datetimeTimestamp without time zone
on_scene_datetimeTimestamp without time zone
pickup_datetimeTimestamp without time zone
dropoff_datetimeTimestamp without time zone
pickup_location_idBigint
dropoff_location_idBigint
trip_milesDouble precision
trip_timeDouble precision
base_passenger_fareDouble precision
tollsDouble precision
black_car_fundDouble precision
sales_taxDouble precision
congestion_surchargeDouble precision
airport_feeDouble precision
tipsDouble precision
driver_payDouble precision
shared_request_flagText
shared_match_flagText
access_a_ride_flagText
wav_request_flagText
wav_match_flagText
shared_ride_flagText
affiliated_base_numText
legacy_shared_ride_flagText
Attribute NameType
idBigint
vendor_idBigint
lpep_pickup_datetimeTimestamp without time zone
lpep_dropoff_datetimeTimestamp without time zone
store_and_fwd_flagText
rate_code_idBigint
dropoff_location_idBigint
congestion_surchargeDouble precision
passenger_countBigint
trip_distanceDouble precision
fare_amountDouble precision
extraDouble precision
mta_taxDouble precision
tip_amountDouble precision
tolls_amountDouble precision
ehail_feeDouble precision
improvement_surchargeDouble precision
total_amountDouble precision
payment_typeBigint
trip_typeBigint
pickup_location_idBigint
Attribute NameType
license_numberText
company_nameText
Attribute NameType
gidBigint
ctlabelText
borocodeText
boronameText
ct2010Text
boroct2010Text
cdeligibilText
ntacodeText
ntanameText
pumaText
shape_lengDouble precision
shape_areaDouble precision
geomText
Attribute NameType
nyct2010_gidBigint
taxi_zone_location_idInteger
overlapDouble precision
Attribute NameType
sridBigint
auth_nameText
auth_sridBigint
srtextText
proj4textText
Attribute NameType
idBigint
cab_type_idBigint
vendor_idBigint
pickup_datetimeTimestamp without time zone
dropoff_datetimeTimestamp without time zone
store_and_fwd_flagBoolean
rate_code_idBigint
pickup_longitudeDouble precision
pickup_latitudeDouble precision
dropoff_longitudeDouble precision
dropoff_latitudeDouble precision
passenger_countBigint
trip_distanceDouble precision
fare_amountDouble precision
extraDouble precision
mta_taxDouble precision
tip_amountDouble precision
tolls_amountDouble precision
ehail_feeDouble precision
improvement_surchargeDouble precision
congestion_surchargeDouble precision
airport_feeDouble precision
total_amountDouble precision
payment_typeBigint
trip_typeBigint
pickup_nyct2010_gidBigint
dropoff_nyct2010_gidBigint
pickup_location_idBigint
dropoff_location_idBigint
Attribute NameType
gidBigint
objectidBigint
shape_lengDouble precision
shape_areaDouble precision
zoneText
locationidInteger
boroughText
geomText
Attribute NameType
idBigint
pickup_datetimeTimestamp without time zone
pickup_latitudeDouble precision
pickup_longitudeDouble precision
base_codeText
Attribute NameType
idBigint
vendor_idText
tpep_pickup_datetimeTimestamp without time zone
tpep_dropoff_datetimeTimestamp without time zone
passenger_countBigint
trip_distanceDouble precision
pickup_longitudeDouble precision
pickup_latitudeDouble precision
rate_code_idText
store_and_fwd_flagText
dropoff_longitudeDouble precision
dropoff_latitudeDouble precision
pickup_location_idBigint
dropoff_location_idBigint
payment_typeText
fare_amountDouble precision
extraDouble precision
mta_taxDouble precision
tip_amountDouble precision
tolls_amountDouble precision
improvement_surchargeDouble precision
total_amountDouble precision
congestion_surchargeDouble precision
airport_feeDouble precision

Sample Queries

Sample 1: Number of Trips by Payment Type

This example shows the number of trips for each payment type used in the dataset.
SELECT payment_type, count(*) AS count
FROM trips
GROUP BY payment_type
ORDER BY count DESC;
The query displays number of trips grouped by payment type.
 payment_type |   count   
--------------+-----------
            1 | 147533966
            2 | 147159834
            3 |    752296
            4 |    242825
            5 |    150926
              |     14580
(6 rows)

Sample 2: Calculate Revenue per Pickup Point

This example calculates the top 10 pickup locations with the highest revenue.
SELECT t.pickup_location_id AS pickup_location_id, 
       tz."zone" AS location_name,
       COUNT(*) AS trip_count,
       SUM(t.total_amount) AS total_revenue 
FROM trips t
JOIN taxi_zones tz ON t.pickup_location_id = tz.locationid 
GROUP BY t.pickup_location_id, tz."zone" 
ORDER BY total_revenue DESC 
LIMIT 10;
The result displays the top 10 pickup locations, with JFK Airport having the highest revenue.
 pickup_location_id |       location_name       | trip_count |   total_revenue    
--------------------+---------------------------+------------+--------------------
                132 | JFK Airport               |    4781775 | 240929978.91028678
                138 | LaGuardia Airport         |    5935988 | 219636305.66008496
                161 | Midtown Center            |    9978810 | 125652682.01979701
                 79 | East Village              |    9407751 | 119170547.52983798
                162 | Midtown East              |    9297167 | 118483924.92982587
                 48 | Clinton East              |    8872713 | 112228413.58982424
                170 | Murray Hill               |    9161159 | 111683576.97983585
                237 | Upper East Side South     |   10378670 | 111360934.71979024
                234 | Union Sq                  |    9156879 | 109465941.75984423
                230 | Times Sq/Theatre District |    9522197 | 109449951.09985997
(10 rows)
This example provides insights into Manhattan’s most popular cabs by showing key statistics for each cab type.
SELECT c.type AS cab_type,
       MIN(t.trip_distance) AS shortest_distance,
       MAX(t.trip_distance) AS farthest_distance,
       AVG(t.trip_distance) AS average_distance,
       AVG(t.fare_amount) AS average_fare,
       SUM(t.fare_amount) AS total_fare
FROM trips t
JOIN cab_types c ON t.cab_type_id = c.id
GROUP BY c.type
ORDER BY c.type;
The result shows the shortest and farthest trip distances, average trip distance, and average fare for each cab type.
 cab_type | shortest_distance | farthest_distance | average_distance  |    average_fare    |    total_fare     
----------+-------------------+-------------------+-------------------+--------------------+-------------------
 green    |                 0 |               640 | 2.846760430722446 | 12.214111199481835 | 154370169.6499999
 yellow   |         -21474834 |          16200370 | 5.955657407792438 | 11.567361225195867 |  3276058919.61685
(2 rows)

Sample 4: How Weather Impacts the Transportation Demand Analysis

Here we analyze the average temperature and wind speed from January to April on the 1st and 2nd of each month.
SELECT
    w."date" AS pickup_date,
    (w.max_temperature + w.min_temperature) / 2 AS average_temperature,
    w.average_wind_speed,
    COUNT(t.id) AS number_of_trips
FROM
    central_park_weather_observations w
LEFT JOIN
    trips t ON w."date" = t.pickup_datetime::date
WHERE
    w."date" IN (
        '2009-01-01', '2009-01-02',
        '2009-02-01', '2009-02-02',
        '2009-03-01', '2009-03-02',
        '2009-04-01', '2009-04-02'
    )
GROUP BY
    w."date", w.max_temperature, w.min_temperature, w.average_wind_speed
ORDER BY
    w."date";
The output helps us understand how weather conditions from January to April impact the number of trips taken on specific dates. It shows that days with higher temperatures and stronger winds see more trips happening.
 pickup_date | average_temperature | average_wind_speed | number_of_trips 
-------------+---------------------+--------------------+-----------------
 2009-01-01  |                20.5 |              11.18 |           65888
 2009-01-02  |                28.5 |               6.26 |           75481
 2009-02-01  |                  39 |               7.61 |           86019
 2009-02-02  |                43.5 |               5.14 |           80808
 2009-03-01  |                  32 |               9.84 |           86320
 2009-03-02  |                  22 |              10.96 |           65607
 2009-04-01  |                  46 |               6.71 |           77604
 2009-04-02  |                  54 |               5.59 |           99310
(8 rows)