Overview

In this guide, you’ll learn how to use the Oxla database with Social Media datasets through various sample queries. In this case, we’ll utilize the GitHub datasets, from which you can retrieve events in all GitHub repositories since 2011 in a structured format.

Data Source: https://github.com/igrigorik/gharchive.org .

Prerequisite
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.

Datasets Structure

Let’s explore the tables and their structures to better understand and fully utilize the Web Traffic dataset.

Sample Queries

Sample 1: Identify the Most Active Users

This example identifies the 10 most active users on GitHub by counting their triggered events.

SELECT u.user_id, u.login, COUNT(e.event_id) AS event_count
FROM github_users u
JOIN github_events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.login
ORDER BY event_count DESC
LIMIT 10;

It joins the github_users table with the github_events table to show ten active users.

 user_id  |        login         | event_count 
----------+----------------------+-------------
  8517910 | LombiqBot            |        2232
 14018773 | yren-zenefits        |         815
 14175800 | OpenLocalizationTest |         698
 10810283 | direwolf-github      |         625
  3226435 | digimatic            |         288
 10192776 | wangliandong         |         221
  1118433 | TerraTech            |         218
 23172956 | CSCC01-Team05        |         208
 12490588 | suzutan0s2           |         191
   406518 | davecramer           |         187
(10 rows)

Sample 2: Find the Most and Least Used Events

We want to identify the most used event and the least used event in the github_events table.

  SELECT event_type, COUNT(event_id) AS event_count, 'most used' AS usage_type
  FROM github_events
  GROUP BY event_type
  ORDER BY event_count DESC
  LIMIT 1
)
UNION ALL
(
  SELECT event_type, COUNT(event_id) AS event_count, 'least used' AS usage_type
  FROM github_events
  GROUP BY event_type
  ORDER BY event_count ASC
  LIMIT 1
);

This output indicates that the PushEvent is the most used event and PublicEvent is the least used event.

 event_type  | event_count | usage_type 
-------------+-------------+------------
 PushEvent   |       62413 | most used
 PublicEvent |         130 | least used
(2 rows)

Sample 3: Find Users with No Events

In this example, we identify users who have not triggered any events.

SELECT u.user_id, u.login
FROM github_users u
LEFT JOIN github_events e ON u.user_id = e.user_id
WHERE e.user_id IS NULL LIMIT 10;

The query returns a list of 10 users who haven’t created any events.

 user_id  |       login        
----------+--------------------
 24285590 | Beitaier
 24285594 | priyasdevops
 24285595 | foojoshua
 24285599 | joan07290
 24285603 | roycat
 24285604 | Dragonhed
 24285607 | kabirs12
 24285608 | knockknockhusthere
 24285611 | wangbrian2736
 24285612 | utkML
(10 rows)