Overview

In this guide, you’ll learn how to use the Oxla database with Social Media datasets through various sample queries. In this case, we’ll utilize the GitHub datasets, from which you can retrieve events in all GitHub repositories since 2011 in a structured format.

Data Source: https://github.com/igrigorik/gharchive.org .

Prerequisite
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.

Datasets Structure

Let’s explore the tables and their structures to better understand and fully utilize the Web Traffic dataset.

Sample Queries

Sample 1: Identify the Most Active Users

This example identifies the 10 most active users on GitHub by counting their triggered events.

SELECT u.user_id, u.login, COUNT(e.event_id) AS event_count
FROM github_users u
JOIN github_events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.login
ORDER BY event_count DESC
LIMIT 10;

It joins the github_users table with the github_events table to show ten active users.

 user_id  |        login         | event_count 
----------+----------------------+-------------
  8517910 | LombiqBot            |       16605
 14175800 | OpenLocalizationTest |       11963
 10810283 | direwolf-github      |        4774
  3226435 | digimatic            |        2596
    57483 | tmtmtmtm             |        2338
    53164 | avsm                 |        1747
  2354108 | coveralls            |        1648
 13653959 | k8s-merge-robot      |        1458
 12490588 | suzutan0s2           |        1431
   406518 | davecramer           |        1417
(10 rows)

Sample 2: Find the Most and Least Used Events

We want to identify the most used event and the least used event in the github_events table.

(
  SELECT event_type, COUNT(event_id) AS event_count, 'most used' AS usage_type
  FROM github_events
  GROUP BY event_type
  LIMIT 1
)
UNION ALL
(
  SELECT event_type, COUNT(event_id) AS event_count, 'least used' AS usage_type
  FROM github_events
  GROUP BY event_type
  LIMIT 1
);

This output indicates that the PushEvent is the most used event and PublicEvent is the least used event.

 event_type  | event_count | usage_type 
-------------+-------------+------------
 PushEvent   |      587617 | most used
 PublicEvent |        1140 | least used
(2 rows)

Sample 3: Analyzing Event Types Distribution per User

In this example, we calculates the distribution of different event types each user has participated in. It’s useful for understanding what types of activities users are most involved in, which can inform decision-making.

SELECT 
    gu.user_id, 
    gu.login, 
    ge.event_type, 
    COUNT(ge.event_id) AS event_count
FROM 
    github_users gu
JOIN 
    github_events ge ON gu.user_id = ge.user_id
GROUP BY 
    gu.user_id, 
    gu.login, 
    ge.event_type
ORDER BY 
    gu.user_id, 
    event_count DESC LIMIT 20;

The query returns a list of users along with the types of events they used and the count of each event type.

 user_id |    login     |    event_type     | event_count 
---------+--------------+-------------------+-------------
      21 | technoweenie | IssueCommentEvent |           6
      21 | technoweenie | IssuesEvent       |           5
      21 | technoweenie | ReleaseEvent      |           1
      21 | technoweenie | CreateEvent       |           1
      22 | macournoyer  | IssueCommentEvent |           1
      22 | macournoyer  | PullRequestEvent  |           1
      22 | macournoyer  | PushEvent         |           1
      38 | atmos        | PushEvent         |           1
      45 | mojodna      | WatchEvent        |           2
      69 | rsanheim     | IssuesEvent       |           2
      78 | indirect     | IssueCommentEvent |           3
      78 | indirect     | PushEvent         |           1
      78 | indirect     | PullRequestEvent  |           1
      78 | indirect     | DeleteEvent       |           1
      81 | engineyard   | ForkEvent         |           1
      81 | engineyard   | MemberEvent       |           1
      82 | jsierles     | IssueCommentEvent |           3
      82 | jsierles     | PushEvent         |           3
      82 | jsierles     | PullRequestEvent  |           3
      85 | brixen       | IssueCommentEvent |           2
(20 rows)