Social Media Dataset
Overview
In this guide, you’ll learn how to use the Oxla database with Social Media datasets through various sample queries. In this case, we’ll utilize the GitHub datasets, from which you can retrieve events in all GitHub repositories since 2011 in a structured format.
Data Source: https://github.com/igrigorik/gharchive.org
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.
Datasets Structure
Let’s explore the tables and their structures to better understand and fully utilize the Social Media dataset.
Sample Queries
Sample 1: Identify the Most Active Users
This example identifies the 10 most active users on GitHub by counting their triggered events.
It joins the github_users table with the github_events table to show the ten most active users.
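The join-and-count pattern described above can be sketched as follows. This is a minimal illustration rather than the guide's original query: it runs against a toy in-memory SQLite database instead of an Oxla cluster, and the column names (user_id, user_login, event_id, event_type) and sample rows are assumptions made for the example.

```python
import sqlite3

# Toy stand-in for the demo dataset. Column names and rows are assumptions,
# not the actual schema of the Oxla demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE github_users (user_id INTEGER, user_login TEXT);
    CREATE TABLE github_events (event_id INTEGER, event_type TEXT, user_id INTEGER);
    INSERT INTO github_users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO github_events VALUES
        (10, 'PushEvent', 1), (11, 'PushEvent', 1), (12, 'WatchEvent', 1),
        (13, 'PushEvent', 2), (14, 'ForkEvent', 2),
        (15, 'PublicEvent', 3);
""")

# Join users to their events, count events per user, keep the ten largest counts.
TOP_USERS_SQL = """
    SELECT u.user_login, COUNT(e.event_id) AS event_count
    FROM github_users AS u
    JOIN github_events AS e ON u.user_id = e.user_id
    GROUP BY u.user_login
    ORDER BY event_count DESC, u.user_login
    LIMIT 10
"""

top_users = conn.execute(TOP_USERS_SQL).fetchall()
for login, event_count in top_users:
    print(login, event_count)
```

The SQL itself uses only a join, GROUP BY, ORDER BY, and LIMIT, so it should carry over to a PostgreSQL-compatible client once the column names are adjusted to the real tables.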
Sample 2: Find the Most and Least Used Events
We want to identify the most used event and the least used event in the github_events table.
The output indicates that PushEvent is the most used event and PublicEvent is the least used event.
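One way such a comparison could be run is to count events per type and read off both ends of the sorted result. The sketch below uses a toy in-memory SQLite table. The event_type column name and the sample rows are assumptions chosen for illustration, not the real dataset.

```python
import sqlite3

# Toy stand-in for github_events. The event_type column and the rows
# are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE github_events (event_id INTEGER, event_type TEXT);
    INSERT INTO github_events VALUES
        (1, 'PushEvent'), (2, 'PushEvent'), (3, 'PushEvent'),
        (4, 'WatchEvent'), (5, 'WatchEvent'),
        (6, 'PublicEvent');
""")

# Count how often each event type occurs, most frequent first.
COUNTS_SQL = """
    SELECT event_type, COUNT(*) AS event_count
    FROM github_events
    GROUP BY event_type
    ORDER BY event_count DESC
"""

counts = conn.execute(COUNTS_SQL).fetchall()
most_used, least_used = counts[0], counts[-1]
print("most used:", most_used)
print("least used:", least_used)
```

Reading the first and last rows of one sorted result keeps the query simple. If only the two extremes are needed from a large table, two separate queries with LIMIT 1 (one ascending, one descending) would be an alternative.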
Sample 3: Analyzing Event Types Distribution per User
In this example, we calculate the distribution of event types that each user has participated in. This is useful for understanding which types of activity users are most involved in, which can inform decision-making.
The query returns a list of users along with the types of events they used and the count of each event type.
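A per-user distribution like the one described can be sketched by grouping on both the user and the event type. As before, this runs on a toy in-memory SQLite database, and the column names (user_id, user_login, event_type) and rows are assumptions, not the actual demo schema.

```python
import sqlite3

# Toy stand-in for the demo dataset. Column names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE github_users (user_id INTEGER, user_login TEXT);
    CREATE TABLE github_events (event_id INTEGER, event_type TEXT, user_id INTEGER);
    INSERT INTO github_users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO github_events VALUES
        (10, 'PushEvent', 1), (11, 'PushEvent', 1), (12, 'WatchEvent', 1),
        (13, 'ForkEvent', 2);
""")

# One output row per (user, event type) pair, with the number of matching events.
DISTRIBUTION_SQL = """
    SELECT u.user_login, e.event_type, COUNT(*) AS event_count
    FROM github_users AS u
    JOIN github_events AS e ON u.user_id = e.user_id
    GROUP BY u.user_login, e.event_type
    ORDER BY u.user_login, event_count DESC
"""

distribution = conn.execute(DISTRIBUTION_SQL).fetchall()
for login, event_type, event_count in distribution:
    print(login, event_type, event_count)
```

Grouping on two columns is what turns the single per-user total from Sample 1 into a breakdown by event type.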