Overview

The REGR_COUNT() aggregate function returns the number of input rows in which both the dependent variable (y) and the independent variable (x) are non-null. It is used in linear regression analysis to determine how many data points are actually available for computation, because the other regression aggregates also skip any row where either value is NULL.
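To see the NULL handling in isolation, the sketch below runs REGR_COUNT() over a small inline VALUES list instead of a table. The column names y and x and the sample numbers are made up for illustration. Only the first two rows have both values present, so REGR_COUNT() should report 2 while COUNT(*) reports all 4 rows.

```sql
-- Illustrative data only: four (y, x) pairs, two of which contain a NULL.
SELECT
    REGR_COUNT(y, x) AS pair_count,  -- rows where y and x are both non-null
    COUNT(*)         AS row_count    -- every row, NULLs included
FROM (VALUES
    (1.0, 2.0),
    (3.0, 4.0),
    (NULL, 5.0),   -- skipped by REGR_COUNT: y is NULL
    (6.0, NULL)    -- skipped by REGR_COUNT: x is NULL
) AS t(y, x);
```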

Syntax

The syntax for this function is as follows:

REGR_COUNT(y, x)

Parameters

  • y: the dependent variable (the value being predicted)
  • x: the independent variable (the value used to make the prediction)
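Because REGR_COUNT() only checks whether each argument is NULL, swapping y and x does not change the count, and the result should match a COUNT(*) restricted to rows where both values are present. The following sketch compares the three forms on hypothetical inline data, using PostgreSQL's FILTER clause for the restricted count; each column should come out as 3.

```sql
-- Hypothetical sample data: rows 2 and 3 each contain one NULL.
SELECT
    REGR_COUNT(y, x) AS count_yx,  -- documented argument order
    REGR_COUNT(x, y) AS count_xy,  -- swapped order, same non-null check
    COUNT(*) FILTER (WHERE y IS NOT NULL AND x IS NOT NULL) AS count_filtered
FROM (VALUES
    (10.0, 1.0),
    (20.0, NULL),
    (NULL, 3.0),
    (40.0, 4.0),
    (50.0, 5.0)
) AS t(y, x);
```

The argument order does matter for the other regression aggregates, such as REGR_SLOPE(), so passing y first keeps REGR_COUNT() consistent with them.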

Example

For this section, we use a simplified version of the film table from the Pagila database that contains only the title, length, and rating columns, with rating stored as an integer. The complete schema for the film table is available on the Pagila database website.

DROP TABLE IF EXISTS film;
CREATE TABLE film (
  title text NOT NULL,
  rating int,
  length int
);
INSERT INTO film(title, length, rating) VALUES
  ('ATTRACTION NEWTON', 83, 5),
  ('CHRISTMAS MOONSHINE', 150, 7),
  ('DANGEROUS UPTOWN', 121, 4),
  ('KILL BROTHERHOOD', 54, 3),
  ('HALLOWEEN NUTS', 47, 5),
  ('HOURS RAGE', 122, 7),
  ('PIANIST OUTFIELD', 136, 7),
  ('PICKUP DRIVING', 77, 3),
  ('INDEPENDENCE HOTEL', 157, 7),
  ('PRIVATE DROP', 106, 4),
  ('SAINTS BRIDE', 125, 3),
  ('FOREVER CANDIDATE', 131, 7),
  ('MILLION ACE', 142, 5),
  ('SLEEPY JAPANESE', 137, 4),
  ('WRATH MILE', 176, 7),
  ('YOUTH KICK', 179, 7),
  ('CLOCKWORK PARADISE', 143, 5);

The query below uses REGR_COUNT() with rating as the dependent variable and length as the independent variable, counting the rows in which both values are non-null:

SELECT
    REGR_COUNT(rating, length) AS NonNullPairsCount
FROM film;

Running the query returns the following output:

 nonnullpairscount 
-------------------
                17
(1 row)
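Since none of the 17 films has a NULL rating or length, the result here equals the total number of rows. In practice, REGR_COUNT() is often reported next to the other regression aggregates so that the number of points behind a fit is visible. As a sketch against the film table created above, the query below pairs it with PostgreSQL's REGR_SLOPE() and REGR_INTERCEPT(), which take the same (y, x) arguments and skip the same NULL rows; the output column names are illustrative.

```sql
-- Report the number of usable (rating, length) pairs alongside the
-- least-squares line that is fitted from exactly those pairs.
SELECT
    REGR_COUNT(rating, length)     AS n_points,
    REGR_SLOPE(rating, length)     AS slope,     -- change in rating per unit increase in length
    REGR_INTERCEPT(rating, length) AS intercept  -- fitted rating at length = 0
FROM film;
```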