Work with measures

This document shows you how to define and query measures on your graphs. You can use measures to ensure that aggregations are performed correctly across joins.

Overview

A measure is an aggregate property defined within the PROPERTIES clause of a node or edge table. Measures are defined using the MEASURE keyword and one of the following supported aggregate functions:

  • SUM
  • AVG
  • COUNT
  • COUNT(DISTINCT)
  • MIN
  • MAX

Measures define their aggregation in reference to the KEY of the node or edge table on which they're defined. This means that when you query a measure, the aggregation is performed correctly even if the underlying table is joined in a way that causes rows to be duplicated.

You can't reference a property defined by a measure in a GQL query. Instead, you access measures by calling the GRAPH_EXPAND TVF to create a flattened table representation of your graph. This function doesn't accept all types of graphs. For more information about which graphs form valid input, see the input limitations.

You can call the AGG function on output from the GRAPH_EXPAND TVF to aggregate the properties according to the aggregation function you defined in the measure.

Define measures

You define measures within the PROPERTIES clause of a node or edge table definition by using the MEASURE() keyword around a supported aggregate function.

The following example creates a dataset called university and tables for colleges, departments, and courses:

CREATE SCHEMA IF NOT EXISTS university;

CREATE OR REPLACE TABLE university.College (
  college_id INT64 PRIMARY KEY NOT ENFORCED,
  college_name STRING
);

CREATE OR REPLACE TABLE university.Department (
  dept_id INT64 PRIMARY KEY NOT ENFORCED,
  dept_name STRING,
  college_id INT64,
  budget FLOAT64,
  FOREIGN KEY (college_id) REFERENCES university.College(college_id) NOT ENFORCED
);

CREATE OR REPLACE TABLE university.Course (
  course_id INT64 PRIMARY KEY NOT ENFORCED,
  course_name STRING,
  dept_id INT64,
  credits INT64,
  FOREIGN KEY (dept_id) REFERENCES university.Department(dept_id) NOT ENFORCED
);

INSERT INTO university.College (college_id, college_name)
VALUES (101, 'College of Engineering'),
      (102, 'College of Arts');

INSERT INTO university.Department (dept_id, dept_name, college_id, budget)
VALUES (1001, 'Computer Science', 101, 500000),
      (1002, 'Mechanical Engineering', 101, 400000),
      (1003, 'Fine Arts', 102, 200000),
      (1004, 'Research', 101, 50000);

INSERT INTO university.Course (course_id, course_name, dept_id, credits)
VALUES (1, 'Intro to CS', 1001, 3),
       (2, 'Data Structures', 1001, 4),
       (3, 'Thermodynamics', 1002, 3),
       (4, 'Oil Painting', 1003, 2);

The following statement creates a graph called SchoolGraph that defines measures on some of the properties in the Department and Course nodes. You must provide an alias for properties defined by a measure.

CREATE OR REPLACE PROPERTY GRAPH university.SchoolGraph
  NODE TABLES (
    university.College
      KEY(college_id)
      PROPERTIES(college_id, college_name),
    university.Department
      KEY(dept_id)
      PROPERTIES(dept_id, dept_name, college_id,
        budget OPTIONS(description="Department budget in USD"),
        MEASURE(SUM(budget)) AS total_budget),
    university.Course
      KEY(course_id)
      PROPERTIES(
        course_id,
        course_name,
        credits,
        dept_id,
        MEASURE(AVG(credits)) AS avg_credits,
        MEASURE(SUM(credits)) AS total_credits,
        MEASURE(COUNT(course_id)) AS course_count)
  )
  EDGE TABLES (
    university.Department AS CollegeDept
      SOURCE KEY (college_id) REFERENCES College (college_id)
      DESTINATION KEY (dept_id) REFERENCES Department (dept_id),
    university.Course AS DeptCourse
      SOURCE KEY (dept_id) REFERENCES Department (dept_id)
      DESTINATION KEY (course_id) REFERENCES Course (course_id)
  );

A visualization of the SchoolGraph graph.

The total_budget measure is defined as MEASURE(SUM(budget)). The measure locks the aggregation to the KEY, which is dept_id.

Understanding overcounting

When you join tables, data is repeated every time there is a one-to-many relationship in your data. For example, if you join the Course, Department, and College tables, a department with multiple courses appears in multiple rows in the output:

SELECT
  college_name,
  dept_name,
  course_name,
  budget
FROM university.Course
LEFT JOIN university.Department
  ON Course.dept_id = Department.dept_id
LEFT JOIN university.College
  ON Department.college_id = College.college_id;

/*------------------------+------------------------+-----------------+----------+
 | college_name           | dept_name              | course_name     | budget   |
 +------------------------+------------------------+-----------------+----------+
 | College of Engineering | Computer Science       | Intro to CS     | 500000.0 |
 | College of Engineering | Computer Science       | Data Structures | 500000.0 |
 | College of Engineering | Mechanical Engineering | Thermodynamics  | 400000.0 |
 | College of Arts        | Fine Arts              | Oil Painting    | 200000.0 |
 +------------------------+------------------------+-----------------+----------*/

If you try to compute the total budget per college using SUM(budget), the Computer Science department's budget is counted twice. You can avoid this problem by querying the Department table directly, but this approach doesn't work if you want to compute multiple aggregations from different tables that contribute to your joined data. The following section shows how measures solve this problem.

Query a graph with measures

You can query a graph with measures using GQL, but your query can't directly reference any property defined by a measure. For example, the following query references nodes that have measure properties defined, but doesn't use or return any of the measure fields:

GRAPH university.SchoolGraph
MATCH (c:College)-[]-(d:Department)-[]->(course:Course)
RETURN c.college_name, d.dept_name, course.course_name;

/*------------------------+------------------------+-----------------+
 | college_name           | dept_name              | course_name     |
 +------------------------+------------------------+-----------------+
 | College of Engineering | Computer Science       | Intro to CS     |
 | College of Engineering | Computer Science       | Data Structures |
 | College of Engineering | Mechanical Engineering | Thermodynamics  |
 | College of Arts        | Fine Arts              | Oil Painting    |
 +------------------------+------------------------+-----------------*/

Work with measures

To work with measures, you use the GRAPH_EXPAND table-valued function (TVF) to query your graph as a single flattened table. Columns in the output table are derived from the properties defined in the graph for each node and edge table. To prevent naming conflicts, columns are named by concatenating the node or edge table label and the property name—for example, Course_course_name or Department_total_budget. The following query shows some sample output from the GRAPH_EXPAND TVF:

SELECT
  College_college_name,
  Department_dept_name,
  Department_budget,
  Course_course_name
FROM
  GRAPH_EXPAND("university.SchoolGraph");

/*------------------------+------------------------+-------------------+--------------------+
 | College_college_name   | Department_dept_name   | Department_budget | Course_course_name |
 +------------------------+------------------------+-------------------+--------------------+
 | College of Engineering | Computer Science       | 500000.0          | Intro to CS        |
 | College of Engineering | Computer Science       | 500000.0          | Data Structures    |
 | College of Engineering | Mechanical Engineering | 400000.0          | Thermodynamics     |
 | College of Arts        | Fine Arts              | 200000.0          | Oil Painting       |
 +------------------------+------------------------+-------------------+--------------------+

The GRAPH_EXPAND function produces the flattened table by applying a series of LEFT JOIN operations to node and edge tables. A valid input graph must have exactly one root node table, which is a table whose KEY value doesn't appear in any other table. Data that isn't reachable from the root node table through a series of joins doesn't appear in the output. In the previous example, the Course table is the root node table. The Research department is omitted from the output because it doesn't have any courses.

You can't directly select a column for a property defined by a measure. Instead, you must wrap them in the AGG() function. This function ensures that the aggregation defined on the measures is performed exactly once per key.

The following query simultaneously calculates the total budget and total number of courses for each college:

SELECT
  College_college_name,
  AGG(Department_total_budget) AS college_budget,
  AGG(Course_course_count) AS total_courses
FROM
  GRAPH_EXPAND("university.SchoolGraph")
GROUP BY
  College_college_name;

/*------------------------+----------------+---------------+
 | College_college_name   | college_budget | total_courses |
 +------------------------+----------------+---------------+
 | College of Engineering | 900000.0       | 3             |
 | College of Arts        | 200000.0       | 1             |
 +------------------------+----------------+---------------*/

View GRAPH_EXPAND schema

To view the schema of the table returned by the GRAPH_EXPAND function without calling the function, use the BQ.SHOW_GRAPH_EXPAND_SCHEMA system procedure:

DECLARE schema STRING DEFAULT '';
CALL BQ.SHOW_GRAPH_EXPAND_SCHEMA('university.SchoolGraph', schema);
SELECT schema;

This procedure populates the schema variable with the name, type, and mode of each column. It also indicates whether the property referenced by the column is a measure, and lists any description or synonyms that you defined on it.

What's next