Work with measures
This document shows you how to define and query measures on your graphs. You can use measures to ensure that aggregations are performed correctly across joins.
Overview
A measure is an aggregate property defined within the PROPERTIES clause
of a node or edge table. Measures are defined using the MEASURE keyword and
one of the following supported aggregate functions:
SUMAVGCOUNTCOUNT(DISTINCT)MINMAX
Measures define their aggregation in reference to the KEY of the node or
edge table on which they're defined. This means that when you query a measure,
the aggregation is performed correctly even if the underlying table is joined in
a way that causes rows to be duplicated.
You can't reference a property defined by a measure in a GQL query. Instead,
you access measures by calling the
GRAPH_EXPAND TVF
to create a flattened table representation of your graph. This function doesn't
accept all types of graphs. For more information about
which graphs form valid input, see the input limitations.
You can call the
AGG function
on output from the GRAPH_EXPAND TVF to aggregate
the properties according to the aggregation function you defined in
the measure.
Define measures
You define measures within the
PROPERTIES clause
of a node or edge table definition by using the MEASURE() keyword around a
supported aggregate function.
The following example creates a dataset called university and tables
for colleges, departments, and courses:
CREATE SCHEMA IF NOT EXISTS university;
CREATE OR REPLACE TABLE university.College (
college_id INT64 PRIMARY KEY NOT ENFORCED,
college_name STRING
);
CREATE OR REPLACE TABLE university.Department (
dept_id INT64 PRIMARY KEY NOT ENFORCED,
dept_name STRING,
college_id INT64,
budget FLOAT64,
FOREIGN KEY (college_id) REFERENCES university.College(college_id) NOT ENFORCED
);
CREATE OR REPLACE TABLE university.Course (
course_id INT64 PRIMARY KEY NOT ENFORCED,
course_name STRING,
dept_id INT64,
credits INT64,
FOREIGN KEY (dept_id) REFERENCES university.Department(dept_id) NOT ENFORCED
);
INSERT INTO university.College (college_id, college_name)
VALUES (101, 'College of Engineering'),
(102, 'College of Arts');
INSERT INTO university.Department (dept_id, dept_name, college_id, budget)
VALUES (1001, 'Computer Science', 101, 500000),
(1002, 'Mechanical Engineering', 101, 400000),
(1003, 'Fine Arts', 102, 200000),
(1004, 'Research', 101, 50000);
INSERT INTO university.Course (course_id, course_name, dept_id, credits)
VALUES (1, 'Intro to CS', 1001, 3),
(2, 'Data Structures', 1001, 4),
(3, 'Thermodynamics', 1002, 3),
(4, 'Oil Painting', 1003, 2);
The following statement creates a graph called SchoolGraph that defines
measures on some of the properties in the Department and Course nodes.
You must provide an alias for properties defined by a measure.
CREATE OR REPLACE PROPERTY GRAPH university.SchoolGraph
NODE TABLES (
university.College
KEY(college_id)
PROPERTIES(college_id, college_name),
university.Department
KEY(dept_id)
PROPERTIES(dept_id, dept_name, college_id,
budget OPTIONS(description="Department budget in USD"),
MEASURE(SUM(budget)) AS total_budget),
university.Course
KEY(course_id)
PROPERTIES(
course_id,
course_name,
credits,
dept_id,
MEASURE(AVG(credits)) AS avg_credits,
MEASURE(SUM(credits)) AS total_credits,
MEASURE(COUNT(course_id)) AS course_count)
)
EDGE TABLES (
university.Department AS CollegeDept
SOURCE KEY (college_id) REFERENCES College (college_id)
DESTINATION KEY (dept_id) REFERENCES Department (dept_id),
university.Course AS DeptCourse
SOURCE KEY (dept_id) REFERENCES Department (dept_id)
DESTINATION KEY (course_id) REFERENCES Course (course_id)
);
The total_budget measure is defined as MEASURE(SUM(budget)). The measure
locks the aggregation to the KEY, which is dept_id.
Understanding overcounting
When you join tables, data is repeated every time there is a one-to-many
relationship in your data. For example, if you join the Course, Department,
and College tables, a department with multiple courses appears in multiple
rows in the output:
SELECT
college_name,
dept_name,
course_name,
budget
FROM university.Course
LEFT JOIN university.Department
ON Course.dept_id = Department.dept_id
LEFT JOIN university.College
ON Department.college_id = College.college_id;
/*------------------------+------------------------+-----------------+----------+
| college_name | dept_name | course_name | budget |
+------------------------+------------------------+-----------------+----------+
| College of Engineering | Computer Science | Intro to CS | 500000.0 |
| College of Engineering | Computer Science | Data Structures | 500000.0 |
| College of Engineering | Mechanical Engineering | Thermodynamics | 400000.0 |
| College of Arts | Fine Arts | Oil Painting | 200000.0 |
+------------------------+------------------------+-----------------+----------*/
If you try to compute the total budget per college using SUM(budget),
the Computer Science department's budget is counted twice. You can avoid this
problem by querying the Department table directly, but this approach doesn't
work if you want to compute multiple aggregations from different tables that
contribute to your joined data. The following section shows how measures solve
this problem.
Query a graph with measures
You can query a graph with measures using GQL, but your query can't directly reference any property defined by a measure. For example, the following query references nodes that have measure properties defined, but doesn't use or return any of the measure fields:
GRAPH university.SchoolGraph
MATCH (c:College)-[]-(d:Department)-[]->(course:Course)
RETURN c.college_name, d.dept_name, course.course_name;
/*------------------------+------------------------+-----------------+
| college_name | dept_name | course_name |
+------------------------+------------------------+-----------------+
| College of Engineering | Computer Science | Intro to CS |
| College of Engineering | Computer Science | Data Structures |
| College of Engineering | Mechanical Engineering | Thermodynamics |
| College of Arts | Fine Arts | Oil Painting |
+------------------------+------------------------+-----------------*/
Work with measures
To work with measures, you use the
GRAPH_EXPAND
table-valued function (TVF)
to query your graph as a single flattened table.
Columns in the output table are derived from the properties defined in the
graph for each node and edge table. To prevent naming conflicts, columns are
named by concatenating the node or edge table label and the property name—for
example, Course_course_name or Department_total_budget. The following query
shows some sample output from the GRAPH_EXPAND TVF:
SELECT
College_college_name,
Department_dept_name,
Department_budget,
Course_course_name
FROM
GRAPH_EXPAND("university.SchoolGraph");
/*------------------------+------------------------+-------------------+--------------------+
| College_college_name | Department_dept_name | Department_budget | Course_course_name |
+------------------------+------------------------+-------------------+--------------------+
| College of Engineering | Computer Science | 500000.0 | Intro to CS |
| College of Engineering | Computer Science | 500000.0 | Data Structures |
| College of Engineering | Mechanical Engineering | 400000.0 | Thermodynamics |
| College of Arts | Fine Arts | 200000.0 | Oil Painting |
+------------------------+------------------------+-------------------+--------------------+
The GRAPH_EXPAND function produces the flattened table by applying a series
of LEFT JOIN operations to node and edge tables. A valid input graph must
have exactly one root node table, which is a table whose KEY value doesn't
appear in any other table. Data that isn't reachable from the root node table
through a series of joins doesn't appear in the output. In the previous
example, the Course table is the root node table. The Research department
is omitted from the output because it doesn't have any courses.
You can't directly select a column for a property defined by a measure.
Instead, you must wrap them in the
AGG() function.
This function ensures that the aggregation defined on the measures is performed
exactly once per key.
The following query simultaneously calculates the total budget and total number of courses for each college:
SELECT
College_college_name,
AGG(Department_total_budget) AS college_budget,
AGG(Course_course_count) AS total_courses
FROM
GRAPH_EXPAND("university.SchoolGraph")
GROUP BY
College_college_name;
/*------------------------+----------------+---------------+
| College_college_name | college_budget | total_courses |
+------------------------+----------------+---------------+
| College of Engineering | 900000.0 | 3 |
| College of Arts | 200000.0 | 1 |
+------------------------+----------------+---------------*/
View GRAPH_EXPAND schema
To view the schema of the table returned by the GRAPH_EXPAND function
without calling the function, use the
BQ.SHOW_GRAPH_EXPAND_SCHEMA system procedure:
DECLARE schema STRING DEFAULT '';
CALL BQ.SHOW_GRAPH_EXPAND_SCHEMA('university.SchoolGraph', schema);
SELECT schema;
This procedure populates the schema variable with the name, type, and mode
of each column. It also indicates whether the property referenced by the column
is a measure, and lists any description or synonyms that you defined on it.
What's next
- Learn how to create and query a property graph.
- Learn about graph schemas.