feat(linter): add rule noUntrustedLicenses#9474
Conversation
🦋 Changeset detectedLatest commit: 12f4d84 The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Merging this PR will not alter performance
Comparing Footnotes
|
Parser conformance results onjs/262
jsx/babel
markdown/commonmark
symbols/microsoft
ts/babel
ts/microsoft
|
WalkthroughAdds a new nursery lint rule NoUntrustedLicenses that analyses package.json dependency licenses using SPDX parsing and configurable trust rules. Wires ProjectLayout into the JSON analysis services and query infrastructure so rules can locate dependency manifests in node_modules (including hoisted lookups). Introduces SPDX expression parsing and trust evaluation in biome_package, exposes trust helpers, adds rule options and tests, and updates test harnesses and service wiring to propagate an optional ProjectLayout. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
crates/biome_project_layout/src/project_layout.rs (1)
74-96: Pin the layout once for the whole walk.The rustdoc is nicely clear, but the implementation re-pins the concurrent map on every ancestor hop via
get_node_manifest_for_package(). Holding one pin here matches the neighbouring lookup helpers and avoids a bit of needless churn.Suggested tidy-up
pub fn get_dependency_manifest( &self, project_dir: &Utf8Path, dependency_name: &str, ) -> Option<PackageJson> { - for ancestor in project_dir.ancestors() { - let dep_path = ancestor.join("node_modules").join(dependency_name); - if let Some(manifest) = self.get_node_manifest_for_package(&dep_path) { - return Some(manifest); - } - } - None + let packages = self.0.pin(); + project_dir.ancestors().find_map(|ancestor| { + let dep_path = ancestor.join("node_modules").join(dependency_name); + packages + .get(dep_path.as_path()) + .and_then(|data| data.node_package.as_ref()) + .and_then(|node_package| node_package.manifest.as_ref()) + .cloned() + }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_project_layout/src/project_layout.rs` around lines 74 - 96, The loop in get_dependency_manifest re-acquires the concurrent layout pin on every ancestor by calling get_node_manifest_for_package repeatedly; instead, acquire the layout pin once before the for ancestor loop (e.g. obtain the pinned guard from self.layout or whatever pin API the concurrent map exposes), then either call a variant of get_node_manifest_for_package that takes a reference to that pinned guard or pass the guard into the loop so each iteration reuses the same pin; update get_dependency_manifest to hold that single pin for the whole walk and adjust get_node_manifest_for_package signature if needed to accept the pinned guard.crates/biome_json_analyze/tests/spec_tests.rs (1)
163-175: Hardenproject_layout_for_json_testfor real-worldnode_moduleslayouts.The current one-level scan skips scoped packages (
node_modules/@scope/pkg), andok()?in the loop can bail out the whole helper on a single bad entry. A continue-based walk (plus scoped package handling) would make these tests less brittle.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_json_analyze/tests/spec_tests.rs` around lines 163 - 175, The loop that builds the test ProjectLayout should be hardened: replace the current for-entry usage that uses ok()? and thus can return early, with a continue-based walk that skips individual broken entries instead of bailing the whole helper, and add handling for scoped packages (directories whose name starts with '@') by iterating their child package directories as well; specifically, update the loop that reads std::fs::read_dir(node_modules_dir) to (a) on any read_dir, try_from, or read_to_string error just continue rather than using ok()? to return, (b) when an entry is a directory whose file_name starts with '@', iterate its entries and process each child package the same way, and (c) continue to call project_layout.insert_node_manifest(pkg_dir, manifest) only for successfully parsed package.json files.crates/biome_json_analyze/src/lint/nursery/no_untrusted_licenses.rs (1)
337-346: The missing-license diagnostic needs a “what now?” line.This explains the problem and the risk, but not the next step. A short remediation note would make the warning much more actionable.
Based on learnings, "Implement the `diagnostic` function to convert signals into RuleDiagnostic instances with informative messages. Always follow the three pillars: explain WHAT the error is, explain WHY it's triggered, and tell the user WHAT to do"✏️ Suggested tweak
.note(markup! { - "Dependencies without a license field may have unknown legal restrictions." + "Dependencies without a license field may have unknown legal restrictions. Review the dependency manually, replace it, or contact the maintainer before using it." }),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_json_analyze/src/lint/nursery/no_untrusted_licenses.rs` around lines 337 - 346, The missing-license diagnostic created in the Reason::Missing arm (using RuleDiagnostic::new with state.range and markup referencing dep and group) explains the issue and risk but lacks remediation; update this diagnostic to include a short actionable "what now?" line by chaining a .help or an additional .note that tells users how to fix it (e.g., add a license field to the dependency's manifest or contact the package maintainer, or pin to a known-licensed alternative) so the RuleDiagnostic produced for Reason::Missing gives WHAT is wrong, WHY it matters, and a clear WHAT TO DO next.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_json_analyze/src/lint/nursery/no_untrusted_licenses.rs`:
- Around line 166-174: DEPENDENCY_GROUPS currently includes "bundleDependencies"
and "bundledDependencies" but the parser only uses as_json_object_value(), so
those array/boolean fields are silently skipped; remove those two entries from
DEPENDENCY_GROUPS to avoid false confidence, or alternatively add handling where
the code calls as_json_object_value() (the function that iterates dependency
groups) to accept array and boolean types: if the value is an array, iterate its
string items as dependency names; if it's a boolean, treat true as "all bundled"
(or explicit policy) and false as none. Update either DEPENDENCY_GROUPS or the
as_json_object_value() usage accordingly (referencing DEPENDENCY_GROUPS and the
as_json_object_value() call to locate the change).
In `@crates/biome_package/src/license/expression.rs`:
- Around line 35-38: The parser currently accepts any identifier after a WITH
token; update the WITH-handling code in this module (the function that parses
primary/license-with expressions—look for methods named parse_with,
parse_exception, or the WITH branch inside parse_expression/parse_primary) to
validate the exception identifier against the official SPDX Exceptions List and
return a parse error if it is not present; you can either (a) import or embed
the SPDX Exceptions set (or use an existing spdx crate lookup) and call
contains(exception_id) before constructing the WITH node, or (b) if you prefer
to block unsupported semantics now, reject any WITH expressions by returning an
explicit error from that parsing branch (ensuring callers treat it as a parse
failure rather than silently dropping the exception).
---
Nitpick comments:
In `@crates/biome_json_analyze/src/lint/nursery/no_untrusted_licenses.rs`:
- Around line 337-346: The missing-license diagnostic created in the
Reason::Missing arm (using RuleDiagnostic::new with state.range and markup
referencing dep and group) explains the issue and risk but lacks remediation;
update this diagnostic to include a short actionable "what now?" line by
chaining a .help or an additional .note that tells users how to fix it (e.g.,
add a license field to the dependency's manifest or contact the package
maintainer, or pin to a known-licensed alternative) so the RuleDiagnostic
produced for Reason::Missing gives WHAT is wrong, WHY it matters, and a clear
WHAT TO DO next.
In `@crates/biome_json_analyze/tests/spec_tests.rs`:
- Around line 163-175: The loop that builds the test ProjectLayout should be
hardened: replace the current for-entry usage that uses ok()? and thus can
return early, with a continue-based walk that skips individual broken entries
instead of bailing the whole helper, and add handling for scoped packages
(directories whose name starts with '@') by iterating their child package
directories as well; specifically, update the loop that reads
std::fs::read_dir(node_modules_dir) to (a) on any read_dir, try_from, or
read_to_string error just continue rather than using ok()? to return, (b) when
an entry is a directory whose file_name starts with '@', iterate its entries and
process each child package the same way, and (c) continue to call
project_layout.insert_node_manifest(pkg_dir, manifest) only for successfully
parsed package.json files.
In `@crates/biome_project_layout/src/project_layout.rs`:
- Around line 74-96: The loop in get_dependency_manifest re-acquires the
concurrent layout pin on every ancestor by calling get_node_manifest_for_package
repeatedly; instead, acquire the layout pin once before the for ancestor loop
(e.g. obtain the pinned guard from self.layout or whatever pin API the
concurrent map exposes), then either call a variant of
get_node_manifest_for_package that takes a reference to that pinned guard or
pass the guard into the loop so each iteration reuses the same pin; update
get_dependency_manifest to hold that single pin for the whole walk and adjust
get_node_manifest_for_package signature if needed to accept the pinned guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e19aca81-31f8-4c60-b79a-04df278f67b3
⛔ Files ignored due to path filters (12)
Cargo.lockis excluded by!**/*.lockand included by**crates/biome_configuration/src/analyzer/linter/rules.rsis excluded by!**/rules.rsand included by**crates/biome_configuration/src/generated/linter_options_check.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_diagnostics_categories/src/categories.rsis excluded by!**/categories.rsand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/allow_option/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/deny_option/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/deprecated_license/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/fsf_required/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/ignore_deprecated/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/invalid/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/osi_required/package.json.snapis excluded by!**/*.snapand included by**crates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/valid/package.json.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (30)
crates/biome_json_analyze/Cargo.tomlcrates/biome_json_analyze/benches/json_analyzer.rscrates/biome_json_analyze/src/lib.rscrates/biome_json_analyze/src/lint/nursery/no_untrusted_licenses.rscrates/biome_json_analyze/src/services/mod.rscrates/biome_json_analyze/src/services/project_layout.rscrates/biome_json_analyze/tests/spec_tests.rscrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/allow_option/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/allow_option/package.options.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/deny_option/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/deny_option/package.options.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/deprecated_license/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/fsf_required/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/fsf_required/package.options.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/ignore_deprecated/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/ignore_deprecated/package.options.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/invalid/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/osi_required/package.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/osi_required/package.options.jsoncrates/biome_json_analyze/tests/specs/nursery/noUntrustedLicenses/valid/package.jsoncrates/biome_package/src/lib.rscrates/biome_package/src/license/expression.rscrates/biome_package/src/license/generated.rscrates/biome_package/src/license/mod.rscrates/biome_project_layout/src/project_layout.rscrates/biome_rule_options/src/lib.rscrates/biome_rule_options/src/no_untrusted_licenses.rscrates/biome_service/src/file_handlers/json.rsxtask/codegen/src/generate_license.rsxtask/rules_check/src/lib.rs
💤 Files with no reviewable changes (2)
- xtask/codegen/src/generate_license.rs
- crates/biome_package/src/license/generated.rs
| /// | ||
| /// Supports simple identifiers (`MIT`), `OR` (disjunction), `AND` (conjunction), | ||
| /// parenthesized sub-expressions, and `WITH` exceptions (the exception is ignored | ||
| /// for trust checking — only the base license ID matters). |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the SPDX license expression grammar, must the identifier after WITHbe a valid SPDX exception identifier, and should an expression likeApache-2.0 WITH NotARealException be considered invalid?
💡 Result:
Yes. In the SPDX license expression ABNF, the token after WITH is a license-exception-id, defined as a short-form license exception identifier from the SPDX Exceptions List (Annex A.2). Therefore, an expression like Apache-2.0 WITH NotARealException is not a valid SPDX license expression under the grammar. If the needed exception is not on the SPDX Exception List, the SPDX spec says you should instead use a single LicenseRef-... to represent the whole license terms (including that exception). [1][2]
Sources:
[1] (spdx.github.io)
[2] (spdx.github.io)
Citations:
- 1: https://spdx.github.io/spdx-spec/v2.2.2/SPDX-license-expressions/
- 2: https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/
Validate WITH exception identifiers against the SPDX Exceptions List.
Per the SPDX spec, the identifier after WITH must be from the official SPDX Exceptions List. Currently, Apache-2.0 WITH NotARealException is silently accepted and treated as just Apache-2.0, which violates the SPDX grammar. Either validate exceptions explicitly or reject WITH expressions until properly modelled.
Also applies to: 252–264
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/biome_package/src/license/expression.rs` around lines 35 - 38, The
parser currently accepts any identifier after a WITH token; update the
WITH-handling code in this module (the function that parses primary/license-with
expressions—look for methods named parse_with, parse_exception, or the WITH
branch inside parse_expression/parse_primary) to validate the exception
identifier against the official SPDX Exceptions List and return a parse error if
it is not present; you can either (a) import or embed the SPDX Exceptions set
(or use an existing spdx crate lookup) and call contains(exception_id) before
constructing the WITH node, or (b) if you prefer to block unsupported semantics
now, reject any WITH expressions by returning an explicit error from that
parsing branch (ensuring callers treat it as a parse failure rather than
silently dropping the exception).
| /// Additional license identifiers to trust, beyond valid SPDX identifiers. | ||
| /// | ||
| /// Useful for custom or proprietary licenses that are not part of the SPDX | ||
| /// standard but are acceptable in your project. | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub allow: Option<Box<[Box<str>]>>, | ||
|
|
||
| /// License identifiers to explicitly deny, even if they are valid SPDX identifiers. | ||
| /// | ||
| /// Use this to block specific licenses that your project or organization can't use (e.g., | ||
| /// copyleft licenses in a proprietary project). | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub deny: Option<Box<[Box<str>]>>, |
There was a problem hiding this comment.
I think it would be good to give the ability to provide an allowlist. The options currently only allow for a denylist.
Here's how I think it should work:
Basically, it should be possible to configure the rule:
- with an allowlist that denies things not in the list
- or a denylist that allows things not in the list
if both are provided, see if its allowed in the allowlist, and then check the denylist to deny it. As in, the denylist has more authority than the allowlist.
There was a problem hiding this comment.
Line 13 is the allowlist. If both are provided, deny takes precedence (it's also documented I think)
9563015 to
40b1eb2
Compare
|
No changeset? |
|
@Netail added |
0201f1f to
1d3cc84
Compare
1d3cc84 to
0c24bc9
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.changeset/forty-women-move.md:
- Line 12: Fix the typo in the .changeset description for the `requireFsfLibre`
flag: change the word "liceses" to "licenses" in the sentence
"`requireFsfLibre`: whether the liceses need to be approved by the [Free
Software Foundation](https://www.gnu.org/licenses/license-list.html)." so it
reads "whether the licenses need to be approved...".
- Line 11: Fix the typo in the .changeset description: change "neend" to "need"
in the line describing `requireOsiApproved` so the sentence reads
"`requireOsiApproved`: whether the licenses need to be approved by the [Open
Source Initiative](https://opensource.org/)."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 075eeddb-541e-49da-9071-e9c8b0e3779f
⛔ Files ignored due to path filters (3)
Cargo.lockis excluded by!**/*.lockand included by**crates/biome_configuration/src/analyzer/linter/rules.rsis excluded by!**/rules.rsand included by**crates/biome_configuration/src/generated/linter_options_check.rsis excluded by!**/generated/**,!**/generated/**and included by**
📒 Files selected for processing (1)
.changeset/forty-women-move.md
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
This is another lint rule I wanted to ship for a long time, and now that we have the infrastructure, here it is!
This lint rule catches dependencies that use untrusted libraries.
The code gen for license is already part of the analyser, and I bet no one knew it was there :D (that's how old this code is, and for how long I wanted it).
I vibe-coded the PR and checked all the code (rewrote it different times).
Test Plan
Added various tests
Docs
Added docs to the rule