Add URL structure capture features (--capture-paths, --capture-subdomains, --capture-domain, --capture-url-structure) #133
Merged
digininja merged 5 commits into digininja:master on Apr 4, 2026
Conversation
Added options to capture URL paths, subdomains, and domains in the wordlist generation process.
Added option to capture URL structure including domain, paths, and subdomains.
Add URL structure capture options to README.
digininja
reviewed
Mar 27, 2026
parsed = URI.parse(url)
if parsed.host
  host_parts = parsed.host.split('.')
  # Remove TLD (last part) and domain (second last part)
Owner
What about for domains that have two levels of TLD, for example test.co.uk?
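To make the concern concrete, here is a minimal sketch of the "drop the last two labels" approach the snippet's comment describes. The helper name `naive_subdomain_parts` is hypothetical and the exact behaviour is an assumption based on that comment, not the PR's actual code.

```ruby
require 'uri'

# Hypothetical helper mirroring the "remove TLD and domain" idea from the
# reviewed snippet; the name and details are assumptions for illustration.
def naive_subdomain_parts(url)
  host = URI.parse(url).host
  return [] unless host

  host_parts = host.split('.')
  # Treat the last label as the TLD and the one before it as the domain,
  # so everything else is assumed to be a subdomain label.
  host_parts[0...-2]
end

p naive_subdomain_parts('http://blog.example.com/') # ["blog"]
p naive_subdomain_parts('http://www.test.co.uk/')   # ["www", "test"]
```

For `www.test.co.uk` the registrable domain is `test.co.uk`, so `test` ends up wrongly reported as a subdomain label.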
digininja
reviewed
Mar 27, 2026
parsed = URI.parse(url)
if parsed.host
  host_parts = parsed.host.split('.')
  if host_parts.length >= 2
Owner
Same as other, needs to handle test.co.uk
Contributor
Author
Update: I have added the public_suffix gem to correctly identify registrable domains based on the Public Suffix List, with fallback logic for edge cases.
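A rough sketch of what parsing with `public_suffix` plus a fallback could look like is shown here. This is not the code merged in the PR: the helper name `split_host` and its return shape are hypothetical, and the optional `require` (so the sketch still runs without the gem) is an added assumption. `PublicSuffix.parse`, `#domain`, `#trd`, and `PublicSuffix::Error` are part of the gem's public API.

```ruby
require 'uri'

begin
  require 'public_suffix' # third-party gem
rescue LoadError
  # Gem not installed: split_host below uses only the naive fallback.
end

# Hypothetical helper: returns [registrable_domain, subdomain_labels],
# or nil when the URL cannot be parsed or has no host.
def split_host(url)
  host = URI.parse(url).host
  return nil unless host

  if defined?(PublicSuffix)
    begin
      parsed = PublicSuffix.parse(host)
      # trd is everything left of the registrable domain (nil if absent).
      return [parsed.domain, parsed.trd.to_s.split('.')]
    rescue PublicSuffix::Error
      # Host is not valid under the Public Suffix List; use the fallback.
    end
  end

  # Naive fallback: assume a single-label TLD.
  parts = host.split('.')
  return [host, []] if parts.length < 2

  [parts.last(2).join('.'), parts[0...-2]]
rescue URI::InvalidURIError
  nil
end
```

With the gem available, `split_host('http://www.test.co.uk/')` yields `test.co.uk` as the domain and `www` as the only subdomain label; the naive fallback alone would still get this case wrong, which is why it is kept only for hosts the gem rejects.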
Owner
Looks good, just needs updating to handle multi-level TLDs such as test.co.uk.
- Use public_suffix gem for proper domain parsing
- Update extract_domain and extract_subdomain_components
- Add fallback logic for parsing failures
Owner
Sorry, got distracted and forgot to go through this. All looks good, thanks.
Owner
I've just checked and the docker image is still working correctly as well.
Adds new options to capture URL structure components and include them in the wordlist:
- --capture-paths: Extract path components from URLs
- --capture-subdomains: Extract subdomain components from hostnames
- --capture-domain: Extract main domain from URLs
- --capture-url-structure: Combined flag that enables all three
Example
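As an illustration only (this is not the PR's own example), extracting path components from a URL might be sketched as follows. The helper name `path_components` is hypothetical, and CeWL's actual implementation may split or filter differently.

```ruby
require 'uri'

# Hypothetical helper showing one way to turn a URL path into words;
# not taken from CeWL's source.
def path_components(url)
  path = URI.parse(url).path
  return [] if path.nil? || path.empty?

  # Split on "/" and drop the empty strings left by leading, trailing,
  # or doubled slashes. The query string is not part of URI#path.
  path.split('/').reject(&:empty?)
rescue URI::InvalidURIError
  []
end

p path_components('https://example.com/admin/login.php?x=1') # ["admin", "login.php"]
```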
Testing
Tested on toscrape.com, github.com, wikipedia.org
Subdomain capture works with --offsite to discover linked subdomains
Works with existing options: -c, -v, -m, -x, -w
Closes #6