
Add URL structure capture features (--capture-paths, --capture-subdomains, --capture-domain, --capture-url-structure)#133

Merged
digininja merged 5 commits into digininja:master from Umair-khurshid:feature/capture-url-structure on Apr 4, 2026

@Umair-khurshid (Contributor):

Adds new options to capture URL structure components and include them in the wordlist:

  • --capture-paths: Extract path components from URLs
  • --capture-subdomains: Extract subdomain components from hostnames
  • --capture-domain: Extract main domain from URLs
  • --capture-url-structure: Combined flag that enables all three
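The four flags above map naturally onto three booleans, with the combined flag switching on the other three. The following is a minimal sketch using Ruby's standard `OptionParser`. CeWL's actual option handling may be organised differently, and the `parse_capture_options` helper and its hash keys are hypothetical.

```ruby
require 'optparse'

# Hypothetical sketch: turn the capture flags into a hash of booleans.
# CeWL's real option parsing may be structured differently.
def parse_capture_options(argv)
  options = { paths: false, subdomains: false, domain: false }

  OptionParser.new do |opts|
    opts.on('--capture-paths', 'Extract path components from URLs') do
      options[:paths] = true
    end
    opts.on('--capture-subdomains', 'Extract subdomain components from hostnames') do
      options[:subdomains] = true
    end
    opts.on('--capture-domain', 'Extract the main domain from URLs') do
      options[:domain] = true
    end
    # The combined flag simply switches on all three individual options.
    opts.on('--capture-url-structure', 'Enable all three capture options') do
      options[:paths] = options[:subdomains] = options[:domain] = true
    end
  end.parse!(argv)

  options
end

p parse_capture_options(['--capture-url-structure'])
```

Implementing the combined flag as a shorthand that sets the individual options keeps the rest of the code free of special cases: downstream logic only ever checks the three individual switches.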

Example

```shell
# Single flag captures everything (domain, subdomains, paths)
cewl --capture-url-structure -d 2 https://github.com

# Use --offsite to discover subdomains across the entire site
cewl --capture-subdomains --offsite -d 3 https://toscrape.com
# This will follow links to quotes.toscrape.com, books.toscrape.com, etc. and capture their subdomains
```
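For `--capture-paths`, each crawled URL's path has to be split into candidate words. Here is a minimal sketch, assuming components are separated by `/` and that empty segments are dropped. The `path_components` helper is hypothetical, and it deliberately does no percent-decoding or file-extension stripping, which a real implementation may handle differently.

```ruby
require 'uri'

# Hypothetical helper: split a URL's path into its non-empty segments.
# No percent-decoding or extension stripping is done here.
def path_components(url)
  # URI#path can be nil for some URI schemes, so normalise it to a string.
  path = URI.parse(url).path.to_s
  # Leading, trailing and doubled slashes produce empty strings; drop them.
  path.split('/').reject(&:empty?)
end

p path_components('https://quotes.toscrape.com/tag/love/page/1/')
```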

Testing

Tested on toscrape.com, github.com and wikipedia.org.

  • Subdomain capture works with --offsite to discover linked subdomains
  • Works with existing options: -c, -v, -m, -x, -w

Closes #6

Added options to capture URL paths, subdomains, and domains in the wordlist generation process.
Added option to capture URL structure including domain, paths, and subdomains.
Add URL structure capture options to README.
Review comment on cewl.rb (outdated):

```ruby
parsed = URI.parse(url)
if parsed.host
  host_parts = parsed.host.split('.')
  # Remove TLD (last part) and domain (second last part)
```

@digininja (Owner):

What about domains with a two-level TLD, such as test.co.uk?
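The concern is easy to reproduce with the naive rule the excerpts above point at: treat the last two hostname labels as the domain and everything before them as subdomains. The `naive_split` helper below is hypothetical and exists only to illustrate the problem.

```ruby
# Hypothetical illustration of the naive rule: the last two labels are the
# domain, and every label before them is a subdomain.
def naive_split(host)
  labels = host.split('.')
  return [[], host] if labels.length < 2

  domain = labels.last(2).join('.')
  subdomains = labels[0...-2]
  [subdomains, domain]
end

p naive_split('www.example.com') # => [["www"], "example.com"]   (correct)
p naive_split('www.test.co.uk')  # => [["www", "test"], "co.uk"] (wrong)
```

For `www.test.co.uk`, this rule reports `co.uk` as the domain and `test` as a subdomain, which is exactly the case raised in the review.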

Review comment on cewl.rb (outdated):

```ruby
parsed = URI.parse(url)
if parsed.host
  host_parts = parsed.host.split('.')
  if host_parts.length >= 2
```

@digininja (Owner):

Same as the other comment: this needs to handle test.co.uk.

@Umair-khurshid (Contributor, Author), Mar 28, 2026:

Update: I have added the public_suffix gem to correctly identify registrable domains based on the Public Suffix List, with fallback logic for edge cases.
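The Public Suffix List approach finds the longest listed suffix that matches the end of the hostname and keeps one more label in front of it. The sketch below illustrates that idea with a tiny, hypothetical hard-coded `SUFFIXES` set in place of the real list. It ignores the list's wildcard and exception rules and its default `*` rule, so it is not a substitute for the `public_suffix` gem.

```ruby
require 'set'

# Tiny, hypothetical stand-in for the Public Suffix List. The real list has
# thousands of entries plus wildcard and exception rules, ignored here.
SUFFIXES = Set['com', 'org', 'uk', 'co.uk', 'ac.uk'].freeze

# Returns the registrable domain: the longest matching public suffix plus
# one more label. Returns nil when the host is itself a public suffix or
# ends in a suffix missing from the set above.
def registrable_domain(host, suffixes = SUFFIXES)
  labels = host.downcase.split('.')
  # A smaller start index means a longer candidate suffix, so the first
  # match found is the longest one.
  labels.each_index do |start|
    next unless suffixes.include?(labels[start..].join('.'))
    # The whole host is a public suffix (e.g. "co.uk"): nothing registrable.
    return nil if start.zero?

    return labels[(start - 1)..].join('.')
  end
  nil
end

p registrable_domain('www.test.co.uk') # => "test.co.uk"
```

With the gem itself, `PublicSuffix.domain("www.test.co.uk")` returns `"test.co.uk"`.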

@digininja (Owner):

Looks good, just needs updating to handle multi-level TLDs such as test.co.uk.

- Use public_suffix gem for proper domain parsing
- Update extract_domain and extract_subdomain_components
- Add fallback logic for parsing failures
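One possible way to structure the fallback described in these commits is to try a suffix-aware lookup first and, if it raises or returns nothing, fall back to the last two labels. This is a hypothetical sketch: `domain_with_fallback`, `subdomain_labels` and the injected `lookup` callable are illustrative names, not the PR's `extract_domain` or `extract_subdomain_components`.

```ruby
# Hypothetical sketch of suffix-aware domain extraction with a fallback.
# `lookup` is any callable that returns the registrable domain for a host
# (for example a wrapper around the public_suffix gem) and may raise or
# return nil when it cannot parse the host.
def domain_with_fallback(host, lookup)
  host = host.downcase
  domain =
    begin
      lookup.call(host)
    rescue StandardError
      nil
    end
  return domain if domain

  # Fallback: assume the last two labels form the domain (wrong for
  # multi-level suffixes such as co.uk, but better than nothing).
  labels = host.split('.')
  labels.length >= 2 ? labels.last(2).join('.') : nil
end

# Subdomain labels are whatever precedes the chosen domain in the host.
def subdomain_labels(host, lookup)
  host = host.downcase
  domain = domain_with_fallback(host, lookup)
  # No domain, or the host is the domain itself: no subdomains.
  return [] if domain.nil? || !host.end_with?(".#{domain}")

  host.delete_suffix(".#{domain}").split('.')
end

always_fails = ->(_host) { raise ArgumentError, 'unparseable host' }
p subdomain_labels('blog.example.com', always_fails) # => ["blog"]
```

Passing the lookup in as a callable keeps the fallback path testable without the gem installed. In real code, the lookup would presumably wrap a `PublicSuffix` call.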
@digininja merged commit 5bfce3d into digininja:master on Apr 4, 2026
1 check failed
@digininja (Owner):

Sorry, got distracted and forgot to go through this. All looks good, thanks.

@digininja (Owner):

I've just checked and the docker image is still working correctly as well.



Development: [Feature Request] Add Domain/Subdomain/Path to wordlist

2 participants