Conversation
|
Hmm, so this PR adds support for If we want to support |
|
Oh, you're right. That's wrong. It was a bit late. |
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, `//example.com` will use the protocol of the current page, typically HTTP or HTTPS. These URLs commonly occur in the wild, specifically on pages that support both http and https protocols. Since the URLs without a protocol belong to the class of relative URLs, it is up to the user to decide what protocol to use when parsing these URLs. Example configuration for the url crate: https://docs.rs/url/2.2.1/url/#base-url
| fn scan(&self, s: &str, trigger_index: usize) -> Option<Range<usize>> { | ||
| let (proto, offset) = match self.trigger { | ||
| '/' => ("//", 2), | ||
| _ => ("://", 3), |
There was a problem hiding this comment.
If you like, we can make the trigger an enum type. Then we can use an exhaustive match here:
let (proto, offset) = match self.trigger {
Trigger::Slash => ("//", 2),
Trigger::Colon => ("://", 3),
}There was a problem hiding this comment.
Ah I just went ahead and added it. Think that's cleaner now. Let me know if I should revert the change. ✌️
|
Should be fixed now. |
| /// Users should not exhaustively match this enum, because more triggers | ||
| /// may be added in the future. | ||
| #[derive(Debug, PartialEq)] | ||
| pub enum Trigger { |
There was a problem hiding this comment.
Can you make this pub(crate)? Want to make it clear at this point that it's not part of the API.
| fn find_start(&self, s: &str) -> Option<usize> { | ||
| // Match protocol relative URLs (`//example.org`) | ||
| // See https://stackoverflow.com/a/15146073/270334 | ||
| if s.is_empty() && self.trigger == Trigger::Slash { |
There was a problem hiding this comment.
Hmm does this mean in abc //example.org we wouldn't match //example.org? Can you add a test case for this?
Also needs some more negative test cases, e.g. something like foo//bar should not match, but e.g. "//foo.bar should.
| @@ -188,13 +201,12 @@ impl Default for LinkFinder { | |||
|
|
|||
| impl<'t> Links<'t> { | |||
| fn new(text: &'t str, url: bool, email: bool, email_domain_must_have_dot: bool) -> Links<'t> { | |||
There was a problem hiding this comment.
I think we should have an option (disabled by default) for the new functionality, something like protocol_relative_urls(true). Normally I wouldn't want to match those kinds of links in plain text.
You can see e.g. GitHub not matching it either: //github.com
|
Just checking back on this one. Still needs some more changes (and resolving conflicts). |
|
Oh yeah. Didn't find the time to look into it, heh. Will set myself a reminder for next week. |
|
Should we close this and try again at a later point in time? linkify 0.9.0 made some changes to the codebase, so it's probably easier to start from a clean slate. |
|
Yep! It should be much simpler to do that now as the necessary bits of logic are now separate from |
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example,
//example.comwill use the protocol of the current page, typically HTTP or HTTPS.These URLs commonly occur in the wild, specifically on pages that support both HTTP and HTTPS protocols.
Since the URLs without a protocol belong to the class of relative URLs, it is up to the user to decide what protocol to use when parsing these URLs.
Example configuration for the url crate: https://docs.rs/url/2.2.1/url/#base-url
It would be great to support those in linkify, as I discovered quite a few cases while working on recursion in lychee.
(lycheeverse/lychee#165)