URLs can't be recognized by regex; it's an impossible task. Just handle simpler URLs: get rid of everything except visible ASCII symbols, and that's it.
Discussion
i'm not talking about recognising blabla.com, i'm talking about a correctly formed URL. it's very easy to write a regex for those, but i don't have to, because the golang stdlib already has a full URL parsing library in net/url
i think you haven't read the RFC for it, because the accepted characters in a URL are very clear, as is the structure: proto://someth-ing.some_thing.sometld/path/to/thing?something, and occasionally the same thing with a # based section identifier. it's pretty clear and simple, and about the only standard printable ascii character not permitted in parameter keys and values is the & separator
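for anyone following along, here is a minimal sketch of what leaning on net/url looks like. `splitURL` and `isWellFormed` are hypothetical helper names made up for this example, and the sample URL is invented. `url.Parse` itself is fairly permissive (a bare blabla.com parses as a path with no error), so the sketch assumes "correctly formed" means a non-empty scheme and host:

```go
package main

import (
	"fmt"
	"net/url"
)

// splitURL parses raw with net/url and returns the parts of the
// proto://host/path?query#fragment structure.
func splitURL(raw string) (scheme, host, path, query, fragment string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", "", "", err
	}
	return u.Scheme, u.Host, u.Path, u.RawQuery, u.Fragment, nil
}

// isWellFormed is an assumption for this sketch: a URL counts as
// correctly formed only if it parses and has both a scheme and a host.
func isWellFormed(raw string) bool {
	u, err := url.Parse(raw)
	return err == nil && u.Scheme != "" && u.Host != ""
}

func main() {
	raw := "https://someth-ing.example.com/path/to/thing?key=value&other=1#section"
	scheme, host, path, query, fragment, err := splitURL(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(scheme, host, path, query, fragment)
	fmt.Println(isWellFormed("blabla.com"))
}
```

the scheme and host check is the part that separates a real URL from a bare domain name, which is the distinction being argued about here.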
i'm quite sure there is a javascratch library for doing this correctly according to every possible nonsense that might show up in an RFC compliant implementation
the most efficient solution will be a state machine parser, but they are a little bit of work to get right... i basically wrote such things for json last year. part of the trick to why it saves time is that it assumes the input is minified, plus a couple of other things. there is a whole swathe of invalid constructions it would let past, but miraculously nobody bothers to mangle them anyhow (i don't even handle whitespace except to skip it until the next expected type of token)
also, you are wrong, wrong wrong wrong... here are three regexes i wrote to catch filesystem references, both absolute and relative:
this is the relative one. it assumes no ./ or anything, just the first filename at the start:
^((([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))
this one expects a space before a relative path. i had to handle both because one is for the start-of-line case and the other is for the middle of a line in a typical stack trace (iirc)
[ ]((([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))
and this is an absolute filepath:
([/](([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))
as you can see, it starts with a /. and yes, i don't have one for the explicit relative form, because nobody writes that, except that i have to with certain go tool commands like run, build and install
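a rough sketch of how the three patterns above could be wired together in go. the regexes are copied verbatim; `findFileRef`, the variable names, and the sample lines are made up for illustration. the order of the checks is an assumption of this sketch: the absolute pattern is unanchored, so it would also match the tail of a relative path with two or more directories, which is why it is tried last:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// relative path at the start of the line
	relStart = regexp.MustCompile(`^((([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))`)
	// relative path preceded by a space
	relSpace = regexp.MustCompile(`[ ]((([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))`)
	// absolute path starting with /
	absPath = regexp.MustCompile(`([/](([a-zA-Z@0-9-_.]+/)+([a-zA-Z@0-9-_.]+)):([0-9]+))`)
)

// findFileRef returns the first "path:line" reference found in s and
// its line number. Group 1 is the whole reference, group 5 the number.
func findFileRef(s string) (ref, line string, ok bool) {
	for _, re := range []*regexp.Regexp{relStart, relSpace, absPath} {
		if m := re.FindStringSubmatch(s); m != nil {
			return m[1], m[5], true
		}
	}
	return "", "", false
}

func main() {
	for _, s := range []string{
		"pkg/server/handler.go:42 +0x1d",
		"\t/home/user/src/main.go:17 +0x25",
		"panic at cmd/app/main.go:9",
	} {
		ref, line, ok := findFileRef(s)
		fmt.Println(ref, line, ok)
	}
}
```

note that all three patterns require at least one directory separator and a trailing :line number, so a bare main.go:12 will not match.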
You just proved me right in your last paragraph. It works because the standard is not really a standard. The actual standard is simpler than what is described in the RFC, and it should be even simpler. No one really needs braces in URLs, so let's just stop using them.
"let's just stop using them"... who, exactly? oh, you mean jimmy wales and his "free" propaganda rag wikipedia? good luck with that
also because your response is so hyperbolic i'm gonna assume you are not taking this seriously
I'll do my part by breaking his URLs in my software, if I remember to.
this is exactly why nostr:npub1ye5ptcxfyyxl5vjvdjar2ua3f0hynkjzpx552mu5snj3qmx5pzjscpknpr made applesauce, but i would also mention that @hodlbod has written correct URL handling that doesn't choke on jimmy's crung URLs (and not only jimmy's, you will find there are other idiots who think that being allowed to use braces by the RFC means they can use braces)
jfc, reading this... is this some kind of humor?
and i was amazed to learn that he also follows this idiot idea of not recognising brackets in a space-bound URL, or of not intelligently separating the brackets from a fucking matching URL that starts with, you know, a fucking sentinel, `http`. really
guys, i dunno what to say about your abilities to reason at this point, i'm just looking at this and i'm baffled how this byzantine short circuit happens in your brains