Broken URLs


T0SHEA

Ever discover that a URL you entered in a Waymark's URL box doesn't work? The on-page URL boxes, such as "Wikipedia Url:", massage the URL, supposedly to handle "unsafe" or non-standard characters, but the Waymarking code apparently doesn't catch every such character. We came across this last night in a Wiki Waymark we were reviewing. When included in the long description the URL worked just fine, but when entered in the "Wikipedia Url:" field it broke. So far, the only times we've come across this problem have been with Wiki URLs.

 

Strangely, the URL broke on a "single quote", supposedly a "safe" character. Also in the URL was the word "Musée" which was encoded by Waymarking as "Mus%C3%A9e". I would have encoded it as "Mus%E9e", but "Mus%C3%A9e" works, for some reason. The single quote wasn't encoded and broke the URL, which it shouldn't have done. In any event, this is an explanation of how to fix these little glitches.

 

In the example above, "%C3%A9" and "%E9" are the encodings of the character they replace in the URL. The percent sign (%) tells the parser that a hex value follows, and the two characters after it are the hexadecimal value of one byte, which for a single-byte character is its position in the ASCII table (strictly, the extended ASCII table, since "é" lies outside the original 128 characters). Have a peek at the table right now to get an idea of what we're talking about.
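As a rough illustration of that table lookup, the sketch below turns one character into its percent-encoded form. The helper name `percent_encode_char` is made up for this example, and it assumes a single-byte (Latin-1 style) table like the one referred to above; it is not a description of what Waymarking's own code does.

```python
def percent_encode_char(ch: str) -> str:
    """Return '%XX' for one character, using its single-byte (Latin-1) code."""
    code = ord(ch)  # numeric position of the character in the table
    if code > 0xFF:
        raise ValueError(f"{ch!r} does not fit in one byte")
    return f"%{code:02X}"  # two upper-case hex digits after the percent sign


quote_encoded = percent_encode_char("'")          # single quote, hex 27
e_acute_encoded = percent_encode_char("\u00e9")   # "é", hex E9
print(quote_encoded, e_acute_encoded)
```

Characters above hex FF have no row in a single-byte table, which is why the helper refuses them rather than guessing.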

 

You should see that the Hexadecimal code (HEX) for "é" is "E9" - which is how I would have encoded the é in the word "Musée", the result being "Mus%E9e". How this - "Mus%C3%A9e" works I'll never know as that encoding should result in "Musée".

 

Anyhow, to implement a fix, just find the character(s) that break your URL in the ASCII table linked above and substitute the hex value from the table, preceded by a percent sign. Example: "Musée" = "Mus%E9e".
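If you would rather not do the substitution by hand, something like the following sketch could do it with Python's standard `urllib.parse` module. The function name `fix_url` and the example.org address are hypothetical, chosen only because the address contains both an "é" and a single quote. The `encoding` argument is there because the hand method above produces the single-byte form, while Python's default (UTF-8) produces the form Waymarking generated.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def fix_url(url: str, encoding: str = "utf-8") -> str:
    """Percent-encode troublesome characters in the path part of a URL."""
    parts = urlsplit(url)
    # Keep "/" as the path separator, and keep "%" so that text which is
    # already percent-encoded is not encoded a second time.
    safe_path = quote(parts.path, safe="/%", encoding=encoding)
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


# Hypothetical URL, not the one from the waymark discussed here.
broken = "https://example.org/wiki/Mus\u00e9e_d'Orsay"
fixed_utf8 = fix_url(broken)               # UTF-8 form
fixed_latin1 = fix_url(broken, "latin-1")  # single-byte form, as in the table
print(fixed_utf8)
print(fixed_latin1)
```

Which of the two forms a given site accepts depends on the site, so it is worth testing the result before saving it.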

 

ASCII, incidentally, stands for American Standard Code for Information Interchange. It dates from the early 1960s and has been changed, extended and modified many times through the years.

 

EDIT: While we're on the subject, if you have ever wanted to insert a special character, like the copyright symbol (©) or the Euro symbol (€) in a waymark, or anywhere else for that matter, just copy the code from the "HTML Number" or "HTML Name" column of the ASCII table.

 

Examples - Euro symbol - HTML Number = (Ampersand)#128; - HTML Name = (Ampersand)euro;

 

I can't get these to escape properly, so "(Ampersand)" stands for the ampersand character (&). Just copy the text from the appropriate line of either the HTML Number or HTML Name column of the ASCII table, your choice. Couldn't be simpler.

 

Either works in an HTML document. Just remember that the semicolon is part of the code and is necessary.
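For anyone who wants to check what a given code turns into before pasting it, here is a small sketch using Python's standard `html` module, which follows the HTML5 rules for character references. One thing to note: the number 128 comes from the Windows-style table, and HTML5 parsers map it to the Euro sign for compatibility; the Euro sign's own Unicode number is 8364, which is the safer one to use if you are unsure.

```python
import html

named = html.unescape("&euro;")          # HTML Name
numbered = html.unescape("&#8364;")      # HTML Number (Unicode code point)
legacy = html.unescape("&#128;")         # table's number, remapped by HTML5 rules
copyright_sign = html.unescape("&copy;") # copyright symbol by name

print(named, numbered, legacy, copyright_sign)
```

All three Euro spellings end up as the same character here, which matches what browsers do with them.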

Edited by BK-Hunters

Thank you for the useful tips.

 

Perhaps you have the skills to create another workaround. URLs entered on the waymark page for a French Benchmark no longer work after the waymark is created. The .pdf files are correct but broken once Waymarking gets involved.

 

WMTCP3


Thank you for the useful tips.

 

Perhaps you have the skills to create another workaround. URLs entered on the waymark page for a French Benchmark no longer work after the waymark is created. The .pdf files are correct but broken once Waymarking gets involved.

 

WMTCP3

 

http://geodesie.ign.fr/fiches/pdf/6237101.pdf

This is what I see.

 

http://geodesie.ign.fr/fiches/pdf/0924603.pdf

Here's one that works, which means that the pdf number (6237101) has to be incorrect.

 

There are no bad characters in the URL, so I have no idea what may have gone wrong with it. Can you send me a copy of a good, working URL, or even post it here?

 

I'm really slooooow... I just realized that's mine. I believe I tested the URL when I submitted it and it worked then. Very Strange.

Edited by BK-Hunters

You should see that the Hexadecimal code (HEX) for "é" is "E9" - which is how I would have encoded the é in the word "Musée", the result being "Mus%E9e". How this - "Mus%C3%A9e" works I'll never know as that encoding should result in "Musée".

"Mus%E9e" is ISO 8859-1, which is good enough for the languages predominantly used in the Americas and Western Europe (except for Icelandic, Welsh, Esperanto, Unified Canadian Aboriginal Syllabics and a few others).

 

"Mus%C3%A9e" is UTF-8, which can encode almost any script ever used on this planet.

 

Both encodings are correct, but a computer needs to know which one to use. Sometimes it guesses wrong, and then the result looks broken, like "Musée".
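A quick way to see both encodings, and the wrong guess, is the sketch below. It uses only Python's standard `urllib.parse`; the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

word = "Mus\u00e9e"  # "Musée"

utf8_form = quote(word)                        # encoded as UTF-8 (Python's default)
latin1_form = quote(word, encoding="latin-1")  # encoded as ISO 8859-1

# Decoding with the matching encoding gives the original word back.
right_guess = unquote(utf8_form)
also_right = unquote(latin1_form, encoding="latin-1")

# Decoding UTF-8 bytes as if they were ISO 8859-1 gives the broken-looking text.
wrong_guess = unquote(utf8_form, encoding="latin-1")

print(utf8_form, latin1_form)
print(right_guess, also_right, wrong_guess)
```

The two bytes C3 and A9 are one character in UTF-8 but two separate characters in ISO 8859-1, which is where the extra "Ã©" comes from.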


You should see that the Hexadecimal code (HEX) for "é" is "E9" - which is how I would have encoded the é in the word "Musée", the result being "Mus%E9e". How this - "Mus%C3%A9e" works I'll never know as that encoding should result in "Musée".

"Mus%E9e" is ISO 8859-1, which is good enough for the languages predominantly used in the Americas and Western Europe (except for Icelandic, Welsh, Esperanto, Unified Canadian Aboriginal Syllabics and a few others).

 

Thanks for reminding me of UTF-8. I look at it so seldom that I forget it even exists. I just bookmarked the table on my bookmarks bar to keep it in my face.

