SUBMITTED (22158) - [BUG] Non-latin character encoding in GPX files


TheKickers

Hello,

I have seen this discussed before, but I can't find it reported here, so here it goes:

If a cache title, description, hint, etc. contains non-ASCII international characters, these are encoded as character entity references (or numeric/hex escapes, e.g. the escape for θ), even though the page encoding directive specifies UTF-8, so plain UTF-8 would be fine. This is probably not a big deal for the web page, as long as all major browsers are able to render it, and as far as I know it is legal according to the standard.

But when the cache is exported to GPX format, these escapes are left in place, which makes the text very hard to read on GPS devices. On my Garmin Oregon 450, most of the "special" characters are either left out or displayed as the raw & escape.

As an example, take cache GC1Q7G7. In the GPX file, the description starts like this:

CS: Možná nebude úplně jednoduché vyluštit

It seems I can't attach a screenshot here, so here is an attempt to transcribe what is on the GPS device screen:

CS: Moná nebude úpln jednoduché vyluštit
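As a user-side stopgap, leftover character references can be decoded before the file goes to the device. The following is only a minimal sketch using Python's standard `html.unescape`; the sample string is hypothetical and assumes the references reach the description text as literal `&#NNN;` sequences (the exact form used in the real GPX export is not shown in this thread).

```python
import html

def decode_references(text: str) -> str:
    """Turn HTML character references such as &#382; into real characters."""
    return html.unescape(text)

# Hypothetical sample: these references are assumptions for illustration only.
raw = "CS: Mo&#382;ná nebude úpln&#283; jednoduché"
decoded = decode_references(raw)
print(decoded)
```

Note that `html.unescape` also decodes `&amp;` and `&lt;`, so it should be applied to text already extracted by an XML parser, not to the raw XML file.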

 

My recommendation would be to use plain UTF-8 everywhere, even on the web page, but the GPX is critical for me (and for plenty of others who are not native English speakers).

Unfortunately, we cannot safely change this. Fixing this problem would require that both the short description and long description sections of the GPX file be changed to CDATA sections. This type of change to the file structure of the GPX file would cause issues on many devices since the CDATA section is not part of the specification of the GPX file format. At this time this is something we cannot fix.

Thank you for the answer. But I am not sure I understand the reason properly; could you please elaborate on this a bit?

The encoding attribute on the XML declaration at the beginning of the GPX file is set to UTF-8 right now. As far as I know, this means that all text in the file is expected to be UTF-8. There is no need to switch to CDATA to store plain UTF-8.
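To illustrate the point, here is a small sketch with Python's standard `xml.etree.ElementTree`. Both fragments are hypothetical stand-ins for a GPX description element: one stores the characters directly as UTF-8 bytes, the other uses numeric character references. A conforming XML parser should hand back the same string for both.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, not taken from a real GPX export.
literal = '<?xml version="1.0" encoding="UTF-8"?><desc>Možná</desc>'.encode("utf-8")
escaped = b'<?xml version="1.0" encoding="UTF-8"?><desc>Mo&#382;n&#225;</desc>'

# The parser decodes the declared UTF-8 bytes and resolves the references.
text_literal = ET.fromstring(literal).text
text_escaped = ET.fromstring(escaped).text
print(text_literal == text_escaped)
```

So at the XML level plain UTF-8 needs no special wrapping; the trouble only starts when the references are escaped a second time and arrive as literal text.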

If there are devices which can't cope with UTF-8 even with the UTF-8 attribute, then the users can filter these characters out as a post-processing step; there are tools out there for this. And these units should get fixed :-).
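Such a post-processing filter could look roughly like the sketch below, which uses only Python's standard `unicodedata` module. The function name `strip_diacritics` is made up for this example, and this is just one possible approach (decompose, then drop the combining marks), not what any particular tool does.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Replace accented letters with their base letters where a decomposition exists."""
    # NFKD splits e.g. 'ž' into 'z' followed by a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_diacritics("Možná nebude úplně jednoduché vyluštit")
print(plain)
```

Characters that have no decomposition (for example 'ł' or 'ø') pass through unchanged, so a device limited to ASCII would need an extra mapping for those.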

Or, even better, you could let the user choose the encoding when creating the PQ, or in the user preferences. But then the encoding attribute in the GPX file should not be set to UTF-8, but to e.g. ISO-8859-1, shouldn't it? :-).
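For what it is worth, a selectable encoding would not remove character references entirely. In this sketch (standard `xml.etree.ElementTree`, hypothetical `desc` element), serializing to ISO-8859-1 writes a matching XML declaration, keeps 'á' as a single Latin-1 byte, and falls back to a numeric reference for 'ž', which Latin-1 cannot represent.

```python
import xml.etree.ElementTree as ET

# Hypothetical element standing in for a GPX description.
desc = ET.Element("desc")
desc.text = "Možná"

# Characters outside the target encoding are written as numeric references.
latin1 = ET.tostring(desc, encoding="iso-8859-1")
print(latin1)
```

That fallback is legal XML, and a parser will resolve it, so it is a different situation from references that show up as literal text on the device.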

Well, the short answer is that there is a difference between the encoding of the XML file and what is contained in between the tags in the file. The real issue here is that our database does not support Unicode and so the contents of the GPX file cannot contain Unicode. Because of this, any fix to the GPX file at this time would be only a band-aid used to cover this more central issue.

 

That said, we are in the midst of converting our database to use Unicode. Once that has been completed, the issue of non-Latin character encoding in GPX files will essentially correct itself. I have edited this thread to tie it to the core Unicode conversion rather than to the superficial GPX corrections that have been discussed in the past.

That's good to hear. Once the database is updated no changes to the GPX files would be needed as they are already Unicode with UTF-8 encoding.

 

Not sure why CDATA sections were mentioned as they have nothing to do with character encoding. All they do is let you include lots of <'s and &'s without having to escape them.
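A quick sketch of that with Python's standard `xml.etree.ElementTree` (the `desc` element is a hypothetical placeholder): escaped markup and a CDATA section give the same parsed text, while a numeric reference placed inside CDATA is not resolved and stays as literal characters.

```python
import xml.etree.ElementTree as ET

# Same content written two ways: escaped, and wrapped in CDATA.
escaped = ET.fromstring("<desc>a &lt; b &amp; c</desc>").text
cdata = ET.fromstring("<desc><![CDATA[a < b & c]]></desc>").text
print(escaped == cdata)

# Inside CDATA a numeric reference is NOT decoded; it remains literal text.
literal_ref = ET.fromstring("<desc><![CDATA[Mo&#382;ná]]></desc>").text
print(literal_ref)
```

In other words, wrapping the descriptions in CDATA would, if anything, preserve the unreadable references instead of fixing them.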
