Author Topic: RTB Development  (Read 380707 times)

Your escaped code is incorrect. How are you expecting it to represent something like U+00D6? UTF-8 only uses a single byte for anything up to 7F (127); beyond that, it uses two or more bytes.
Code:
\xD6
It's not leaving the character as a single character. It's converting it into the escape sequence that Torque can use collapseescape() to convert back into the character.

You do know that \x is the unicode escape character, right? This is still not using UTF-8 character encoding, so it would constitute invalid XML.

You do know that \x is the unicode escape character, right? This is still not using UTF-8 character encoding, so it would constitute invalid XML.
Escape character. As in, it's not a unicode character of itself, it's a backslash and an X followed by a hexadecimal number.

\xD6 is unicode. This needs to be turned into UTF-8. I don't know why I'm still having this discussion.

\xD6 is unicode. This needs to be turned into UTF-8. I don't know why I'm still having this discussion.
Then send it as \\xD6.
If you can't send a backslash followed by an X then a D then a 6, I am seeing some serious problems in the future of RTB Connect.

Edit: Snipped frustration.

\xD6 does not constitute valid UTF-8 encoding. Java will not even interpret \xD6 as anything - that's just invalid. \x does not exist in Java.
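
To make the distinction concrete, here is a minimal Python sketch (not part of RTB or Torque; the helper name `is_valid_utf8` is made up for illustration). It shows that the code point U+00D6 becomes two bytes in UTF-8, while the single byte 0xD6 on its own does not decode as UTF-8.

```python
# U+00D6 (Ö) is a Unicode code point; UTF-8 is one way of turning it into bytes.
char = "\u00d6"
utf8_bytes = char.encode("utf-8")  # two bytes: 0xC3 0x96


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xD6 is a lead byte that expects a continuation byte after it,
# so on its own it is a truncated sequence.
lone_byte = bytes([0xD6])

print(utf8_bytes.hex())          # c396
print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(lone_byte))
```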

\xD6 does not constitute valid UTF-8 encoding. Java will not even interpret \xD6 as anything - that's just invalid. \x does not exist in Java.
I am saying to you, to send a backslash. Followed by an X. Followed by the sequence. As individual characters. If you have to, escape the backslash so it's its own character, not part of a sequence. Then, when the Torque client receives this sequence with its individual backslashes and other characters, you can collapse the escape sequences, and voila. Unicode characters, and all you ever sent to the server was backslashes and letters and numbers. No umlauts or accents.
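
The scheme proposed here can be sketched roughly as follows, in Python rather than TorqueScript. The functions `escape_non_ascii` and `collapse_escapes` are hypothetical stand-ins written for this illustration; `collapse_escapes` only imitates the idea of collapsing `\xNN` sequences and is not Torque's actual implementation. The sketch assumes two-digit escapes, so it only covers code points up to U+00FF.

```python
import re


def escape_non_ascii(text: str) -> str:
    """Replace non-ASCII characters with a literal backslash-x-hex sequence."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")  # double literal backslashes so the mapping is reversible
        elif code < 0x80:
            out.append(ch)
        elif code <= 0xFF:
            out.append("\\x%02X" % code)
        else:
            # Assumption of this sketch: two hex digits cannot express anything higher.
            raise ValueError("code point U+%04X does not fit in \\xNN" % code)
    return "".join(out)


_ESCAPE = re.compile(r"\\(\\|x[0-9A-Fa-f]{2})")


def collapse_escapes(text: str) -> str:
    """Turn the escape sequences back into the characters they stand for."""
    def replace(match: "re.Match[str]") -> str:
        body = match.group(1)
        if body == "\\":
            return "\\"
        return chr(int(body[1:], 16))
    return _ESCAPE.sub(replace, text)


wire = escape_non_ascii("Ge\u00d6rge")
print(wire)                    # Ge\xD6rge
print(collapse_escapes(wire))  # GeÖrge
```

What travels over the wire is plain ASCII, which is why a client that knows nothing about the convention would display the escape sequence literally.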

Then people using other clients like Pidgin or Psi are going to see people with names like "Ge\xD6rge" - that's not acceptable.

Then people using other clients like Pidgin or Psi are going to see people with names like "Ge\xD6rge" - that's not acceptable.
And why would these clients not have some way to convert unicode characters as well? I'm sure that any widely-used client has some form of unicode support via XMPP. That is, however, a valid point.

Because the XMPP specification is for UTF-8 only. There is no "oh but use Unicode or EBCDIC if you feel like it" clause.

Quote from: XMPP RFC 3920
11.5.  Character Encoding

   Implementations MUST support the UTF-8 (RFC 3629 [UTF-8])
   transformation of Universal Character Set (ISO/IEC 10646-1 [UCS2])
   characters, as required by RFC 2277 [CHARSET].  Implementations MUST
   NOT attempt to use any other encoding.

Whoops. Forgot that I didn't respond to that.


Well basically from how I read that, by using XMPP you are consenting that you will not use any non-UTF-8 encoding in any way. I wasn't aware of this part of the specification, but that basically looks like as long as you're using XMPP you will have to block anyone who uses unicode characters in their name, and strip them from messages, etc. - with no option, as long as you're using XMPP, for translation of these characters in any way, escaped or otherwise.

I wasn't aware that this was part of the specification. If you'd posted that earlier, the entire discussion wouldn't have happened, as by using XMPP you are effectively agreeing (by my understanding) not to use any non-UTF-8 characters, escaped or otherwise.

that basically looks like as long as you're using XMPP you will have to block anyone who uses unicode characters in their name, and strip them from messages, etc. - with no option, as long as you're using XMPP, for translation of these characters in any way, escaped or otherwise.

You need to do some reading on character encoding: UTF-8 can represent every character that Unicode can. There is no compromise; UTF-8 just stays backwards compatible with ASCII, which other Unicode encodings such as UTF-16 do not.
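
As a rough illustration of that point, the Python sketch below encodes a few sample characters (chosen arbitrarily for this example) as UTF-8. Pure ASCII text produces exactly the same bytes as an ASCII encoding would, and characters outside ASCII simply take more bytes instead of being rejected.

```python
# Sample characters chosen for illustration: A, Ö, the euro sign, and an emoji.
samples = ["A", "\u00d6", "\u20ac", "\U0001F600"]

# Number of UTF-8 bytes each sample needs.
lengths = [len(s.encode("utf-8")) for s in samples]

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
ascii_name = "George"
same_bytes = ascii_name.encode("utf-8") == ascii_name.encode("ascii")

# A name with an umlaut survives a UTF-8 round trip unchanged.
name = "Ge\u00d6rge"
round_trip = name.encode("utf-8").decode("utf-8")

print(lengths)
print(same_bytes)
print(round_trip == name)
```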

If you'd posted that earlier, the entire discussion wouldn't have happened

When you decide to start proposing solutions for an issue you were never even invited to give your opinion on, it is generally considered polite to understand (or at least research) the situation correctly before leading people on some sort of wild goose chase.

Anyway, I'm going to use the extra dev time (RTB v4 will have to be released after v15) to look into sharing save files as well as getting a head start on the new website development.

When you decide to start proposing solutions for an issue you were never even invited to give your opinion on, it is generally considered polite to understand (or at least research) the situation correctly before leading people on some sort of wild goose chase.

Anyway, I'm going to use the extra dev time (RTB v4 will have to be released after v15) to look into sharing save files as well as getting a head start on the new website development.
Good idea. I would also stop wasting time talking to people about character encoding. I'd rather let them feel ignored and carry on to more important things. If they want to know that badly they can look it up.

Now, don't flame me... but why AFTER v15?

Now, don't flame me... but why AFTER v15?
He's already explained it.

There need to be engine changes.

And he doesn't mean when v16 comes out. He just means when v15 comes out, he'll work on getting it working.