Author Topic: RTB Development  (Read 381184 times)

I have absolutely no idea what point you're trying to make here. Why are you talking about AJAX and PHP?
You said that you cannot send special characters with Torque. The same goes for AJAX. Therefore, if you convert the symbols to integers, you will have a really tough job converting them back to characters in PHP.

The same goes with AJAX.

You can't send special characters with pies either, why not mention those too? I don't see how PHP or AJAX are relevant to a Java-based XMPP server and a TorqueScript-based XMPP client.

I made this set of functions a while back for converting numbers to special characters and back, might be helpful.

The only problem is that it only handles 255 characters, which is too few to cover multi-byte characters.

This is something I'd do in C++ for sure, passing every character to be sent through a function like that would be a bit of a nightmare in terms of performance.

This is something I'd do in C++ for sure, passing every character to be sent through a function like that would be a bit of a nightmare in terms of performance.
So it's a good thing that it should only need to be done once per startup (and presumably each time you change your name).

Not like you need to do it every 33ms or anything.

No - all XML character data is UTF-8 encoded as per the XMPP RFCs, which means all chat messages are too.

Then what if you ran only the characters that can't be sent as-is through this, replacing them with their escape sequences (which that Torque function for collapsing escaped backslashes can turn back into characters), and sent them that way? Would that help, perhaps?

For example:
Before: "Hello wörld"
After: "Hello w\xF6rld"

Edit: Presumably you'd send them this way too, so just checking a stripos(%char,"abcdefghijklmnopqrstuvwxyz012 34567890-_=+[]{}\\|;:'\",.<>/?!@#$%^&*()") would work, right? You still need to loop over the string, but that shouldn't be too bad in and of itself.

I admit that my knowledge of it is limited but my basic understanding is that any character on a key on a standard US keyboard is supported by UTF-8, and I believe that's everything that could be entered. (`~ are omitted due to console)

Edit2 - did I get the stripos args backwards? I think I did. God dammit.
« Last Edit: February 02, 2010, 07:09:57 AM by M »

Trust me, this is not feasible in TorqueScript.

Trust me, this is not feasible in TorqueScript.
If the character translation just wrecks performance, don't bother. People will wind up changing their names so they can use RTB. If not, that's their loss.

It's not actually possible in TorqueScript currently. A single byte can only represent 256 characters (ASCII proper covers the first 128), whereas UTF-8 uses 2 or more bytes for any character above 127. TorqueScript can't handle a multi-byte character and tries to split it into two single characters, so it has to be done in C++.
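To illustrate the splitting problem, here is a minimal sketch. It uses Python as a stand-in, not TorqueScript or the engine's C++, and the variable names are made up for illustration. It shows how one character above 127 becomes two bytes in UTF-8, and what a byte-at-a-time string API would see.

```python
# Illustrative only: Python stands in for the engine here.
text = "wörld"
encoded = text.encode("utf-8")

# 'ö' (U+00F6) is above 127, so UTF-8 spends two bytes on it.
byte_count = len(encoded)        # 6 bytes for 5 characters
o_umlaut = "ö".encode("utf-8")   # the two bytes 0xC3 0xB6

# Treating every byte as its own character, as a single-byte string
# API would, splits 'ö' into two unrelated characters.
split_chars = [bytes([b]).decode("latin-1") for b in encoded]
print(byte_count, o_umlaut, split_chars)
```

The single 'ö' comes back as 'Ã' followed by '¶', which is the kind of split described above.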
« Last Edit: February 02, 2010, 07:22:02 AM by Ephialtes »

Trust me, this is not feasible in TorqueScript.

My test string is:
Code: [Select]
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£
¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãä
åæçèéêëìíîïðñòóôõö÷øùúûüýþÿ!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLM
NOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ
‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒ
ÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ!"#$%&'()*+,-./0123456789
:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½
¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
This is far, far worse than any string you would have to deal with. It's 669 characters long and something like 80% of that is modified by the encoding and thus run through Space's charToNum and base10toHex functions.
The linebreaks aren't in it while it's running, that's to stop it stretching the screen. :cookieMonster:

My code:
Code: [Select]
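function UTFescape(%str)
{
    // Characters passed through unchanged. stripos is case-insensitive, so A-Z are covered too.
    %safe = "abcdefghijklmnopqrstuvwxyz012 34567890-_=+[]{}\\|;:'\",.<>/?!@#$%^&*()";
    %len = strlen(%str);
    %newstr = "";
    for(%i = 0; %i < %len; %i++)
    {
        %char = getSubStr(%str, %i, 1);
        if(stripos(%safe, %char) == -1)
        {
            // Anything not in the safe list becomes a \xNN escape via Space's helpers.
            %char = "\\x" @ base10toHex(charToNum(%char));
        }
        else if(%char $= "\\")
        {
            // Double literal backslashes so collapseEscape() restores them intact.
            %char = "\\\\";
        }
        %newstr = %newstr @ %char;
    }
    return %newstr;
}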
// All of space's stuff here, it's very long and thus omitted
I am running that on my test string every 33ms. There's a tiny bit of lag from it, I'm losing something like 3 frames per second or so. Running that on each message you send isn't going to cause problems, and really you could always just provide an option to strip those characters instead of encoding them so people can reduce that lag if they want to.

Edit: That becomes
Code: [Select]
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_
\x60abcdefghijklmnopqrstuvwxyz{|}\x7E\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88
\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B
\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE
\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1
\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4
\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7
\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB
\xFC\xFD\xFE\xFF
Three times over. Once again, it's much worse than anything anyone will ever be sending. Would that not be sendable through the server?

Edit: You can translate it back using collapseEscape(). I also added a step to escape literal backslashes, since I missed that at first and collapsing the escapes would otherwise remove the backslash entirely.
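For anyone who wants to check the round trip outside the game, here is a rough sketch of the same escape-and-collapse idea in Python. The names utf_escape and collapse_escape are hypothetical stand-ins, not engine functions, and it assumes single-byte character values (0 to 255), as in the test string above.

```python
import re

# Characters passed through unchanged (compared case-insensitively).
SAFE = "abcdefghijklmnopqrstuvwxyz0123456789 -_=+[]{}\\|;:'\",.<>/?!@#$%^&*()"

def utf_escape(text: str) -> str:
    """Replace unsafe characters with \\xNN and double literal backslashes."""
    out = []
    for ch in text:
        if ord(ch) > 255:
            raise ValueError("sketch only covers single-byte values")
        if ch == "\\":
            out.append("\\\\")
        elif ch.lower() in SAFE:
            out.append(ch)
        else:
            out.append("\\x%02X" % ord(ch))
    return "".join(out)

def collapse_escape(text: str) -> str:
    """Reverse utf_escape: turn \\\\ back into \\ and \\xNN into a character."""
    def repl(match):
        if match.group(0) == "\\\\":
            return "\\"
        return chr(int(match.group(1), 16))
    return re.sub(r"\\\\|\\x([0-9A-Fa-f]{2})", repl, text)

escaped = utf_escape("Hello wörld")
restored = collapse_escape(escaped)
print(escaped, restored)
```

Doubling the backslash first is what keeps a literal backslash followed by "x41" from being collapsed into "A" on the way back.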
« Last Edit: February 02, 2010, 07:40:00 AM by M »

It's not actually possible in TorqueScript currently. A single byte can only represent 256 characters (ASCII proper covers the first 128), whereas UTF-8 uses 2 or more bytes for any character above 127. TorqueScript can't handle a multi-byte character and tries to split it into two single characters, so it has to be done in C++.

???

One billion edits. I don't claim to be as knowledgeable as you about this but if your server doesn't support sending of every character on a US keyboard it obviously needs fixing, and there's nothing in the escaped string I can't type with a single keypress. Excluding the shift key, of course.
« Last Edit: February 02, 2010, 07:44:14 AM by M »

Your escaped code is incorrect - how are you expecting it to represent something like U+00D6? UTF-8 only uses a single byte for anything up to 7F (127); beyond that, it uses two or more bytes.
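As a worked example of the U+00D6 case, here is a small sketch (Python again, purely illustrative; encode_two_byte and is_valid_utf8 are made-up helper names). It builds the two-byte UTF-8 form by hand and checks that a lone D6 byte is not valid UTF-8 on its own.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Hand-rolled UTF-8 for code points U+0080..U+07FF (two-byte range)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("outside the two-byte range")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: top five bits
    cont = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low six bits
    return bytes([lead, cont])

def is_valid_utf8(data: bytes) -> bool:
    """True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

manual = encode_two_byte(0xD6)        # bytes C3 96
builtin = "\u00d6".encode("utf-8")    # same result from the standard library
print(manual == builtin, is_valid_utf8(b"\xd6"))
```

So an escape like \xD6 names the code point, but the byte that actually goes on the wire has to be the pair C3 96.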