| Blockland Forums > Modification Help |
| RTB Development |
| Ephialtes:
It's not actually possible in TorqueScript currently. A single-byte encoding can hold at most 256 characters (ASCII proper only covers 0-127), whereas UTF-8 uses 2 or more bytes for any character > 127. TorqueScript can't handle a multi-byte character and tries to split it into separate single-byte characters, so it has to be done in C++. |
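To make the byte counts concrete, here is a small sketch in Python (not TorqueScript, and purely for illustration; the three sample characters are an arbitrary choice, not taken from the thread). It prints how many bytes UTF-8 needs for each character, then shows what a single-byte reader sees when the two bytes of "Ö" are read one at a time.

```python
# Illustration only: Python's built-in codecs stand in for the engine's
# string handling, which is not shown anywhere in this thread.
samples = ["A", "Ö", "€"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# Reading the two UTF-8 bytes of "Ö" one at a time, the way a single-byte
# (Latin-1) reader would, yields two unrelated characters.
split = [bytes([b]).decode("latin-1") for b in "Ö".encode("utf-8")]
print(split)
```

The last line is the "split into single characters" behaviour described above, reproduced outside the engine.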
| M:
--- Quote from: Ephialtes on February 02, 2010, 07:04:56 AM ---Trust me, this is not feasible in TorqueScript.
--- End quote ---
My test string is:
--- Code: ---
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãä
åæçèéêëìíîïðñòóôõö÷øùúûüýþÿ!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLM
NOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ
‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒ
ÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ!"#$%&'()*+,-./0123456789
:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½
¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
--- End code ---
This is far, far worse than any string you would have to deal with. It's 669 characters long, and something like 80% of that is modified by the encoding and thus run through Space's charToNum and base10toHex functions. The line breaks aren't in it while it's running; they're only there to stop it stretching the screen. :cookieMonster:
My code:
--- Code: ---
function UTFescape(%str)
{
   for(%i = 0; %i < strlen(%str); %i++)
   {
      %char = getsubstr(%str, %i, 1);
      // Anything outside the safe list is replaced by a \xNN hex escape.
      if(stripos("abcdefghijklmnopqrstuvwxyz012 34567890-_=+[]{}\\|;:'\",.<>/?!@#$%^&*()", %char) == -1)
      {
         %num = base10toHex(charToNum(%char));
         %char = "\\x" @ %num;
      }
      // Double literal backslashes so they survive un-escaping.
      if(%char $= "\\")
      {
         %char = "\\\\";
      }
      %newstr = %newstr @ %char;
   }
   return %newstr;
}
// All of space's stuff here, it's very long and thus omitted
--- End code ---
I am running that on my test string every 33ms. There's a tiny bit of lag from it; I'm losing something like 3 frames per second. Running it on each message you send isn't going to cause problems, and you could always provide an option to strip those characters instead of encoding them, so people can reduce that lag if they want to.
Edit: That becomes
--- Code: ---
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_
\x60abcdefghijklmnopqrstuvwxyz{|}\x7E\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88
\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B
\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE
\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1
\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4
\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7
\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB
\xFC\xFD\xFE\xFF
--- End code ---
three times over. Once again, it's much worse than anything anyone will ever be sending. Would that not be sendable through the server?
Edit: You can translate it back using collapseescape(). I also added something to escape the backslashes, since I missed that at first; without it, collapsing the escapes removes the backslash entirely. |
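The escape-and-collapse round trip can be sketched outside the engine. The Python below is an assumption-laden stand-in, not the TorqueScript from the post. `escape_non_safe` and `collapse_escapes` are hypothetical names, the safe list is modelled on the one in the post (without a space), and `collapse_escapes` only handles `\\` and `\xNN`, which is far less than the engine's real function does. Like the post's code, it works on single-byte character codes only.

```python
# Hypothetical helpers; the safe list is modelled on the one in the post.
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789-_=+[]{}\\|;:'\",.<>/?!@#$%^&*()")

def escape_non_safe(text: str) -> str:
    """Replace characters outside SAFE with \\xNN and double backslashes."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")            # keep literal backslashes unambiguous
        elif ch.isascii() and ch.lower() in SAFE:
            out.append(ch)                # case-insensitive match, like stripos
        else:
            code = ord(ch)
            if code > 0xFF:
                raise ValueError("sketch only covers single-byte characters")
            out.append(f"\\x{code:02X}")
    return "".join(out)

def collapse_escapes(text: str) -> str:
    """Undo escape_non_safe: handles only \\\\ and \\xNN sequences."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            if text[i + 1] == "\\":
                out.append("\\")
                i += 2
                continue
            if text[i + 1] == "x" and i + 3 < len(text):
                out.append(chr(int(text[i + 2:i + 4], 16)))
                i += 4
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

message = "Ö\\~ ok"
escaped = escape_non_safe(message)
print(escaped)
print(collapse_escapes(escaped) == message)
```

Doubling the backslash is what keeps a message that literally contains the four characters `\x41` from turning into `A` on the way back.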
| Ephialtes:
--- Quote from: Ephialtes on February 02, 2010, 07:18:41 AM ---It's not actually possible in TorqueScript currently. A single-byte encoding can hold at most 256 characters (ASCII proper only covers 0-127), whereas UTF-8 uses 2 or more bytes for any character > 127. TorqueScript can't handle a multi-byte character and tries to split it into separate single-byte characters, so it has to be done in C++.
--- End quote ---
??? |
| M:
One billion edits. I don't claim to be as knowledgeable as you about this, but if your server doesn't support sending every character on a US keyboard, it obviously needs fixing. There's nothing in the escaped string I can't type with a single keypress (excluding the Shift key, of course). |
| Ephialtes:
Your escaped code is incorrect. How are you expecting it to represent something like U+00D6? UTF-8 only uses a single byte for anything up to 7F (127); beyond that, it uses two or more bytes. |
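For reference, the two-byte form of U+00D6 can be worked out by hand. The Python sketch below uses hypothetical helper names (`utf8_bytes`, `as_hex_escapes`) and deliberately covers only the one- and two-byte cases. It shows that the UTF-8 encoding of U+00D6 is the byte pair C3 96, not the single value D6.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode one code point below U+0800 by hand (one- and two-byte forms only)."""
    if code_point < 0x80:
        return bytes([code_point])            # 0xxxxxxx
    if code_point < 0x800:
        lead = 0xC0 | (code_point >> 6)       # 110xxxxx: top five bits
        cont = 0x80 | (code_point & 0x3F)     # 10xxxxxx: low six bits
        return bytes([lead, cont])
    raise ValueError("three- and four-byte forms are left out of this sketch")

def as_hex_escapes(data: bytes) -> str:
    """Format bytes in the \\xNN style used in the thread."""
    return "".join(f"\\x{b:02X}" for b in data)

print(as_hex_escapes(utf8_bytes(0xD6)))  # \xC3\x96
```

A per-character escape that emits the character's own number gives `\xD6` here, which is the Latin-1 byte for that character, not its UTF-8 encoding.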