Jettison – a JSON parser

Jettison-Circuit is now pretty much done.
It allows you to simplify two-way communication (using TCPObjects) that uses JSON.

Code: https://github.com/Portify/jettison-circuit/blob/master/jettison-circuit.cs
README (includes "tutorial" and reference): https://github.com/Portify/jettison-circuit/blob/master/README.md

Here's an example put together from the README.



Setting up:

Code:
new TCPObject( MyTCPObject );
MyTCPObject.circuit = JettisonCircuit( MyTCPObject );

function MyTCPObject::onLine( %this, %line )
{
    %this.circuit.process( %line );
}



Subscribing functions to "channels":

Code:
// This will react to {"type": "foo", ...}

function myHandlerFunction( %data )
{
    echo( "got foo! here's the JSObject: " @ %data );
    return false;
}

MyTCPObject.circuit.subscribe( "foo", "myHandlerFunction" );



Subscribing methods to "channels" and preventing Jettison-Circuit from deleting the parse output:

Code:
function MyTCPObject::myHandlerMethod( %this, %data )
{
    echo( "got foo! here's the JSObject: " @ %data );
    schedule( 500, 0, "doThisLater", %data );

    return true;
}

function doThisLater( %data )
{
    echo( "here we are, half a second later, still got the JSObject" );
    // Use the object that was passed in, not the JSObject class name.
    %data.echoDump();

    echo( "deleting it now though" );
    %data.killTree();
}

MyTCPObject.circuit.subscribe( "foo", MyTCPObject, "myHandlerMethod" );



Three different ways of sending data:

Code:
// Not specifying any data. This will just use {} as a default.
MyTCPObject.circuit.send( "foo" );

// The above is equivalent to the following.
MyTCPObject.send( "{\"type\": \"foo\"}\r\n" );

Code:
// Just passing raw serialized JSON.
MyTCPObject.circuit.send( "foo", "{\"size\": 538.2}" );

// The above is equivalent to the following.
MyTCPObject.send( "{\"type\": \"foo\", \"size\": 538.2}\r\n" );

Code:
// This time, create a JSObject through json_loads and pass it through.
%data = json_loads( "{\"size\": 538.2}" );
MyTCPObject.circuit.send( "foo", %data );

// The above is equivalent to the following.
MyTCPObject.send( "{\"type\": \"foo\", \"size\": 538.2}\r\n" );
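The three send variants and the channel subscriptions above can be modeled in a few lines. This is only a sketch in Python, not Jettison-Circuit's TorqueScript source: the `Circuit` class, its `sent` list, and the rule that `process` reports whether a handler returned true are all assumptions made for illustration.

```python
import json


class Circuit:
    """Toy model of a message circuit (hypothetical, not Jettison-Circuit itself)."""

    def __init__(self):
        self.handlers = {}  # channel name -> list of callables
        self.sent = []      # lines that would be written to the socket

    def subscribe(self, channel, handler):
        self.handlers.setdefault(channel, []).append(handler)

    def send(self, channel, data=None):
        # Accept no data, raw serialized JSON, or an already parsed dict.
        if data is None:
            data = {}
        elif isinstance(data, str):
            data = json.loads(data)
        # The channel name travels in the "type" key; lines end with CRLF.
        line = json.dumps({"type": channel, **data}) + "\r\n"
        self.sent.append(line)
        return line

    def process(self, line):
        data = json.loads(line)
        kept = False
        for handler in self.handlers.get(data.get("type"), []):
            # A handler returning True asks to keep the parsed data around.
            kept = bool(handler(data)) or kept
        return kept
```

In this model all three call styles produce the same wire format, which is the point of the equivalences listed above.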

This seems like a really weird approach to parsing JSON - did you use something as a reference for this?

I took the concept from Python's standard json library: a "scan_once" function dispatches to an individual function for each data structure, and each of those functions returns where it left off.
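For readers who haven't seen that design, here is a rough Python sketch of the idea. It is not Jettison's code and not the json module's code either: the helper names are made up, objects are left out to keep it short, and strings ignore escape sequences.

```python
def skip_ws(text, idx):
    """Advance idx past any whitespace."""
    while idx < len(text) and text[idx] in " \t\r\n":
        idx += 1
    return idx


def scan_string(text, idx):
    # Simplified: idx points just past the opening quote; no escapes handled.
    end = text.index('"', idx)
    return text[idx:end], end + 1


def scan_number(text, idx):
    end = idx
    while end < len(text) and text[end] in "+-0123456789.eE":
        end += 1
    if end == idx:
        raise ValueError("unexpected character at %d" % idx)
    chunk = text[idx:end]
    is_float = any(c in chunk for c in ".eE")
    return (float(chunk) if is_float else int(chunk)), end


def scan_array(text, idx):
    # idx points just past the opening bracket.
    items = []
    idx = skip_ws(text, idx)
    if text[idx] == "]":
        return items, idx + 1
    while True:
        value, idx = scan_once(text, skip_ws(text, idx))
        items.append(value)
        idx = skip_ws(text, idx)
        if text[idx] == "]":
            return items, idx + 1
        if text[idx] != ",":
            raise ValueError("expected ',' or ']' at %d" % idx)
        idx += 1


def scan_once(text, idx):
    """Parse one value starting at idx; return (value, index where it ended)."""
    ch = text[idx]
    if ch == '"':
        return scan_string(text, idx + 1)
    if ch == "[":
        return scan_array(text, idx + 1)
    if text.startswith("null", idx):
        return None, idx + 4
    if text.startswith("true", idx):
        return True, idx + 4
    if text.startswith("false", idx):
        return False, idx + 5
    return scan_number(text, idx)
```

Every scanner hands back the position where it stopped, so the caller can continue from there without any shared parser state.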

Nullable pointed out a problem I hadn't thought of. If the output of multiple parses exists at once, where one defines, for instance, a JSArray with the ID 6400 and another contains the number 6400, then 6400.json is true even when you're testing the number. When I have time, I'll change the parser to represent every value as a JS* object (e.g. JSNumber, JSString, ...) and implement type testing that way. I'll also add a fast option to the parser that turns this behavior off.
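The collision is easy to reproduce in a small model. The Python sketch below is an illustration only: `Registry` and `naive_type` are invented names standing in for an engine that addresses objects by integer ID and for a type check that trusts any value naming a live object.

```python
class Registry:
    """Stand-in for an engine where objects are addressed by integer ID."""

    def __init__(self, first_id=6400):
        self.objects = {}
        self.next_id = first_id

    def create(self, kind):
        obj_id = self.next_id
        self.next_id += 1
        self.objects[obj_id] = kind
        return str(obj_id)  # script code only ever sees the ID as text


def naive_type(registry, value):
    # Anything that names a live object is treated as that object.
    if value.isdigit() and int(value) in registry.objects:
        return registry.objects[int(value)]
    return "number" if value.replace(".", "", 1).isdigit() else "string"


registry = Registry()
array_ref = registry.create("array")  # "6400", a real array
plain_number = "6400"                 # a number that happens to match the ID
```

Here `naive_type` reports "array" for both `array_ref` and `plain_number`, because nothing in the value itself says which one was meant.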

https://github.com/Portify/jettison/commit/4f57edb3fe57713bd253d4a0c350d0d50bc74946

Updated the OP to a more recent version of the README.



Also, to clarify on the fast option:

Jettison has two "modes". I like to call them object mode (slow) and data mode (fast).
By default, object mode is used. Setting fast to true will use data mode instead.

In object mode, all data is represented by an object.
In data mode, only objects and arrays are represented as objects.

Here's an interesting little idea: If you store the object ID with a symbol after it - maybe a \c0 - then any decent analysis will see it's a string, not a number; you can then obviously say it's really an object when analysing it, assuming you use a central function for type analysis. You should probably be using expandEscape and collapseEscape when storing and retrieving strings (if you're not already - I didn't check), so an escape sequence like this won't occur in them, ever, and can be used to differentiate a number from an object.

Due to Torque's weird number and object handling, this is still a valid integer for math and a valid object ID for OOP, but at analysis time you can easily differentiate between a number and an object ID (mFloor(%i) $= %i returns false for it, if that's how you're checking integers; and it doesn't contain a . so it couldn't be misconstrued as a float either).

Sounds like a good idea. I'll look into doing this for the "data mode" and making it the default. The slow "object mode" will still be available as an option for when exact type testing is a necessity, e.g. differentiating between a blank string and null. I could also store different "symbol suffixes" for different types; I might think about that afterwards.

You should probably be using expandEscape and collapseEscape when storing and retrieving strings (if you're not already - I didn't check)

Yeah, I'm already doing this.
« Last Edit: November 16, 2012, 06:16:14 AM by Port »

Pretty much done with a new version. No "slow" or "fast" mode.

Objects are represented as a ScriptObject with the class JSObject.
Arrays are represented as a ScriptObject with the class JSArray.
Strings are represented as a string with \c0 appended (I should probably do something about this).
Numbers are represented as a string with \c1 appended.
Booleans are represented as either "0" or "1".
Null is represented as "".
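To make the scheme in that list concrete, here is a small Python model of suffix-based type tags. It is a sketch under assumptions, not the real js_type: the marker bytes are placeholders for \c0 and \c1, the `objects` mapping stands in for looking up a live ScriptObject, and `tag_string`, `tag_number`, and `untag` are hypothetical helpers.

```python
STRING_TAG = "\x01"  # placeholder standing in for Torque's \c0
NUMBER_TAG = "\x02"  # placeholder standing in for Torque's \c1


def tag_string(s):
    return s + STRING_TAG


def tag_number(n):
    return str(n) + NUMBER_TAG


def js_type(value, objects):
    """Classify a stored value; objects maps live IDs to "object" or "array"."""
    if value == "":
        return "null"
    if value.endswith(STRING_TAG):
        return "string"
    if value.endswith(NUMBER_TAG):
        return "number"
    if value in objects:
        return objects[value]
    if value in ("0", "1"):
        return "bool"
    return "unknown"


def untag(value):
    """Strip the type marker from strings and numbers; pass others through."""
    if value.endswith((STRING_TAG, NUMBER_TAG)):
        return value[:-1]
    return value
```

With the markers in place, a tagged number 9795 no longer collides with an array whose ID is 9795, and a tagged empty string stays distinct from null.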

-snip-



Quote from: Proposed js_type Documentation
==> echo( json_loads( "[]" ) );
9795


There is now an array with ID 9795.
To prove this, we can type check it.

==> echo( js_type( "9795" ) );
array


So what if we manually set something to the number 9795?
It'll end up as "array" too.

==> $a = 9795;
==> echo( js_type( $a ) );
array


Yup, it thinks it's an array. This is where the 2nd argument to js_type becomes useful.
With two arguments, js_type lets you set the type of a value.

==> $a = js_type( $a, "number" );
==> echo( js_type( $a ) );
number


Or how about trying that same thing, but as a string?

==> $a = js_type( $a, "string" );
==> echo( js_type( $a ) );
string


There are some things you just can't do though.

==> $a = js_type( $a, "bool" );
==> echo( js_type( $a ) );
array


==> $a = js_type( $a, "object" );
==> echo( js_type( $a ) );
array

« Last Edit: November 16, 2012, 07:37:22 AM by Port »

Is there a helper function for getting the "value" of a "js value" (pass through if obj, array, boolean or null, strip the type information for string and number)?

Currently, everything except objects and arrays is already represented as the "value" itself. To get rid of any "type data", you can set the value to any type other than "number" or "string". Even a nonexistent type works (js_type(value, "none")).



I'll be pushing this version in just a moment.
Actually, first, what would you guys prefer?

Objects having a suffix to indicate that they're objects or numbers/strings having a suffix to indicate the opposite?
« Last Edit: November 16, 2012, 08:09:30 AM by Port »

Objects makes more sense, since with a string there's something 'extra' in your string you didn't put there and with a number you'll lose the suffix if you modify it. Odds are you won't be modifying the object reference.

True. That's also what I went with.


Updated the documentation in the OP to a more recent version of the README.md from the repository.