Jaded - Java Deserialization for Rust

Java has a much-maligned (for good reason) serialization system built into the standard library. The output is a binary stream mapping the full object hierarchy and the relations between the objects in it.

The stream also includes definitions of classes and their hierarchies (super classes etc). The full specification is defined here.
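As a small illustration of the stream layout: every stream written by `ObjectOutputStream` opens with a two-byte magic number (`0xACED`) followed by a two-byte version (`0x0005`). The sketch below checks for that header using only the standard library. `check_header` is a hypothetical helper written for this example and is not part of this library's API.

```rust
use std::io::{self, Read};

/// Magic number that opens every Java serialization stream (STREAM_MAGIC).
const STREAM_MAGIC: u16 = 0xACED;
/// Stream protocol version written by ObjectOutputStream (STREAM_VERSION).
const STREAM_VERSION: u16 = 5;

/// Hypothetical helper (not jaded's API): read the 4-byte header and report
/// whether it looks like a Java serialization stream.
fn check_header<R: Read>(mut source: R) -> io::Result<bool> {
    let mut header = [0u8; 4];
    source.read_exact(&mut header)?;
    // Multi-byte values in the stream are big-endian.
    let magic = u16::from_be_bytes([header[0], header[1]]);
    let version = u16::from_be_bytes([header[2], header[3]]);
    Ok(magic == STREAM_MAGIC && version == STREAM_VERSION)
}

fn main() -> io::Result<()> {
    // Header followed by TC_OBJECT (0x73), the marker that starts an object.
    let bytes: &[u8] = &[0xAC, 0xED, 0x00, 0x05, 0x73];
    println!("valid header: {}", check_header(bytes)?);
    Ok(())
}
```

A check like this is a cheap way to reject files that are not serialized Java objects before attempting a full parse.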

In any new application there are probably better ways to serialize data with fewer security risks, but there are cases where a legacy application is writing stuff out and we want to read it in again. If we want to read it in a separate application, it'd be good if we weren't bound to Java.

I had one such application and rather than write more Java to interact with it, I wrote this.

Example

In Java

```java
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Demo implements Serializable {
    private static final long serialVersionUID = 1L;
    private String message;
    private int i;

    public Demo(String message, int count) {
        this.message = message;
        this.i = count;
    }

    public static void main(String[] args) throws Exception {
        Demo d = new Demo("helloWorld", 42);
        try (FileOutputStream fos = new FileOutputStream("demo.obj", false);
                ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(d);
        }
    }
}
```

From Rust

```rust
use std::fs::File;
use jaded::{Parser, Result};

fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    println!("Read Object: {:#?}", parser.read()?);
    Ok(())
}
```

Output from Rust

```
Read Object: Object(
    Object(
        ObjectData {
            class: "Demo",
            fields: {
                "i": Primitive(
                    Int(
                        42,
                    ),
                ),
                "message": JavaString(
                    "helloWorld",
                ),
            },
            annotations: [],
        },
    ),
)
```

Conversion to Rust types

For most use cases, the raw types read from the stream are not very ergonomic to work with. For ease of use, types can implement `FromValue` and can then be read directly from the stream.

Taking the same Java `Demo` class defined and written above, the implementation could look something like this:

```rust
struct Demo {
    message: String,
    i: i32,
}

impl FromValue for Demo {
    fn from_value(value: &Value) -> ConversionResult<Self> {
        match value {
            // A serialized object: pull each field out by name.
            Value::Object(data) => {
                let message = data.get_field_as("message")?;
                let i = data.get_field_as("i")?;
                Ok(Demo { message, i })
            }
            Value::Null => Err(ConversionError::NullPointerException),
            _ => Err(ConversionError::InvalidType("object")),
        }
    }
}
```

`Demo` objects can then be read directly from the stream:

```rust
let d: Demo = parser.read_as()?;
```

Limitations

Ambiguous serialization

Unfortunately, there are limits to what we can do without the original code that created the serialized byte stream. The protocol linked above lists four types of object. One of these, classes that implement `java.lang.Externalizable` and use `PROTOCOL_VERSION_1` (not the default since v1.2), cannot be read by anything other than the class that wrote them, as their data is nothing more than a stream of bytes.
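The four cases can be told apart from the flags byte in each class descriptor. The sketch below uses the flag values defined in `java.io.ObjectStreamConstants`; the `ObjectKind` enum and `classify` function are hypothetical names chosen for this example and are not part of this library's API.

```rust
// Class descriptor flag bits, as named in java.io.ObjectStreamConstants.
const SC_WRITE_METHOD: u8 = 0x01;
const SC_SERIALIZABLE: u8 = 0x02;
const SC_EXTERNALIZABLE: u8 = 0x04;
const SC_BLOCK_DATA: u8 = 0x08;

/// The four kinds of object in the protocol (illustrative names only).
#[derive(Debug, PartialEq)]
enum ObjectKind {
    /// Serializable, no writeObject method.
    PlainSerializable,
    /// Serializable with a custom writeObject method.
    CustomSerializable,
    /// Externalizable written with PROTOCOL_VERSION_1: raw, undelimited bytes.
    ExternalizableV1,
    /// Externalizable written with PROTOCOL_VERSION_2: block-data records.
    ExternalizableV2,
}

/// Hypothetical helper: classify a class from its descriptor flags byte.
fn classify(flags: u8) -> Option<ObjectKind> {
    if (flags & SC_EXTERNALIZABLE) != 0 {
        if (flags & SC_BLOCK_DATA) != 0 {
            Some(ObjectKind::ExternalizableV2)
        } else {
            Some(ObjectKind::ExternalizableV1)
        }
    } else if (flags & SC_SERIALIZABLE) != 0 {
        if (flags & SC_WRITE_METHOD) != 0 {
            Some(ObjectKind::CustomSerializable)
        } else {
            Some(ObjectKind::PlainSerializable)
        }
    } else {
        // Neither flag set: not one of the four cases discussed here.
        None
    }
}

fn main() {
    // A Serializable class with no writeObject method carries only 0x02.
    println!("{:?}", classify(SC_SERIALIZABLE));
    println!("{:?}", classify(SC_EXTERNALIZABLE | SC_BLOCK_DATA));
}
```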

Of the remaining three types we can only reliably deserialize two.

- 'Normal' classes that implement `java.lang.Serializable` without having a `writeObject` method

  These can be read as shown above.

- Classes that implement `Externalizable` and use the newer `PROTOCOL_VERSION_2`

  These can be read, although their data is held entirely in the `annotations` field of the `ObjectData` struct and the `get_field` method only returns `None`.

- `Serializable` classes that implement `writeObject`

  These objects are more difficult. The spec above suggests that they have their fields written as 'normal' classes and then have optional annotations written afterwards. In practice this is not the case, and the fields are only written if the class calls `defaultWriteObject` as the first call in its `writeObject` method. This is mentioned as a requirement in the spec, so we can assume that it holds for classes in the standard library, but it is something to be aware of if user classes are being deserialized.

The consequence of this is that once we have found a class that we can't read, it is difficult to get back on track as it requires picking out the marker signifying the start of the next object from the sea of custom data.
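By way of contrast, the `PROTOCOL_VERSION_2` `Externalizable` case stays readable because its custom data is delimited: primitive data is wrapped in block-data records, each a `TC_BLOCKDATA` marker (`0x77`), a one-byte length and the payload, with `TC_ENDBLOCKDATA` (`0x78`) closing the object. The sketch below collects those payloads using only the standard library. `read_block_data` is a hypothetical helper, not this library's API, and it deliberately ignores `TC_BLOCKDATALONG` and any nested objects.

```rust
const TC_BLOCKDATA: u8 = 0x77;
const TC_ENDBLOCKDATA: u8 = 0x78;

/// Hypothetical helper: gather the payload of consecutive short block-data
/// records, stopping at TC_ENDBLOCKDATA. Returns None on anything else.
fn read_block_data(mut bytes: &[u8]) -> Option<Vec<u8>> {
    let mut payload = Vec::new();
    loop {
        let (&marker, rest) = bytes.split_first()?;
        match marker {
            TC_ENDBLOCKDATA => return Some(payload),
            TC_BLOCKDATA => {
                // One unsigned length byte, then that many payload bytes.
                let (&len, rest) = rest.split_first()?;
                let len = len as usize;
                if rest.len() < len {
                    return None;
                }
                payload.extend_from_slice(&rest[..len]);
                bytes = &rest[len..];
            }
            // Nested objects, long blocks and so on are out of scope here.
            _ => return None,
        }
    }
}

fn main() {
    // A single 4-byte block holding a big-endian int, then the end marker.
    let annotations: [u8; 7] = [0x77, 0x04, 0x00, 0x00, 0x00, 0x2A, 0x78];
    let data = read_block_data(&annotations).expect("well-formed block data");
    let value = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    println!("payload = {:?}, as int = {}", data, value);
}
```

Interpreting the payload (here as a single `int`) still requires knowing what the class's `writeExternal` method wrote.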

In the future, there will hopefully be a method to define how customised classes should be read so that, at least within an application where the expected class types are known beforehand, all classes can be read.

It may also be possible to 'guess' how classes were written by making some assumptions and hoping that custom data doesn't look like stream markers. This method would be unreliable, though, and as such will only ever be an opt-in process.
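A small standard-library sketch of why such guessing is fragile: `TC_OBJECT` (`0x73`) marks the start of a new object, but the same byte value can appear inside custom data, for example in the big-endian encoding of the `int` 115. `candidate_object_starts` is a hypothetical helper for illustration only and is not part of this library.

```rust
/// Stream marker for the start of a new object (TC_OBJECT).
const TC_OBJECT: u8 = 0x73;

/// Hypothetical helper: offsets of every byte that *could* be an object start.
fn candidate_object_starts(bytes: &[u8]) -> Vec<usize> {
    bytes
        .iter()
        .enumerate()
        .filter_map(|(offset, &b)| if b == TC_OBJECT { Some(offset) } else { None })
        .collect()
}

fn main() {
    // Custom data: the int 115 (0x00000073), followed by a genuine TC_OBJECT.
    let mut stream = 115i32.to_be_bytes().to_vec();
    stream.push(TC_OBJECT);

    // Prints [3, 4]: both offsets are reported, but only offset 4 is real.
    println!("{:?}", candidate_object_starts(&stream));
}
```

Nothing in the bytes themselves distinguishes the two candidates, which is why any such heuristic can only ever be best effort.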

Future plans

- Add the ability to register custom classes and what fields to expect. For the common classes in the standard library, these could be built into this library, and custom classes from users' code could be added where they're being read.
- Deserialize to custom structs. At the moment the process of getting useful data out of a deserialized stream is awkward, and in most situations the data types would be known beforehand. Having something along the lines of a `FromJava` trait that would allow a `readObject<T: FromJava>()` method would make the process more straightforward.
  - This is partially complete. It is still a fairly manual process and in future this may use a derive macro.
- Possible tie-in with Serde. I've not yet looked into how the serde data model works but this seems like it would be a useful way of accessing Java data.

State of development

Very much a work in progress at the moment. I am writing this for another application I am working on, so I imagine there will be many changes in the functionality and API, at least in the short term, as the requirements become apparent. As things settle down I hope the API will become more stable.

Contributions

As this project is still very much in a pre-alpha state, I imagine things being quite unstable for a while. That said, if you notice anything obviously broken or have a feature that you think would be useful that I've missed entirely, do open issues. I'd avoid opening PRs until the change has been discussed in an issue, as the current repo state may lag behind development.