This is the first in a two-part series on using JSON with class hierarchies and interfaces whilst adhering to the SOLID principles of OO coding. It turns out that deserializing JSON documents into these more involved class models is not as straightforward as we would like, and requires some knowledge of how the JSON library behaves when presented with these structures.

 

As part of a Data Quality framework implementation, we needed to persist data validation rules in JSON format and deserialize them into a set of classes that implement the required checks and alerting on the data. Following the Dependency Inversion principle, we code against an interface rather than an implementation. So there are a number of interfaces for the various classes involved, and each of our implementing classes implements one or more of these interfaces. This gives us easily extensible and maintainable code with a reduced possibility of breaking client classes that use our validation library. Right, so that’s the general idea. In accordance with this the interface hierarchy below was developed, allowing both inheritance and composition (decorators etc.) as approaches to code reuse.

 

[Image: interface hierarchy diagram]

As you can see, each data validation rule type inherits from a base interface which contains some common functionality. Each validation rule has a name, properties for validation rule execution row counts, a rule type and a threshold defined via the IDataRuleThreshold interface, which consists of a number of Threshold Types (Pass, Warn, Fail etc.) and accompanying Threshold Limit values that determine whether the returned row counts constitute a Failure, a Warning and so on. All of these sit within the base interface IDataRule and are inherited by all derived interfaces. The various child interfaces for the rule types diverge somewhat, however, as each contains the specifics required for exercising a different data validation. For example, the IDistinctDataRule rule type has a TargetField property but does not require a DataType property, whereas the IDataTypeDataRule rule type does, as it needs the desired data type to check against as part of the rule conditions. Hence the need for the different interface definitions.

 

Each of the Data Validation rules is contained within an IDataValidationRuleSet, consisting of one or more rules. All rules target a specific object specified within the SourceObjectName property.
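As the original diagram is not reproduced here, the interface hierarchy described above could be sketched roughly as follows. This is only an illustration: the interface names, the RuleType values and the TargetField, DataType and SourceObjectName properties come from this post, while every other member name, the member types and the values of the DataType enum are assumptions.

```csharp
using System.Collections.Generic;

// Threshold types named in the post (Pass, Warn, Fail).
public enum ThresholdType { Pass, Warn, Fail }

// Rule types as used by the converter later in the post.
public enum DataRuleType { DataType, Distinct, Format, NullValue, Range }

// Hypothetical values; the post does not list the members of DataType.
public enum DataType { String, Integer, Date }

public interface IThresholdLimit
{
    ThresholdType ThresholdType { get; }
    long Limit { get; }                              // assumed name and type
}

public interface IDataRuleThreshold
{
    IList<IThresholdLimit> ThresholdLimits { get; }  // assumed name
}

// Base interface: members common to every rule type.
public interface IDataRule
{
    string Name { get; }
    DataRuleType RuleType { get; }
    IDataRuleThreshold Threshold { get; }
    long RowCount { get; }                           // assumed name for the row count
}

// Needs a target field, but no data type.
public interface IDistinctDataRule : IDataRule
{
    string TargetField { get; }
}

// Also needs the data type to check against.
public interface IDataTypeDataRule : IDataRule
{
    string TargetField { get; }
    DataType DataType { get; }
}

public interface IDataValidationRuleSet
{
    string Name { get; }
    string SourceObjectName { get; }
    IList<IDataRule> DataRules { get; }
}
```

Client code that only needs the common members can work with IDataRule, and only code that cares about a specific check needs to know about the derived interfaces.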

 

This provides a relatively straightforward class hierarchy with which we can define and action our data validation rules. The actual implementation of the rules is intended to be target platform specific, allowing rules to be implemented on different database platforms via specific assemblies that inherit from the base rule classes.

 

For the project in question, we were targeting Hive for the data validation, and as such the following implementation hierarchy was developed.

 

[Image: Hive implementation class hierarchy diagram]

Loosely Coupling with Dependency Injection

 

Okay, so that’s the class hierarchy. Remember that we are following the good OO practice of coding our objects so that they refer to interfaces. This can be seen in the DataRuleBase class, which contains a Threshold property of type IDataRuleThreshold and a ThresholdLimitsExceeded property that is a list of IThresholdLimit types. The IDataRuleThreshold object is passed in via a constructor argument, thereby allowing us to decide at construction time which implementation of this interface we want.

DataRuleBase
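A minimal sketch of what such a base class might look like is given below. It is not the actual DataRuleBase from the project: the constructor signature follows the base(name, ruleType, threshold) call shown later in this post, but the empty stand-in interfaces, FixedThreshold and SampleRule are hypothetical and exist only so the example compiles on its own.

```csharp
using System.Collections.Generic;

// Minimal stand-ins so that the sketch is self-contained.
public enum DataRuleType { DataType, Distinct, Format, NullValue, Range }
public interface IThresholdLimit { }
public interface IDataRuleThreshold { }

public interface IDataRule
{
    string Name { get; }
    DataRuleType RuleType { get; }
    IDataRuleThreshold Threshold { get; }
}

public abstract class DataRuleBase : IDataRule
{
    // The threshold arrives as an interface, so the caller chooses the
    // implementation at construction time.
    protected DataRuleBase(string name, DataRuleType ruleType, IDataRuleThreshold threshold)
    {
        Name = name;
        RuleType = ruleType;
        Threshold = threshold;
        ThresholdLimitsExceeded = new List<IThresholdLimit>();
    }

    public string Name { get; }
    public DataRuleType RuleType { get; }
    public IDataRuleThreshold Threshold { get; }
    public IList<IThresholdLimit> ThresholdLimitsExceeded { get; }
}

// Hypothetical implementations, for illustration only.
public class FixedThreshold : IDataRuleThreshold { }

public class SampleRule : DataRuleBase
{
    public SampleRule(string name, IDataRuleThreshold threshold)
        : base(name, DataRuleType.Distinct, threshold)
    {
    }
}
```

Swapping FixedThreshold for any other IDataRuleThreshold implementation requires no change to DataRuleBase or to classes derived from it.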

 

Within our implementation for Hive, the HiveDataTypeDataRule class, for example, also uses interfaces for the above properties and constructor arguments. As with DataRuleBase we can keep things loosely coupled with respect to the Threshold property until construction time.

 

HiveDataTypeDataRule

 

The constructor parameters include the threshold interface object, and the constructor calls the base constructor (as well as setting a couple of properties specific to the derived class).

		[JsonConstructor]
		public HiveDataTypeDataRule(string name, DataRuleType ruleType, IDataRuleThreshold threshold, string targetField, DataType dataType)
			: base(name, ruleType, threshold)
		{
			TargetField = targetField;
			//set via the property so as to use the respective JsonConverter
			DataType = dataType;
		}

 

We can then derive classes from our abstract DataRuleBase, which allows us to use any methods defined in the base class (such as the Equals method, which will come in handy when writing tests). The DataRuleBase class is the starting point for deriving classes that could provide data validation for a variety of platforms, such as MS SQL, Oracle, or in our case Hive.

 

The DataValidationRuleSetBase contains a list of IDataRule (not DataRuleBase, or anything more specific such as HiveDataTypeDataRule, which would be too tightly coupled to be of any use), thereby allowing us to use any class that implements the specified interface. We could develop any class that implements IDataRule and our DataValidationRuleSetBase (or any classes deriving from it or implementing IDataValidationRuleSet) will be able to work with it.

 

HiveDataValidationRuleSet

Our HiveDataValidationRuleSet derives from DataValidationRuleSetBase, and so the DataRules property can therefore contain any of our Hive data rule types, as they all derive from DataRuleBase, which implements IDataRule. Again, following standard Dependency Injection practice, we can pass in the referenced objects that implement the IDataRule interface via a constructor, thereby removing any explicit reference to implementations within our HiveDataValidationRuleSet class, as below.

 

		[JsonConstructor]
		public HiveDataValidationRuleSet(string name, string sourceObjectName, List<IDataRule> dataRules)
			: base(name, sourceObjectName, dataRules)
		{
		}

 

The constructor simply calls the base class constructor with the same constructor arguments. So nothing tightly coupled here to worry about.

 

Well that’s all rather wonderful, albeit pretty standard OO goodies, but how do we go about deserializing a JSON document into the required classes for actually exercising the data validation rules?

Newtonsoft Json.NET Library

As with pretty much anything JSON-related in DotNet, the library to use is Newtonsoft Json.NET. This is a fantastic library for all things JSON. It has very good examples and API reference material, and appears to have covered pretty much every eventuality for coding against JSON in DotNet. From custom constructors to Dependency Injection and IoC container considerations this really is an amazing piece of work. You can find out more on the Newtonsoft Json.NET website.

Deserializing Into Virtual Objects?

As stated loud and clear in the Newtonsoft Json.NET documentation, it is simply not possible to deserialize a JSON document into a non-concrete (i.e. abstract or interface) target as these cannot in themselves be instantiated. If you try, you’ll get an error similar to the following:

“Could not create an instance of type JsonSerialization.IDataTypeDataRule. Type is an interface or abstract class and cannot be instantiated”.

So, if we want to deserialize a JSON document that contains a HiveDataValidationRuleSet, by default it will try and parse the JSON into objects of type IDataRule. Fur balls all over the place. Not going to work. So what now? We must therefore use an approach to deserializing our classes that allows specifying the resultant target object type. Don’t worry, Newtonsoft have got this one covered (as with everything else).
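To see the failure in isolation, here is a small sketch that asks Json.NET for an interface type directly. IRule, DistinctRule and InterfaceTargetDemo are hypothetical names made up for this example and are not part of the validation library.

```csharp
using Newtonsoft.Json;

// Hypothetical interface and implementation, for illustration only.
public interface IRule
{
    string Name { get; }
}

public class DistinctRule : IRule
{
    public string Name { get; set; }
}

public static class InterfaceTargetDemo
{
    // Returns the rule name on success, or the serializer's error message.
    public static string TryDeserialize(string json)
    {
        try
        {
            // Json.NET has no way of knowing which IRule implementation to build.
            IRule rule = JsonConvert.DeserializeObject<IRule>(json);
            return rule.Name;
        }
        catch (JsonSerializationException ex)
        {
            return ex.Message;
        }
    }
}
```

Calling InterfaceTargetDemo.TryDeserialize with a document such as {"Name":"UniqueCustomerId"} returns the error message rather than a name, whereas deserializing the same document to the concrete DistinctRule type works without complaint.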

CustomCreationConverter

As the name suggests this class allows creation of a class using some predefined conditional logic. We need to deserialize our JSON into a DataValidationRuleSet that contains a variety of different data rule types that will be executed against the source object. We can do this using the following class deriving from the CustomCreationConverter. Notice how it uses a generic type for the base class:

public class JsonDataRuleBaseConverter : CustomCreationConverter<DataRuleBase>
{
	// The Create and ReadJson methods shown below go here
}

So this will allow us to deal with the conversion from a DataRuleBase abstract class to a concrete class such as a DataTypeDataRule class. How so? Well, we’ve made things a little easier for ourselves with that RuleType property we mentioned earlier. This will allow us to choose a class type to deserialize to, based on this value. DataRuleType is an enum of the various class types we will be allowing in our JSON. The CustomCreationConverter class we’re deriving our JsonDataRuleBaseConverter from has a Create method that we’ll override in order to give us the required data rule class.

		/// <summary>
		/// Creates the specified object subtype from the RuleType property.
		/// </summary>
		/// <param name="objectType">Type of the object.</param>
		/// <param name="jObject">The jObject.</param>
		/// <returns></returns>
		public DataRuleBase Create(Type objectType, JObject jObject)
		{
			DataRuleType ruleType = jObject["RuleType"].ToObject<DataRuleType>();
			string name = (string)jObject.Property("Name");
			DataRuleThreshold threshold = jObject["Threshold"].ToObject<DataRuleThreshold>();
			string targetField;

			switch (ruleType)
			{
				case DataRuleType.DataType:
					targetField = (string)jObject["TargetField"];
					DataType hiveDataType = jObject["DataType"].ToObject<DataType>();
					HiveDataTypeDataRule dtdr = new HiveDataTypeDataRule(name, ruleType, threshold, targetField, hiveDataType);
					return dtdr;
				case DataRuleType.Distinct:
					targetField = (string)jObject["TargetField"];
					HiveDistinctDataRule ddr = new HiveDistinctDataRule(name, ruleType, threshold, targetField);
					return ddr;
				case DataRuleType.Format:
					targetField = (string)jObject["TargetField"];
					string formatPattern = (string)jObject["FormatPattern"];
					HiveFormatDataRule fdr = new HiveFormatDataRule(name, ruleType, threshold, targetField, formatPattern);
					return fdr;
				case DataRuleType.NullValue:
					targetField = (string)jObject["TargetField"];
					HiveNullValueDataRule nvdr = new HiveNullValueDataRule(name, ruleType, threshold, targetField);
					return nvdr;
				case DataRuleType.Range:
					targetField = (string)jObject["TargetField"];
					string rangeStart = (string)jObject["RangeStart"];
					string rangeEnd = (string)jObject["RangeEnd"];
					HiveRangeDataRule rdr = new HiveRangeDataRule(name, ruleType, threshold, targetField, rangeStart, rangeEnd);
					return rdr;
				default:
					return null;
			}
		}

The ReadJson method provided is then overridden with the following code, which simply passes our Json object to the Create method for the actual instantiation of the required derived Hive data validation rule class, as below:

		/// <summary>
		/// Reads the JSON representation of the object.
		/// </summary>
		/// <param name="reader">The <see cref="JsonReader"/> to read from.</param>
		/// <param name="objectType">Type of the object.</param>
		/// <param name="existingValue">The existing value of object being read.</param>
		/// <param name="serializer">The calling serializer.</param>
		/// <returns>
		/// The object value.
		/// </returns>
		public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
		{
			if (reader.TokenType == JsonToken.StartObject)
			{
				// Load JObject from stream
				JObject jObject = JObject.Load(reader);

				// Create target object based on JObject
				var target = Create(objectType, jObject);
				return target;
			}
			else
				return null;
		}

There are other ways of indicating the type within the JSON being deserialized, such as the “$type” JSON property and the TypeNameHandling setting described in the Json.NET documentation. These specify the class type to use directly, with a value that is the fully qualified class name. However, this can make the JSON rather cluttered, and difficult to change should you decide to do things differently within your class library.
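For comparison, the “$type” alternative might look something like the sketch below. IRule, DistinctRule and TypeNameDemo are hypothetical names used only for this example; TypeNameHandling and JsonSerializerSettings are the real Json.NET types.

```csharp
using Newtonsoft.Json;

// Hypothetical interface and implementation, for illustration only.
public interface IRule
{
    string Name { get; }
}

public class DistinctRule : IRule
{
    public string Name { get; set; }
}

public static class TypeNameDemo
{
    // TypeNameHandling.Objects writes a "$type" property into every JSON object,
    // and the same setting lets the serializer honour it when reading.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        TypeNameHandling = TypeNameHandling.Objects
    };

    public static string Write(IRule rule)
    {
        return JsonConvert.SerializeObject(rule, Settings);
    }

    public static IRule Read(string json)
    {
        return JsonConvert.DeserializeObject<IRule>(json, Settings);
    }
}
```

The JSON produced by Write carries the class and assembly name of DistinctRule in its “$type” property, which is exactly the coupling between document and class library that the RuleType approach avoids.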

We can then create a specific type of data rule by passing in our JSON document and using our JsonDataRuleBaseConverter as below.

IDataRule dataRule = JsonConvert.DeserializeObject<DataRuleBase>(JsonIn, new JsonDataRuleBaseConverter());

Our underlying type will be that of the actual derived class, created based on the RuleType property specified in the JSON document, even though we have only coded against the IDataRule interface here and have not needed to specify a concrete implementation class.
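To make the whole round trip concrete, here is a cut-down, self-contained sketch of the same technique with just two rule types. All of the type names (RuleKind, RuleBase, DistinctRule, FormatRule, RuleConverter, ConverterDemo) are simplified stand-ins invented for this example rather than the classes from the validation library, and the unsupported-type handling is my own choice. Note that CustomCreationConverter declares Create(Type) as abstract, so the sketch has to override it even though the real work happens in the overload that takes the JObject.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;
using Newtonsoft.Json.Linq;

public enum RuleKind { Distinct, Format }

public abstract class RuleBase
{
    protected RuleBase(string name, RuleKind ruleType, string targetField)
    {
        Name = name;
        RuleType = ruleType;
        TargetField = targetField;
    }

    public string Name { get; }
    public RuleKind RuleType { get; }
    public string TargetField { get; }
}

public class DistinctRule : RuleBase
{
    public DistinctRule(string name, string targetField)
        : base(name, RuleKind.Distinct, targetField) { }
}

public class FormatRule : RuleBase
{
    public FormatRule(string name, string targetField, string formatPattern)
        : base(name, RuleKind.Format, targetField)
    {
        FormatPattern = formatPattern;
    }

    public string FormatPattern { get; }
}

public class RuleConverter : CustomCreationConverter<RuleBase>
{
    // Abstract in the base class, so it must be overridden; it is never called
    // here because ReadJson below goes straight to the JObject overload.
    public override RuleBase Create(Type objectType)
    {
        throw new NotSupportedException("Use Create(Type, JObject) instead.");
    }

    // Picks the concrete class from the RuleType discriminator in the JSON.
    public RuleBase Create(Type objectType, JObject jObject)
    {
        RuleKind kind = jObject["RuleType"].ToObject<RuleKind>();
        string name = (string)jObject["Name"];
        string targetField = (string)jObject["TargetField"];

        switch (kind)
        {
            case RuleKind.Distinct:
                return new DistinctRule(name, targetField);
            case RuleKind.Format:
                return new FormatRule(name, targetField, (string)jObject["FormatPattern"]);
            default:
                throw new JsonSerializationException("Unsupported rule type: " + kind);
        }
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType != JsonToken.StartObject)
            return null;

        JObject jObject = JObject.Load(reader);
        return Create(objectType, jObject);
    }
}

public static class ConverterDemo
{
    public static RuleBase Load(string json)
    {
        return JsonConvert.DeserializeObject<RuleBase>(json, new RuleConverter());
    }
}
```

Passing a document such as { "Name": "PostcodeFormat", "RuleType": "Format", "TargetField": "postcode", "FormatPattern": "[A-Z]+" } to ConverterDemo.Load gives back a reference typed as RuleBase whose runtime type is FormatRule.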

Up Next…

In the next post in the series I’ll explain how we wired all this up so that we can deserialize our objects using the JsonDataRuleBaseConverter that is derived from the Json.NET CustomCreationConverter. We’ll also see how we can use Dependency Injection and Inversion of Control (IoC) containers to create our objects. Okay, that’s it for now. Tune in next time for another thrilling instalment of Class Hierarchies, SOLID Code and Json.NET Serialization.