Revisited – XML File as database

I’ve again relooked at the idea of using a plain XML file as a source of a ‘small’ database. The previous attempt resulted in me going down the path of using datasets (typed and non-typed).

This time I started with a set of very simple classes (.Net/C# classes) that when you serialize them to XML it looks very much like a small data store/structure I also happened to use for another import/export genealogical database app I created.

Effectively through the use of ‘Generics’ I am able to create a simple ‘base’ class with which any such small in-memory database can be created/used/persisted to and from disk again. At the ‘lowest’ level I have a class that only does 3 things: Create the data store, load from disk/file and then save to disk/file. It makes use of a small utility class that handles serialization to (xml) files.

public interface ISmallDataStore
{

void CreateNewDataStore();
void LoadDataStore(string source);
void SaveDataStore(string destination);

}

public abstract class SDSXMLFileBase<T> : ISmallDataStore where T: class
{

private string sourceFilePath;
internal T DataContainer;

#region ISDS Members
abstract public void CreateNewDataStore();
abstract internal void SetRelationObjects();
abstract internal void RefreshCacheObjects();

public void LoadDataStore(string source)
{

sourceFilePath = source;
DataContainer = SerializationUtils.DeserializeXMLFile<T>(source);
SetRelationObjects();
RefreshCacheObjects();

}
public void SaveDataStore(string destination)
{

sourceFilePath = destination;
SerializationUtils.SerializeXMLToFile<T>(destination, DataContainer);

}
#endregion

}

Two abstract methods are added to facilitate functionality that will be explained later (SetRelationObjects and RefreshCacheObjects). They are optional and but I needed them for specific reasons.

The next ‘layer’ is a class that implements ‘DataContainer’ with data structures you want to use as the data store. These data structures are what is going to be saved/serialized. The following is small example of what it looks:

public class SampleSDSImplementation : SDSXMLFileBase<SampleTDataContainer>
{

public override void CreateNewDataStore()
{

DataContainer = new SampleTDataContainer();

}

}

Through the use of serialization attributes you can define the way the resulted xml would look that is stored. In my example the class SampleTDataContainer is effectively the data store and its fields are decorated with the attributes like XmlElement, XmlAttribute etc.

The following is a small excerpt from a sample class:

[Serializable(), XmlType(“data”)]
public class SampleTDataContainer
{

[XmlElement(“p”)]
public List<Person> Persons = new List<Person>();
[XmlElement(“m”)]
public List<Marriage> Marriages = new List<Marriage>();

}

[Serializable(), XmlType(“p”)]
public class Person
{

[XmlAttribute(“id”)]
public int Id { get; set; }
[XmlAttribute(“fn”)]
public string FirstName { get; set; }
[XmlAttribute(“sn”)]
public string Surname { get; set; }

}

A basic output of the (xml) file will look something like this:

<?xml version=”1.0″ encoding=”utf-16″?>
<data>

<p id=”1″ fn=”Anakin” sn=”Skywalker” … />
<p id=”2″ fn=”Padmé” sn=”Amidala” … />

<m id=”1″ mno=”1″ hid=”1″ wid=”2″ … />

</data>

Now that is all good and well for very basic stuff but what about a few more advanced requirements, like auto generating the ‘id’s, having ‘references’ between the objects (the Marriage class will have two references to Person (husband and wife), Person will have two (father and mother) and even reverse referencing (Person having a list of Marriage objects..).

Setting up those references in code is easy but once it has been serialized and then deserialized you get all kind of funnies. For example, An instance of a Person class might have a reference to a marriage and that Marriage instance has a reference to the same Person instance (the first time when you set it up in the code). Next time when the data is loaded from file (deserialized) the Marriage instance reference will not pont to the same instance as the Person instance (and vice versa). Ok this is a generic explanation of the issue but hopefully you get the drift…  Well, this is where one of those 2 methods I mentioned in the original base class comes in – SetRelationObjects. To ensure all referencing objects (at run-time) are actually referencing the correct objects you can do something like this:

internal override void SetRelationObjects()
{

foreach (Marriage m in DataContainer.Marriages)
{

if (m.HusbandId > 0 && m.Husband == null)

m.Husband = DataContainer.Persons.FirstOrDefault(h => h.Id == m.HusbandId);

if (m.WifeId > 0 && m.Wife == null)

m.Wife = DataContainer.Persons.FirstOrDefault(w => w.Id == m.WifeId);

}
foreach (Person p in DataContainer.Persons)
{

if (p.FatherId > 0 && p.Father == null)

p.Father = DataContainer.Persons.FirstOrDefault(f => f.Id == p.FatherId);

if (p.MotherId > 0 && p.Mother == null)

p.Mother = DataContainer.Persons.FirstOrDefault(f => f.Id == p.MotherId);

//Reassign union objects since deserialization created new/separated instances.
for (int i = 0; i < p.Marriages.Count; i++)
{

Marriage m = p.Marriages[i];
p.Marriages[i] = DataContainer.Marriages.FirstOrDefault(u => u.Id == m.Id);

}

}

}

The fields for things like (Person) Father and (Person) Mother was not shown in the class listing but you can probably guess what they should look like. These fields are ‘NOT’ serialized per se but rather handled by adding a (int) FatherId and (int) MotherId set of fields that ‘ARE’ serialized. It both makes it easier to read and smaller to store in the xml file. When deserializing only the ‘id’ fields are restored letting the SetRelationObjects method correct the referencing just after load. There may be other ways to do these kind of things but I choose this one as it suits me at the moment.

When designing classes for serialization it helps to know a couple of things about Xml serialization attributes and hidden methods. Lets say you want to have a public field in your class (like the Father/Mother ones) that you do not want exposed by serialization you can use a public bool ShouldSerialize<FieldName>() method inside the class. e.g.

public bool ShouldSerializeFatherId()
{

return FatherId > 0;

}
public bool ShouldSerializeMotherId()
{

return MotherId > 0;

}

[XmlAttribute(“fid”)]
public int FatherId { get; set; }

[XmlAttribute(“mid”)]
public int MotherId { get; set; }

private Person father = null;
[XmlIgnore]
public Person Father
{

get { return father; }
set
{

if (value != null)
{

father = value;
FatherId = father.Id;

}
else

FatherId = 0;

}

}
private Person mother = null;
[XmlIgnore]
public Person Mother
{

get { return mother; }
set
{

if (value != null)
{

mother = value;
MotherId = mother.Id;

}
else

MotherId = 0;

}

}

Helper methods for interacting with the data

It is possible that you can use the data container as is and directly call methods on the Person objects but that means you could be duplicating a lot of code or functionality each time you use it. To help with this I added a few simple helper methods in the class that implements the base class (SampleSDSImplementation). For example:

public Person AddPerson(string firstName, string surname)
{

int nextPersonId = 1;
if (DataContainer.Persons.Count > 0)

nextPersonId = (from p in DataContainer.Persons select p.Id).Max() + 1;

Person newPerson = new Person() { FirstName = firstName, Surname = surname };
newPerson.Id = nextPersonId;
DataContainer.Persons.Add(newPerson);
return newPerson;

}

public Person FindSinglePersonById(int id)
{

return DataContainer.Persons.FirstOrDefault(p => p.Id == id);

}

As an example of how to use it look at the following:

string dataStoreLocation = System.IO.Path.Combine(System.Environment.GetFolderPath(Environment.SpecialFolder.Desktop), “TestSDS.xml”);
SampleSDSImplementation sdsSample = new SampleSDSImplementation();
sdsSample.CreateNewDataStore();

Person vader = sdsSample.AddPerson(“Anakin”, “Skywalker”);
vader.IsMale = true;
vader.History = “The baddy of the story”;
vader.NickName = “Darth Vader”;
vader.PlaceOfBirth = “Tatooine”;
vader.PlaceOfDeath = “Death star 2”;

Person padme = sdsSample.AddPerson(“Padmé”, “Amidala”);
padme.DateOfDeath.Year = 2005;
padme.PlaceOfDeath = “Polis Massa”;
sdsSample.AddMarriage(vader, padme, 1, null, “Naboo”, null, “Mustafar”);

Person luke = sdsSample.AddPerson(“Luke”, “Skywalker”);
luke.IsMale = true;
luke.ChildNo = 1;
luke.NickName = “Master Luke”;
luke.PlaceOfBirth = “Polis Massa”;
luke.Father = vader;
luke.Mother = padme;

sdsSample.SaveDataStore(dataStoreLocation);

Conclusion

This is a simple way to build a little database (don’t fool yourself that you can easily build huge databases with this hehe) with which you can run a simple system that requires small database that can fit into memory.

See example project here: SDSTest

 

Leave a Reply

%d bloggers like this: