Tag Archives: xml

Revisited – XML File as database

I’ve taken another look at the idea of using a plain XML file as a ‘small’ database. The previous attempt led me down the path of using DataSets (typed and untyped).

This time I started with a set of very simple .Net/C# classes that, when serialized to XML, look very much like the small data store/structure I also happened to use for another genealogical import/export app I created.

Effectively, through the use of generics, I am able to create a simple ‘base’ class with which any such small in-memory database can be created, used and persisted to and from disk. At the lowest level I have a class that only does three things: create the data store, load it from disk/file and save it back to disk/file. It makes use of a small utility class that handles serialization to and from (xml) files.

public interface ISmallDataStore
{

void CreateNewDataStore();
void LoadDataStore(string source);
void SaveDataStore(string destination);

}

public abstract class SDSXMLFileBase<T> : ISmallDataStore where T: class
{

private string sourceFilePath;
internal T DataContainer;

#region ISDS Members
abstract public void CreateNewDataStore();
abstract internal void SetRelationObjects();
abstract internal void RefreshCacheObjects();

public void LoadDataStore(string source)
{

sourceFilePath = source;
DataContainer = SerializationUtils.DeserializeXMLFile<T>(source);
SetRelationObjects();
RefreshCacheObjects();

}
public void SaveDataStore(string destination)
{

sourceFilePath = destination;
SerializationUtils.SerializeXMLToFile<T>(destination, DataContainer);

}
#endregion

}
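The SerializationUtils helper class is not listed in this post. As a minimal sketch of what it could look like, assuming a plain XmlSerializer underneath (only the method names come from the calls above, the rest is my assumption):

using System.IO;
using System.Xml.Serialization;

internal static class SerializationUtils
{

//Deserialize the given (xml) file into an instance of T.
public static T DeserializeXMLFile<T>(string filePath) where T : class
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (FileStream fs = File.OpenRead(filePath))
return (T)serializer.Deserialize(fs);
}

//Serialize the given object and write the resulting xml to the file.
public static void SerializeXMLToFile<T>(string filePath, T data) where T : class
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (StreamWriter writer = new StreamWriter(filePath, false))
serializer.Serialize(writer, data);
}

}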

Two abstract methods are added to facilitate functionality that will be explained later (SetRelationObjects and RefreshCacheObjects). They are optional in principle, but I needed them for specific reasons.

The next ‘layer’ is a class that populates ‘DataContainer’ with the data structures you want to use as the data store. These data structures are what gets saved/serialized. The following is a small example of what it looks like:

public class SampleSDSImplementation : SDSXMLFileBase<SampleTDataContainer>
{

public override void CreateNewDataStore()
{

DataContainer = new SampleTDataContainer();

}

//Note: the internal abstract methods SetRelationObjects and RefreshCacheObjects must also be overridden; see the SetRelationObjects example further down.

}

Through the use of serialization attributes you can define the way the stored xml will look. In my example the class SampleTDataContainer is effectively the data store and its fields are decorated with attributes like XmlElement, XmlAttribute etc.

The following is a small excerpt from a sample class:

[Serializable(), XmlType("data")]
public class SampleTDataContainer
{

[XmlElement("p")]
public List<Person> Persons = new List<Person>();
[XmlElement("m")]
public List<Marriage> Marriages = new List<Marriage>();

}

[Serializable(), XmlType("p")]
public class Person
{

[XmlAttribute("id")]
public int Id { get; set; }
[XmlAttribute("fn")]
public string FirstName { get; set; }
[XmlAttribute("sn")]
public string Surname { get; set; }

}

A basic output of the (xml) file will look something like this:

<?xml version="1.0" encoding="utf-16"?>
<data>

<p id="1" fn="Anakin" sn="Skywalker" … />
<p id="2" fn="Padmé" sn="Amidala" … />

<m id="1" mno="1" hid="1" wid="2" … />

</data>

Now that is all good and well for very basic stuff, but what about a few more advanced requirements: auto-generating the ids, having ‘references’ between objects (the Marriage class will have two references to Person, husband and wife, while Person will have two of its own, father and mother) and even reverse referencing (Person having a list of Marriage objects)?

Setting up those references in code is easy, but once the data has been serialized and then deserialized you get all kinds of funnies. For example, an instance of the Person class might have a reference to a Marriage, and that Marriage instance has a reference back to the same Person instance (the first time, when you set it up in code). The next time the data is loaded from file (deserialized), the Marriage instance’s reference will no longer point to the same instance as the Person instance (and vice versa). This is a generic explanation of the issue, but hopefully you get the drift… Well, this is where one of those two methods I mentioned in the original base class comes in: SetRelationObjects. To ensure all referencing objects (at run-time) actually reference the correct instances you can do something like this:

internal override void SetRelationObjects()
{

foreach (Marriage m in DataContainer.Marriages)
{

if (m.HusbandId > 0 && m.Husband == null)

m.Husband = DataContainer.Persons.FirstOrDefault(h => h.Id == m.HusbandId);

if (m.WifeId > 0 && m.Wife == null)

m.Wife = DataContainer.Persons.FirstOrDefault(w => w.Id == m.WifeId);

}
foreach (Person p in DataContainer.Persons)
{

if (p.FatherId > 0 && p.Father == null)

p.Father = DataContainer.Persons.FirstOrDefault(f => f.Id == p.FatherId);

if (p.MotherId > 0 && p.Mother == null)

p.Mother = DataContainer.Persons.FirstOrDefault(f => f.Id == p.MotherId);

//Reassign union objects since deserialization created new/separated instances.
for (int i = 0; i < p.Marriages.Count; i++)
{

Marriage m = p.Marriages[i];
p.Marriages[i] = DataContainer.Marriages.FirstOrDefault(u => u.Id == m.Id);

}

}

}
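The Marriage class itself is not listed in this post. A hedged sketch, consistent with the attribute names in the sample output above (mno, hid, wid) and the references used in SetRelationObjects, could look like this:

[Serializable(), XmlType("m")]
public class Marriage
{

[XmlAttribute("id")]
public int Id { get; set; }
[XmlAttribute("mno")]
public byte No { get; set; }
[XmlAttribute("hid")]
public int HusbandId { get; set; }
[XmlAttribute("wid")]
public int WifeId { get; set; }

//Run-time only references, restored by SetRelationObjects after deserialization.
[XmlIgnore]
public Person Husband { get; set; }
[XmlIgnore]
public Person Wife { get; set; }

}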

The fields for things like (Person) Father and (Person) Mother were not shown in the class listing, but you can probably guess what they should look like. These fields are NOT serialized per se but rather handled by adding an (int) FatherId and an (int) MotherId field that ARE serialized. That makes the xml file both easier to read and smaller to store. When deserializing, only the id fields are restored, letting the SetRelationObjects method correct the referencing just after load. There may be other ways to do this kind of thing but I chose this one as it suits me at the moment.

When designing classes for serialization it helps to know a couple of things about Xml serialization attributes and ‘hidden’ methods. Let’s say you want a public field in your class (like the FatherId/MotherId ones) to be serialized only under certain conditions; you can add a public bool ShouldSerialize<FieldName>() method to the class, e.g.

public bool ShouldSerializeFatherId()
{

return FatherId > 0;

}
public bool ShouldSerializeMotherId()
{

return MotherId > 0;

}

[XmlAttribute("fid")]
public int FatherId { get; set; }

[XmlAttribute("mid")]
public int MotherId { get; set; }

private Person father = null;
[XmlIgnore]
public Person Father
{

get { return father; }
set
{

if (value != null)
{

father = value;
FatherId = father.Id;

}
else

FatherId = 0;

}

}
private Person mother = null;
[XmlIgnore]
public Person Mother
{

get { return mother; }
set
{

if (value != null)
{

mother = value;
MotherId = mother.Id;

}
else

MotherId = 0;

}

}

Helper methods for interacting with the data

You could use the data container as is and work with the Person objects directly, but that means duplicating a lot of code or functionality each time you use it. To help with this I added a few simple helper methods to the class that implements the base class (SampleSDSImplementation). For example:

public Person AddPerson(string firstName, string surname)
{

int nextPersonId = 1;
if (DataContainer.Persons.Count > 0)

nextPersonId = (from p in DataContainer.Persons select p.Id).Max() + 1;

Person newPerson = new Person() { FirstName = firstName, Surname = surname };
newPerson.Id = nextPersonId;
DataContainer.Persons.Add(newPerson);
return newPerson;

}

public Person FindSinglePersonById(int id)
{

return DataContainer.Persons.FirstOrDefault(p => p.Id == id);

}
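The AddMarriage helper used in the usage example below is not shown in the post. A simplified sketch in the same style as AddPerson (the real method clearly takes more parameters, so treat this signature as an assumption) could be:

public Marriage AddMarriage(Person husband, Person wife, byte marriageNo)
{

//Generate the next id the same way AddPerson does.
int nextMarriageId = 1;
if (DataContainer.Marriages.Count > 0)
nextMarriageId = (from m in DataContainer.Marriages select m.Id).Max() + 1;

Marriage newMarriage = new Marriage() { Id = nextMarriageId, No = marriageNo };
newMarriage.Husband = husband;
newMarriage.HusbandId = husband.Id; //kept in sync for serialization
newMarriage.Wife = wife;
newMarriage.WifeId = wife.Id;
DataContainer.Marriages.Add(newMarriage);

//Maintain the reverse references mentioned earlier.
husband.Marriages.Add(newMarriage);
wife.Marriages.Add(newMarriage);
return newMarriage;

}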

As an example of how to use it look at the following:

string dataStoreLocation = System.IO.Path.Combine(System.Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "TestSDS.xml");
SampleSDSImplementation sdsSample = new SampleSDSImplementation();
sdsSample.CreateNewDataStore();

Person vader = sdsSample.AddPerson("Anakin", "Skywalker");
vader.IsMale = true;
vader.History = "The baddy of the story";
vader.NickName = "Darth Vader";
vader.PlaceOfBirth = "Tatooine";
vader.PlaceOfDeath = "Death star 2";

Person padme = sdsSample.AddPerson("Padmé", "Amidala");
padme.DateOfDeath.Year = 2005;
padme.PlaceOfDeath = "Polis Massa";
sdsSample.AddMarriage(vader, padme, 1, null, "Naboo", null, "Mustafar");

Person luke = sdsSample.AddPerson("Luke", "Skywalker");
luke.IsMale = true;
luke.ChildNo = 1;
luke.NickName = "Master Luke";
luke.PlaceOfBirth = "Polis Massa";
luke.Father = vader;
luke.Mother = padme;

sdsSample.SaveDataStore(dataStoreLocation);

Conclusion

This is a simple way to build a little database (don’t fool yourself that you can easily build huge databases with this, hehe) with which you can run a simple system that only needs a small database that fits into memory.

See example project here: SDSTest

 

XmlElementEx – a few XmlElement extension methods

I thought I’d just share a few extension methods I created for myself to handle a few things with the XmlElement, XmlNode and XmlDocument classes of the System.Xml namespace.

These include:

  • ReadXmlElementAttr
  • CreateAttributeWithValue
  • SetAttributeValue
  • AppendElementWithText
  • AppendElementWithCDATA
using System.Xml;

    public static class XmlElementEx
    {
        public static string ReadXmlElementAttr(this XmlElement e, string attrName, string defaultValue = "")
        {
            try
            {
                if (e.HasAttribute(attrName))
                    return e.Attributes.GetNamedItem(attrName).Value;
            }
            catch { }
            return defaultValue;
        }
        public static string ReadXmlElementAttr(this XmlNode e, string attrName, string defaultValue = "")
        {
            return ReadXmlElementAttr((XmlElement)e, attrName, defaultValue);
        }
        public static XmlAttribute CreateAttributeWithValue(this XmlDocument xml, string attrName, bool value)
        {
            return xml.CreateAttributeWithValue(attrName, value.ToString());
        }
        public static XmlAttribute CreateAttributeWithValue(this XmlDocument xml, string attrName, int value)
        {
            return xml.CreateAttributeWithValue(attrName, value.ToString());
        }
        public static XmlAttribute CreateAttributeWithValue(this XmlDocument xml, string attrName, long value)
        {
            return xml.CreateAttributeWithValue(attrName, value.ToString());
        }

        public static XmlAttribute CreateAttributeWithValue(this XmlDocument xml, string attrName, string value)
        {
            XmlAttribute attr;
            attr = xml.CreateAttribute(attrName);
            attr.Value = value;
            return attr;
        }
        public static XmlAttribute SetAttributeValue(this XmlNode e, string attrName, string value)
        {
            return SetAttributeValue((XmlElement)e, attrName, value);
        }
        public static XmlAttribute SetAttributeValue(this XmlElement e, string attrName, string value)
        {
            if (e.HasAttribute(attrName))
                e.Attributes[attrName].Value = value;
            else
                e.Attributes.Append(CreateAttributeWithValue(e.OwnerDocument, attrName, value));
            return e.Attributes[attrName];
        }

        public static XmlElement AppendElementWithText(this XmlElement e, string elementName, string text)
        {
            XmlElement newElement = e.OwnerDocument.CreateElement(elementName);
            newElement.InnerText = text;
            e.AppendChild(newElement);
            return newElement;
        }
        public static XmlElement AppendElementWithCDATA(this XmlElement e, string elementName, string cdataText)
        {
            XmlElement newElement = e.OwnerDocument.CreateElement(elementName);
            XmlCDataSection commentNode = e.OwnerDocument.CreateCDataSection(cdataText);
            newElement.AppendChild(commentNode);
            e.AppendChild(newElement);
            return newElement;
        }
    }
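A short usage sketch of these extensions (the element and attribute names are purely illustrative):

XmlDocument doc = new XmlDocument();
doc.LoadXml("<log />");

//Set (or create) attributes on the root element.
doc.DocumentElement.SetAttributeValue("machine", "Server01");
doc.DocumentElement.SetAttributeValue("enabled", "true");

//Append child elements with plain text and CDATA content.
doc.DocumentElement.AppendElementWithText("source", "Application");
doc.DocumentElement.AppendElementWithCDATA("message", "Raw text <with> special & characters");

//Read an attribute back, falling back to a default if it is missing.
string machine = doc.DocumentElement.ReadXmlElementAttr("machine", "unknown");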

 

EventScavenger 4.2

I recently had the ‘pleasure’ of having my EventScavenger tool used to track the security event logs of a couple of domain controllers for a ‘full’ enterprise company. It may only be temporary, to help with a specific issue, but in the process I had to re-look at how the collector service (the heart of EventScavenger) performs under really high stress. In case you don’t fully appreciate it: with full auditing for successful and failed events these event logs fill up within an hour or so, and they have been set to something like 100MB+! Events literally flood into the logs at (sometimes) 10000 plus events in a single minute.

Now, I’ve never had any performance problems with the collectors until now. I’ve had a single collector running on a plain old pc, gathering around 50 logs or more, running 24×7 for 4-5 years. In all those cases each individual event log never had more than 20MB of data in it and was almost never so full that data got overwritten within an hour or so. The problem with the domain controller event logs is the sheer volume of events written all the time, plus the collector is running on a separate pc with the sql database on a remote system that is shared with dozens of other systems. The end result is that events (1) cannot be read from the logs quickly enough and (2) cannot be written quickly enough to the database. The first problem is not something I can do anything about now, other than having a collector installed on each of the domain controllers, which is not feasible as it won’t be allowed and they are Windows Core machines anyway.

To address the second issue I did some research into how to submit batches of records to the database, since doing it one by one just isn’t fast enough. I know about table-valued parameters in newer versions of sql server, but unfortunately the server that was provided is only sql 2005, which does not support them. Fortunately it does support the xml data type, even as a parameter. I found an article, ‘Sending Multiple Rows to Database for Modification‘, on CodeProject that specifically deals with sql 2005 and describes two ways to accomplish this: using a delimited string or the xml data type. The delimited option won’t work for me as it is limited to about 4000 characters and the amount of data I deal with is orders of magnitude more than that. The xml data type allows for up to 2GB in a single shot!

I did quite a few performance tests and there is a real increase in the number of records that can be processed using batches, despite the possible overhead of first converting the rows to xml and then sql server converting it back to a table (in memory). Interestingly, increasing the batch size beyond a certain point (1000 in my tests) does not increase the overall throughput; it stays more or less flat. Bigger batches also increase the chance that a single failure caused by one row makes the whole batch fail. So the best is a batch size that is big enough to warrant the benefit but not so big that failures become costly.
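Before getting to the example code: a hedged sketch of what that chunking could look like on the collector side. The 1000 batch size is the one I tested; the source of pending entries and the helper that builds the xml are assumed here, the actual xml generation and insert call are shown below.

const int batchSize = 1000; //the size that tested best
List<EventLogEntry> pending = GetPendingEntries(); //assumed source of collected events

for (int start = 0; start < pending.Count; start += batchSize)
{
List<EventLogEntry> batch = pending.GetRange(start, Math.Min(batchSize, pending.Count - start));
using (System.IO.MemoryStream xml = BuildXmlForEntries(batch)) //assumed wrapper around the XmlTextWriter code below
{
InsertEventLogEntries(xml); //one round trip per batch instead of per row
}
}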

Example code

A basic example of the sql xml handling:

DECLARE @xml XML
SET @xml = N'
<rs>
<r cn="machineName" l="logName" etc />

</rs>'
SELECT
T.Row.value('@cn', 'VARCHAR(255)') MachineName,
T.Row.value('@l', 'VARCHAR(255)') LogName
FROM @xml.nodes('/rs/r') AS T(Row)

To generate the xml on the client side (simplified code):

System.IO.MemoryStream outstream = new System.IO.MemoryStream();
using (System.Xml.XmlTextWriter wrt = new System.Xml.XmlTextWriter(outstream, Encoding.UTF8))
{

wrt.Formatting = System.Xml.Formatting.Indented;
wrt.Indentation = 1;
wrt.IndentChar = ' ';
wrt.WriteStartDocument();
wrt.WriteStartElement("rs");

foreach (EventLogEntry ele in eventEntries)
{

wrt.WriteStartElement("r");
wrt.WriteAttributeString("cn", machine);
wrt.WriteAttributeString("l", logName);


wrt.WriteEndElement(); //r

}

wrt.WriteEndElement(); //rs
wrt.WriteEndDocument();
wrt.Flush();

outstream.Position = 0;
InsertEventLogEntries(outstream);

}

public void InsertEventLogEntries(System.IO.Stream xml)
{

string sql = "InsertEventLogEntries";
System.Data.SqlTypes.SqlXml sxl = new System.Data.SqlTypes.SqlXml(xml);
SqlParameter[] parms = new SqlParameter[] {
new SqlParameter("@xml", SqlDbType.Xml) { Value = sxl }
};
using (SqlCommand cmnd = new SqlCommand(sql, insertConn))
{
cmnd.CommandType = CommandType.StoredProcedure;
cmnd.CommandTimeout = CommandTimeout;
cmnd.Parameters.AddRange(parms);
try
{

cmnd.ExecuteNonQuery();

}
catch { throw; }

}

}

So with this change it is possible to handle larger sets of inserts into the database. Now if only there were a way to make reading an event log faster…

Multilingual applications

Living in the good old S of A it is natural for us to be bilingual or even multilingual. So what do we do if we have to create software that can/must be used by people of multiple language backgrounds? Traditionally Microsoft had the concept of resource files that you can use to create multiple versions of the same application for different languages or locales. The problem with this is that once the executable has been compiled the language is fixed. What happens if you want the application to be multilingual at ‘run-time’?

Well, I’ve had to do something like this for myself. I first looked at resource files but they just did not solve the whole issue on their own. So I started creating something for myself that works for my application. Part of the solution still uses .Net resource files for storing the xml data that gets used in the language look-up procedure. The idea is to load the text of controls and windows at the time the window opens, since control and window text gets set at run-time anyway. The following is an explanation of the classes I created to facilitate my solution. It is a fully working solution and already in use inside a proper application. There may be other and even better or smarter solutions ‘out there’ but this one works for me.

LanguageHandler

This is the central class that handles all the language setting/reading etc. functionality. It encapsulates the list of ‘phrases’ used in the application and provides methods to retrieve words or phrases based on a look-up key. Simply translating words alone is not good enough so the focus is on whole ‘phrases’.

The following is a partial view of the class without methods:

public static class LanguageHandler
{

private static List<Phrase> phraseList = new List<Phrase>();

#region Properties
private static string languageID = "en-za";
public static Language AppLanguage {

get {

switch (languageID) {
case "af":
return Language.Afrikaans;
case "en-za":
return Language.English;
default:
return Language.Afrikaans; }

}
set {

switch (value) {
case Language.Afrikaans:
languageID = "af";
break;
case Language.English:
languageID = "en-za";
break;
default:
languageID = "en-za";
break; }

}

}

public static string PhraseSource { set; get; }
#endregion

}
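The Language enum used above is not listed; it is presumably nothing more than:

public enum Language
{
Afrikaans,
English
}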

PhraseList.xml

The definition of the phrases can probably be stored in any data form, but I chose a plain simple xml file to host the raw phrase data. This makes it easy to maintain and port. It is also included in the application as a file resource, although theoretically it could be located anywhere as long as the application can reach it.

The structure is also very basic, as it is simply a serialized version of List<Phrase>:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPhrase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<phrase id="someId" default="Some default">

<phraseEntry lang="af">Afrikaanse waarde</phraseEntry>
<phraseEntry lang="en-za">English value</phraseEntry>

</phrase>
<phrase id="someId2" default="Some default 2">

</phrase>
</ArrayOfPhrase>

LoadPhrases method

Loading the data is as simple as deserializing it. I have a helper class (not shown) that simply takes the xml and deserializes it:

#region LoadPhrases
public static void LoadPhrases()
{

if ((PhraseSource != null) && (PhraseSource.Length > 0))

phraseList = SerializationUtils.DeserializeXML<List<Phrase>>(PhraseSource);

}
#endregion
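That helper is not shown in the original either; a minimal sketch of a string-based deserializer with the name used above (implementation assumed) could be:

//Requires System.IO and System.Xml.Serialization.
public static T DeserializeXML<T>(string xml) where T : class
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (StringReader reader = new StringReader(xml))
return (T)serializer.Deserialize(reader);
}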

GetLanguagePhrase method

The main (and only relevant) method for retrieving stuff from the class is GetLanguagePhrase. It simply takes one parameter – the id value of the phrase required.

#region GetLanguagePhrase
public static string GetLanguagePhrase(string languagePhraseID)
{

if (phraseList.Count == 0)

LoadPhrases();

Phrase phrase = (from p in phraseList where p.LanguagePhraseID == languagePhraseID select p).FirstOrDefault();
if (phrase != null)
{

PhraseEntry phraseEntry = (from pe in phrase.Phrases where pe.LangId == languageID select pe).FirstOrDefault();
if (phraseEntry != null)

return phraseEntry.Text;

else

return phrase.DefaultValue;

}
throw new Exception("Undefined language phrase ID");

}
#endregion

SetControlTextAndTip methods

To make it a bit easier to set the text (and tooltips if available) of controls (my implementation uses WinForms) I created a few helper methods to automatically set the text on the control based on a specified phrase id.

#region SetControlText
public static void SetControlTextAndTip(string languagePhraseID, params ToolStripItem[] tsis)
{

string text = GetLanguagePhrase(languagePhraseID);
foreach (ToolStripItem tsi in tsis)
{

tsi.Text = text;
tsi.ToolTipText = text;

}

}
public static void SetControlTextAndTip(string languagePhraseID, params Control[] crtls)
{

string text = GetLanguagePhrase(languagePhraseID);
foreach (Control crtl in crtls)
{

crtl.Text = text;

}

}
public static void SetControlTextAndTip(string languagePhraseID, params ColumnHeader[] columnHeaders)
{

string text = GetLanguagePhrase(languagePhraseID);
foreach (ColumnHeader ch in columnHeaders)
{

ch.Text = text;

}

}
#endregion

The Phrase and PhraseEntry classes

These two classes are really simple. For the purpose of this article they could be a lot simpler, but since I built a separate ‘editor’ they have a bunch of attributes to make editing easier. This editor, plus a few methods inside LanguageHandler, will have to wait for another article.

[Serializable, XmlType("phrase")]
public class Phrase : IComparable
{

[XmlAttribute("id"),
Browsable(true),
CategoryAttribute("Identifier"),
DefaultValueAttribute(""),
DescriptionAttribute("Phrase ID"),
ReadOnly(true)]
public string LanguagePhraseID { get; set; }
[XmlElement("phraseEntry"),
Browsable(true),
CategoryAttribute("Value details"),
DefaultValueAttribute(""),
DescriptionAttribute("Phrase entries")]
public List<PhraseEntry> Phrases { get; set; }
[XmlAttribute("default"),
Browsable(true),
CategoryAttribute("Value details"),
DefaultValueAttribute(""),
Editor(typeof(MultilineStringEditor), typeof(UITypeEditor)),
DescriptionAttribute("Default value. Use '\\r\\n' to indicate crlf.")]
public string DefaultValue { get; set; }

#region IComparable Members
public int CompareTo(object obj)
{

Phrase otherPhrase = (Phrase)obj;
return LanguagePhraseID.CompareTo(otherPhrase.LanguagePhraseID);

}
#endregion

}

[Serializable, XmlType("entry"), TypeConverter(typeof(PhraseEntryConverter))]
public class PhraseEntry
{

public PhraseEntry() { }
public PhraseEntry(string phraseEntryDef)
{

if (phraseEntryDef.Contains(":"))
{

LangId = phraseEntryDef.Substring(0, phraseEntryDef.IndexOf(":"));
Text = phraseEntryDef.Substring(phraseEntryDef.IndexOf(":") + 1);

}

}

[XmlAttribute("lang"),
Browsable(true),
CategoryAttribute("Phrase entry"),
DefaultValueAttribute(""),
DescriptionAttribute("Language")]
public string LangId { get; set; }
[XmlText(),
Browsable(true),
CategoryAttribute("Phrase entry"),
DefaultValueAttribute(""),
Editor(typeof(MultilineStringEditor), typeof(UITypeEditor)),
DescriptionAttribute("Text")]
public string Text { get; set; }

public override string ToString()
{

return LangId + ":" + Text;

}

}

Using it all

Using LanguageHandler is really easy. Typically in a form’s onload event I call a private local method that sets all the texts of the controls that need to be set. There is no real performance issue to worry about since all the phrase data is stored in memory. One disadvantage is that normally the form needs to be closed and reopened if the ‘language’ has changed. This is not really a problem since changing the language is something that should not happen often. Technically this private LoadLanguageResources method could be run any number of times whenever you want. The only place where I usually call it multiple times is on the ‘Options’ dialog where you choose the language being used.

If you have to use the same phrase multiple times you can always store the value in a local string variable.

The following is a short example of how to use it:

static class Program
{

/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main()
{

if (Properties.Settings.Default.LanguageID == "English")

LanguageHandler.AppLanguage = Language.English;

else

LanguageHandler.AppLanguage = Language.Afrikaans;

LanguageHandler.PhraseSource = Properties.Resources.PhraseList;
LanguageHandler.LoadPhrases();

Form mainForm = new MainForm();

Application.Run(mainForm);

}

}

public partial class MainForm : Form
{

....

private void LoadLanguageResources()
{

Text = LanguageHandler.GetLanguagePhrase("AppName");
LanguageHandler.SetControlTextAndTip("File", fileToolStripMenuItem);
LanguageHandler.SetControlTextAndTip("NewPerson", newPersoonToolStripMenuItem);

}

private void MainForm_Load(object sender, EventArgs e)
{

LoadLanguageResources();

}

}

Summary

As you can see the solution is not really complicated. I really like simple solutions that ‘just work’. This one has proven to work for my needs so far. I have over 340 different phrases, some of them quite long, like message box prompts.

Xml file as a database – part 4

Today’s part will be a slight deviation from what I planned last time. I got side-tracked with trying to save the xml file in a compressed format and actually managed to achieve it.

DotNetZip library

Thanks to this open source library it is possible to output the xml generated by the DataSet straight to a zip file – it is not necessary to first save the xml file and then zip it. To achieve this all you need is the following:

private void WriteToZipFile(string zipFilePath, DataSet ds)
{

MemoryStream mStream = new MemoryStream();
ds.WriteXml(mStream, XmlWriteMode.WriteSchema);
mStream.Seek(0, SeekOrigin.Begin);

if (File.Exists(zipFilePath))

File.Delete(zipFilePath);

using (ZipFile zipFile = new ZipFile(zipFilePath))
{

zipFile.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zipFile.AddEntry(ds.DataSetName + ".xml", mStream);
zipFile.Save();

}

}

private Stream ReadFromZipFile(string zipFilePath, string dsName)
{

MemoryStream mStream = new MemoryStream();
using (ZipFile zipFile = new ZipFile(zipFilePath))
{

if ((from z in zipFile.Entries
where z.FileName == dsName + ".xml"
select z).Count() > 0)
{

ZipEntry ze = (from z in zipFile.Entries
where z.FileName == dsName + ".xml"
select z).First();

ze.Extract(mStream);
mStream.Seek(0, SeekOrigin.Begin);

}

}

return mStream;

}

Using it is as simple as this:

DatabaseTest dbt = new DatabaseTest(); // Typed DataSet

//Saving
WriteToZipFile("SomeFileName.zip", dbt);

//Loading
DatabaseTest dbt2 = new DatabaseTest();
dbt2.ReadXml(ReadFromZipFile("SomeFileName.zip", dbt.DataSetName), XmlReadMode.IgnoreSchema); //Important to 'exclude' the schema for a typed DataSet

Really simple hey? Since xml is very compressible (like most text files) you can save quite a bit on disk space.

Find a copy of it here.

WriteToZipFile(saveFileDialogXml.FileName + ".zip", dbt);

Xml file as a database – part 3

Last time I stopped with the sample code showing how to produce a DataSet in code and then save and load it from a file. This time I’ll just continue with the ‘typed’ DataSet version, plus show what the output would look like. Interestingly, the output of both the untyped and typed versions looks (essentially) the same.

First I’ll show how the Typed DataSet version works. Just to refresh the old mind: to create a Typed DataSet you use Visual Studio’s DataSet (xsd) item template and then use the designer to add tables, columns and relationships. Then in code you can do the following (the DataSet is named ‘DatabaseTest’):

DatabaseTest dbt = new DatabaseTest();
DatabaseTest.SurnameRow surname = (DatabaseTest.SurnameRow)dbt.Surname.NewRow();
surname.Surname = "Henning";
surname.TS = DateTime.Now;
dbt.Surname.AddSurnameRow(surname);
surname = (DatabaseTest.SurnameRow)dbt.Surname.NewRow();
surname.Surname = "Van Der Merwe";
surname.TS = DateTime.Now;
dbt.Surname.AddSurnameRow(surname);

DatabaseTest.PersonRow pa = (DatabaseTest.PersonRow)dbt.Person.NewRow();
pa.FirstName = "Daddy Koos";
pa.SurnameId = 1;
pa.TS = DateTime.Now;
pa.YearOfBirth = 1945;
dbt.Person.AddPersonRow(pa);

DatabaseTest.PersonRow ma = (DatabaseTest.PersonRow)dbt.Person.NewRow();
ma.FirstName = "Mommy Johanna";
ma.SurnameId = 2;
ma.TS = DateTime.Now;
ma.YearOfBirth = 1950;
dbt.Person.AddPersonRow(ma);

DatabaseTest.PersonRow guy = (DatabaseTest.PersonRow)dbt.Person.NewRow();
guy.FirstName = "Little Koos";
guy.SurnameId = 1;
guy.TS = DateTime.Now;
guy.YearOfBirth = 1971;
guy.FatherID = 1;
guy.MotherID = 2;
guy.ChildNo = 1;
dbt.Person.AddPersonRow(guy);

if (saveFileDialogXml.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{

if (System.IO.File.Exists(saveFileDialogXml.FileName))

System.IO.File.Delete(saveFileDialogXml.FileName);

dbt.WriteXml(saveFileDialogXml.FileName, XmlWriteMode.WriteSchema);

var people = (from p in dbt.Person
join s in dbt.Surname on p.SurnameId equals s.Id
select new
{

PersonId = p.Id,
FirstName = p.FirstName,
Surname = s.Surname

});
StringBuilder sb = new StringBuilder();
foreach (var p in people)
{

sb.AppendLine(string.Format("{0} {1}", p.FirstName, p.Surname));

}
MessageBox.Show(sb.ToString());

System.Diagnostics.Process pro = new System.Diagnostics.Process();
pro.StartInfo = new System.Diagnostics.ProcessStartInfo(System.IO.Path.Combine(System.Environment.GetFolderPath(Environment.SpecialFolder.System), "notepad.exe"));
pro.StartInfo.Arguments = "\"" + saveFileDialogXml.FileName + "\"";
pro.Start();

}

This provides the same functionality as before. The output looks something like this (sorry, I’m a bit too lazy to format/indent the output much right now):

<?xml version="1.0" standalone="yes"?>
<DatabaseTest xmlns="http://tempuri.org/DatabaseTest.xsd">
<xs:schema id="DatabaseTest" targetNamespace="http://tempuri.org/DatabaseTest.xsd" xmlns:mstns="http://tempuri.org/DatabaseTest.xsd" xmlns="http://tempuri.org/DatabaseTest.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" attributeFormDefault="qualified" elementFormDefault="qualified">
<xs:element name="DatabaseTest" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="Person">
<xs:complexType>
<xs:sequence>
<xs:element name="Id" msdata:AutoIncrement="true" msdata:AutoIncrementSeed="1" type="xs:int" />
<xs:element name="FirstName" type="xs:string" />
<xs:element name="SurnameId" type="xs:int" />
<xs:element name="Description" type="xs:string" minOccurs="0" />
<xs:element name="YearOfBirth" type="xs:int" minOccurs="0" />
<xs:element name="MonthOfBirth" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="DayOfBirth" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="TS" type="xs:dateTime" />
<xs:element name="FatherID" type="xs:int" minOccurs="0" />
<xs:element name="MotherID" type="xs:int" minOccurs="0" />
<xs:element name="PlaceOfBirthId" type="xs:int" minOccurs="0" />
<xs:element name="YearOfDeath" type="xs:int" minOccurs="0" />
<xs:element name="MonthOfDeath" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="DayOfDeath" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="PlaceOfDeathId" type="xs:int" minOccurs="0" />
<xs:element name="GenealogicalNo" type="xs:string" minOccurs="0" />
<xs:element name="ChildNo" type="xs:unsignedByte" default="0" minOccurs="0" />
<xs:element name="ProgenitorCode" type="xs:string" minOccurs="0" />
<xs:element name="IsProgenitor" type="xs:boolean" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Surname">
<xs:complexType>
<xs:sequence>
<xs:element name="Id" msdata:AutoIncrement="true" msdata:AutoIncrementSeed="1" type="xs:int" />
<xs:element name="Surname" type="xs:string" />
<xs:element name="TS" type="xs:dateTime" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Place">
<xs:complexType>
<xs:sequence>
<xs:element name="Id" msdata:AutoIncrement="true" msdata:AutoIncrementSeed="1" type="xs:int" />
<xs:element name="Place" type="xs:string" />
<xs:element name="TS" type="xs:dateTime" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Marriage">
<xs:complexType>
<xs:sequence>
<xs:element name="Id" msdata:AutoIncrement="true" msdata:AutoIncrementSeed="1" type="xs:int" />
<xs:element name="No" type="xs:unsignedByte" default="0" minOccurs="0" />
<xs:element name="HusbandId" type="xs:int" />
<xs:element name="WifeId" type="xs:int" />
<xs:element name="YearOfMarriage" type="xs:int" minOccurs="0" />
<xs:element name="MonthOfMarriage" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="DayOfMarriage" type="xs:unsignedByte" minOccurs="0" />
<xs:element name="PlaceOfMarriageId" type="xs:int" minOccurs="0" />
<xs:element name="YearOfDivorce" type="xs:int" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="Constraint1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Person" />
<xs:field xpath="mstns:Id" />
</xs:unique>
<xs:unique name="Surname_Constraint1" msdata:ConstraintName="Constraint1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Surname" />
<xs:field xpath="mstns:Id" />
</xs:unique>
<xs:unique name="Constraint2">
<xs:selector xpath=".//mstns:Surname" />
<xs:field xpath="mstns:Surname" />
</xs:unique>
<xs:unique name="Place_Constraint1" msdata:ConstraintName="Constraint1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Place" />
<xs:field xpath="mstns:Id" />
</xs:unique>
<xs:unique name="Place_Constraint2" msdata:ConstraintName="Constraint2">
<xs:selector xpath=".//mstns:Place" />
<xs:field xpath="mstns:Place" />
</xs:unique>
<xs:unique name="Marriage_Constraint1" msdata:ConstraintName="Constraint1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Marriage" />
<xs:field xpath="mstns:Id" />
</xs:unique>
<xs:keyref name="FK_Person_Marriage" refer="Constraint1">
<xs:selector xpath=".//mstns:Marriage" />
<xs:field xpath="mstns:HusbandId" />
</xs:keyref>
<xs:keyref name="FK_Person_Marriage1" refer="Constraint1">
<xs:selector xpath=".//mstns:Marriage" />
<xs:field xpath="mstns:WifeId" />
</xs:keyref>
<xs:keyref name="FK_Person_Person1" refer="Constraint1">
<xs:selector xpath=".//mstns:Person" />
<xs:field xpath="mstns:MotherID" />
</xs:keyref>
<xs:keyref name="FK_Person_Surname" refer="Surname_Constraint1">
<xs:selector xpath=".//mstns:Person" />
<xs:field xpath="mstns:SurnameId" />
</xs:keyref>
<xs:keyref name="FK_Place_Person" refer="Place_Constraint1">
<xs:selector xpath=".//mstns:Person" />
<xs:field xpath="mstns:PlaceOfBirthId" />
</xs:keyref>
<xs:keyref name="FK_Person_Person" refer="Constraint1">
<xs:selector xpath=".//mstns:Person" />
<xs:field xpath="mstns:FatherID" />
</xs:keyref>
</xs:element>
</xs:schema>

<Person>
<Id>1</Id>
<FirstName>Daddy Koos</FirstName>
<SurnameId>1</SurnameId>
<YearOfBirth>1945</YearOfBirth>
<TS>2011-06-06T09:15:58.0021555+02:00</TS>
<ChildNo>0</ChildNo>
</Person>
<Person>
<Id>2</Id>
<FirstName>Mamma Johanna</FirstName>
<SurnameId>2</SurnameId>
<YearOfBirth>1950</YearOfBirth>
<TS>2011-06-06T09:52:51.4151521+02:00</TS>
<ChildNo>0</ChildNo>
</Person>
<Person>
<Id>3</Id>
<FirstName>Little Koos</FirstName>
<SurnameId>1</SurnameId>
<YearOfBirth>1971</YearOfBirth>
<TS>2011-06-06T09:52:51.4151521+02:00</TS>
<FatherID>1</FatherID>
<MotherID>2</MotherID>
<ChildNo>1</ChildNo>
</Person>

<Surname>
<Id>1</Id>
<Surname>Henning</Surname>
<TS>2011-06-06T09:15:57.9771555+02:00</TS>
</Surname>
<Surname>
<Id>2</Id>
<Surname>Van Der Merwe</Surname>
<TS>2011-06-06T09:52:51.4121521+02:00</TS>
</Surname>

</DatabaseTest>

As you can see the xsd is actually embedded inside the xml file. This is helpful to see the structure of the data.

Next step

The next step is to create a Data Access Layer because dealing with a DataSet directly in code is so… old fashioned… and so on. Stay tuned for the next iteration.

Xml file as a database part 2

Been a bit busy lately, but I haven’t given up on the idea of an xml database yet. Actually, I started on some code even before the first post; it works in principle but still needs a lot of work. Unfortunately I’ve been side-tracked by other things in the last couple of days, so the whole idea is just sitting idle for the moment.

First let me cover some (old) basics. The following examples are nothing new; the functionality has been available for quite a number of years using plain old DataSets.

The first concept is how to define and create a DataSet in code at run-time. Look at the following example:

DataSet myDb = new DataSet("XmlDb");
DataTable personTable = new DataTable("Person");

DataColumn personId = new DataColumn("Id", typeof(int)) { AutoIncrement = true, AutoIncrementSeed = 1, Unique = true, AllowDBNull = false };
DataColumn surnameId = new DataColumn("Id", typeof(int)) { AutoIncrement = true, AutoIncrementSeed = 1, Unique = true, AllowDBNull = false };
DataColumn personSurnameId = new DataColumn("SurnameId", typeof(int)) { AllowDBNull = false };
DataColumn fatherId = new DataColumn("FatherID", typeof(int));
DataColumn motherId = new DataColumn("MotherID", typeof(int));

personTable.Columns.AddRange(new DataColumn[]
{

personId,
new DataColumn("Name", typeof(string)) { AllowDBNull = false},
personSurnameId,
new DataColumn("YearOfBirth", typeof(int)),
new DataColumn("MonthOfBirth", typeof(int)),
new DataColumn("DayOfBirth", typeof(int)),
new DataColumn("Description", typeof(string)),
fatherId,
motherId

});
personTable.PrimaryKey = new DataColumn[] { personId };
myDb.Tables.Add(personTable);

DataTable surnameTable = new DataTable("Surname");
surnameTable.Columns.AddRange(new DataColumn[]
{

surnameId,
new DataColumn("Surname", typeof(string)) { Unique=true, AllowDBNull=false }

});
surnameTable.PrimaryKey = new DataColumn[] { surnameId };
myDb.Tables.Add(surnameTable);
DataRelation personSurname = new DataRelation("PersonSurname", surnameId, personSurnameId);
myDb.Relations.Add(personSurname);
DataRelation personFather = new DataRelation("PersonFather", personId, fatherId);
myDb.Relations.Add(personFather);
DataRelation personMother = new DataRelation("PersonMother", personId, motherId);
myDb.Relations.Add(personMother);

The piece of code above shows a couple of things (which probably includes bad programming habits as well… but it is only an example 😉 ). First it creates a DataSet with two tables. The two tables have a relationship between them that will enforce some referential integrity. Both tables have an id field that is an autonumber/identity column. The ‘Person’ table also adds two internal references (parent-child relationship). No rocket science here (yet).

The next block demonstrates how to add raw data to the DataSet.

DataRow newSurname = surnameTable.NewRow();
newSurname["Surname"] = "Van Der Merwe";
surnameTable.Rows.Add(newSurname);

newSurname = surnameTable.NewRow();
newSurname["Surname"] = "Botha";
surnameTable.Rows.Add(newSurname);

DataRow newRow = personTable.NewRow();
newRow["Name"] = "Piet";
newRow["SurnameId"] = 1;
newRow["Description"] = "Testing 123";
personTable.Rows.Add(newRow);

newRow = personTable.NewRow();
newRow["Name"] = "Sannie";
newRow["SurnameId"] = 2;
newRow["YearOfBirth"] = 1950;
personTable.Rows.Add(newRow);

newRow = personTable.NewRow();
newRow["Name"] = "Koos";
newRow["SurnameId"] = 1;
newRow["YearOfBirth"] = 1980;
newRow["FatherID"] = 1;
newRow["MotherID"] = 2;
personTable.Rows.Add(newRow);

Again, no rocket science here. Two surnames are added and then 3 people. The third person references the first two (as parents).

Then if you want to ‘persist’ the DataSet:

if (saveFileDialogXml.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{

if (System.IO.File.Exists(saveFileDialogXml.FileName))

System.IO.File.Delete(saveFileDialogXml.FileName);

myDb.WriteXml(saveFileDialogXml.FileName, XmlWriteMode.WriteSchema);

}

Then lastly you might want to reload it back into another DataSet to add some more data etc.

DataSet myDb2 = new DataSet();
myDb2.ReadXml(saveFileDialogXml.FileName);

personTable = myDb2.Tables["Person"];
newRow = personTable.NewRow();
newRow["Name"] = "Gert";
newRow["SurnameId"] = 1;
personTable.Rows.Add(newRow);

So there you have the basics of how a simple .Net DataSet can be used to create, load, persist, reload and change data to and from an Xml file. Personally I’d like to add another step to compress (zip) the data while on disk, to make it easier to transport if needed. This will not impact functionality or performance of the running application.

One concern is of course what happens when the ‘database’ grows big, because if you haven’t noticed, the entire database must be loaded/saved when you interact with the Xml file. This concept will not allow you to use extremely large databases, but then it is not intended for mission critical enterprise database systems! On the positive side, the database is ‘in memory’ so it should be very fast. Splitting the file and loading only sections would be very tricky and possibly mean you have to write a lot of custom code.

It is also possible to use a Typed DataSet in exactly the same manner. The advantage then is you don’t have to define the ‘database structure’ in code. The rest works basically the same.

XML file as a database – part 1

I’ve been toying with the idea of finding an alternative ‘local database’ for a standalone or even embedded scenario. In the past I used Jet a lot, but since Microsoft is abandoning it (like so many other things, including being successful… 😉 ) I started looking for alternatives. Lately I’ve been using SQL Compact a lot. It is actually nice to use and, because of its SQL Server roots, developing for it isn’t that difficult. However, it still has one drawback that most other ‘real’ database systems have: it requires a separate install which must also be maintained separately (think software and security updates here).

I’ve been working on a genealogical database for an uncle of mine and the current database is still an MDB file. One of the reasons for moving away from this format is that Jet is not supported on 64-bit anymore, and Microsoft is not making any effort (nor would I expect it to) to port the Jet database drivers to 64-bit. Then, even the new format for Access requires installing Office. This brings me to another reason I want to steer away from installing ‘separate’ things just to support a database: my uncle, like so many other ‘normal’ computer users, has no idea how to manage or maintain a database system, especially if things go wrong (and they have a habit of doing so when normal users use them; I wonder why…). So what could I use to suit these requirements?

In the past I have considered using a plain xml file as a data store but abandoned the idea for various reasons. Things like xml’s bloatedness and the lack of real database features counted against it. Now I find myself re-looking at this, but perhaps with different expectations. I don’t think using an xml file as a general database would be a good idea, but for a scenario like mine it might just work – perhaps.

The question is of course how to implement this database idea. An xml file is just that: a plain text file that has no functionality by itself. There are existing database systems based on xml (see native xml, or NXD, databases), but none of the well-established ones really support .Net as is. On the other hand, I don’t want to go and develop a whole new database management system by myself. Perhaps there is a middle/hybrid way of doing it. From past experience I know that .Net DataSets can persist to an XML file, and internally a DataSet can be seen as a mini database with support for real data types, indexing, relationships and default values, including things like autonumbers etc. Transactionality is not required as the system will be single user, single instance. Some downsides are that xml is very bloated, you cannot load just part of the data (it is either load everything or nothing) and you have to be careful with which encoding you use (because genealogical data usually contains non-ascii characters 🙂 ).

Thus, I’m in the process of investigating how feasible it is to use either a plain or typed DataSet as the base of a data layer for an application. The application itself will still use normal ‘domain’ class structures to handle data internally, but the data access layer will rather interface with the DataSet. I thought about just using the domain classes and serialization to persist them straight to an (xml) file, but then the simple functionality provided by the DataSet objects would have to be coded manually (and I’m too lazy now, or not in the mood).

So stay tuned while this idea develops.