I. Developing a read-only DataSet provider

1. Add a new class (e.g. ShapeDataSet) derived from the Microsoft.Research.Science.Data.DataSet class (defined in Microsoft.Research.Science.Data.dll).

using Microsoft.Research.Science.Data;

namespace Shapes
{
    public class ShapeDataSet : DataSet
    {
    }
}

2. Add a new class, ShapeDataSetUri, derived from DataSetUri.

It must have at least two constructors: one with no arguments, and one that accepts a URI string and passes it to the base constructor.

Required URI parameters should be defined as class properties. For example,

        [FileNameProperty]
        [Description("Specifies the file to open or create.")]
        public string FileName
        {
            // Returns an empty string when the "file" parameter is absent.
            get { return ContainsParameter("file") ? this["file"] : ""; }
            set { this["file"] = value; }
        }

The FileNamePropertyAttribute attribute should be applied to every property that holds a file name, as in the example above.
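The FileName pattern above can be tried outside the library with a small stand-in. The sketch below is hypothetical: ShapeUriSketch is not the SDS DataSetUri class, and the `msds:shape?file=...` string format and the `layer` parameter are assumptions made for illustration. It shows the intended behaviour of the property: an empty string when the `file` parameter is missing, and read/write access to the underlying parameter otherwise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DataSetUri subclass. It only illustrates how a
// typed property (FileName) wraps a raw "file" parameter; the real DataSetUri
// base class provides its own parsing, indexer and ContainsParameter.
public class ShapeUriSketch
{
    private readonly Dictionary<string, string> parameters =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Parameterless constructor, as required by step 2.
    public ShapeUriSketch() { }

    // Accepts strings of the assumed form "msds:shape?file=roads.shp&layer=1".
    public ShapeUriSketch(string uri)
    {
        int q = uri.IndexOf('?');
        if (q < 0) return;
        foreach (string pair in uri.Substring(q + 1).Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) parameters[pair] = "";
            else parameters[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
    }

    public bool ContainsParameter(string name) => parameters.ContainsKey(name);

    public string this[string name]
    {
        get => parameters[name];
        set => parameters[name] = value;
    }

    // Same shape as the FileName property above.
    public string FileName
    {
        get { return ContainsParameter("file") ? this["file"] : ""; }
        set { this["file"] = value; }
    }
}
```

For example, `new ShapeUriSketch("msds:shape?file=roads.shp").FileName` yields "roads.shp", while a URI without a `file` parameter yields an empty string instead of throwing.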

3. Add a ShapeDataSet constructor that accepts a string.

For example,

  public ShapeDataSet(string uri)

Here uri is either the string representation of a DataSet URI or, if the provider supports it, a path to the file(s) with parameters appended after a '?' symbol.

The constructor should set the base.uri field to an object of the provider's URI type (e.g. ShapeDataSetUri) created from the string, and then use that object's properties to initialize the DataSet.

For example, it can look like the following code:

    if (DataSetUri.IsDataSetUri(uri))
        this.uri = new ShapeDataSetUri(uri);
    else
        this.uri = ShapeDataSet.FromFileName(uri); // this method should create a proper URI from the file name
    DataSetUri.NormalizeFileNames(this.uri); // this method replaces file names with full paths
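To make the two branches of the constructor concrete, here is a self-contained sketch of the same dispatch. LooksLikeDataSetUri and Resolve are hypothetical helpers: the `msds:` prefix check stands in for DataSetUri.IsDataSetUri, and Path.GetFullPath stands in for DataSetUri.NormalizeFileNames. A real provider would build a ShapeDataSetUri rather than return a tuple.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the step 3 dispatch; not the SDS DataSetUri API.
public static class ShapeUriResolverSketch
{
    // Assumed scheme prefix standing in for DataSetUri.IsDataSetUri.
    private const string Scheme = "msds:";

    public static bool LooksLikeDataSetUri(string s) =>
        s.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase);

    // Returns the absolute file path and the remaining query parameters for
    // either "msds:shape?file=roads.shp&layer=1" or "roads.shp?layer=1".
    public static (string File, string Query) Resolve(string uriOrPath)
    {
        int q = uriOrPath.IndexOf('?');
        string head = q < 0 ? uriOrPath : uriOrPath.Substring(0, q);
        string query = q < 0 ? "" : uriOrPath.Substring(q + 1);
        string file = "";

        if (LooksLikeDataSetUri(uriOrPath))
        {
            // DataSet URI: the file name travels in the "file" parameter.
            var others = new List<string>();
            foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
            {
                if (pair.StartsWith("file=", StringComparison.OrdinalIgnoreCase))
                    file = pair.Substring("file=".Length);
                else
                    others.Add(pair);
            }
            query = string.Join("&", others);
        }
        else
        {
            // Plain path: everything before '?' is the file name.
            file = head;
        }

        // Counterpart of DataSetUri.NormalizeFileNames: make the path absolute.
        if (file.Length > 0) file = Path.GetFullPath(file);
        return (file, query);
    }
}
```

Both input forms resolve to the same absolute file path and the same leftover parameters, which is the property the constructor relies on when it later reads the URI's properties.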

4. Implement the abstract variable factory method.

Since the provider is read-only, it must not create any new variables; therefore, implement the CreateVariable method exactly as follows:

protected override Variable<DataType> CreateVariable<DataType>(string varName, string[] dims)
{
    throw new InvalidOperationException("Cannot create new variable in this DataSet");
}

5. Add one or more classes for the provider's variables.

Each variable can implement its own data-access logic. For example, one variable may keep all of its data in memory and initialize it once, when the DataSet is created; another may load data from the underlying file on demand.

Except in special cases, it is convenient to derive the variable from the DataAccessVariable class. It is then enough to override the ReadData and ReadShape methods, which should read the data from the file or take it from the DataSet's internal tables.

Important: the variable constructor must call base.Initialize() at its end.
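The two data-access strategies described in this step can be sketched with a hypothetical, one-dimensional stand-in for DataAccessVariable; the real class and its ReadData/ReadShape signatures may differ. The sketch also suggests why Initialize() belongs at the end of the constructor: here it calls ReadShape, which depends on fields that the derived constructor must set first.

```csharp
using System;

// Hypothetical stand-in for DataAccessVariable (1-D only, simplified).
public abstract class DataAccessVariableSketch<T>
{
    public int[] Shape { get; private set; }

    protected abstract int[] ReadShape();
    protected abstract T[] ReadData(int origin, int count);

    // Reads the shape, so it must run after the derived fields are set.
    protected void Initialize() { Shape = ReadShape(); }

    public T[] GetData(int origin, int count) => ReadData(origin, count);
}

// Strategy 1: all data kept in memory, loaded once at creation.
public sealed class InMemoryVariable : DataAccessVariableSketch<double>
{
    private readonly double[] values;

    public InMemoryVariable(double[] values)
    {
        this.values = (double[])values.Clone();
        Initialize(); // last statement, as step 5 requires
    }

    protected override int[] ReadShape() => new[] { values.Length };

    protected override double[] ReadData(int origin, int count)
    {
        var result = new double[count];
        Array.Copy(values, origin, result, 0, count);
        return result;
    }
}

// Strategy 2: values fetched on demand; the delegate is a hypothetical
// placeholder for "read one value from the underlying file".
public sealed class OnDemandVariable : DataAccessVariableSketch<double>
{
    private readonly int length;
    private readonly Func<int, double> readAt;

    public int ReadCount { get; private set; }

    public OnDemandVariable(int length, Func<int, double> readAt)
    {
        this.length = length;
        this.readAt = readAt;
        Initialize(); // last statement, as step 5 requires
    }

    protected override int[] ReadShape() => new[] { length };

    protected override double[] ReadData(int origin, int count)
    {
        var result = new double[count];
        for (int i = 0; i < count; i++)
        {
            result[i] = readAt(origin + i);
            ReadCount++;
        }
        return result;
    }
}
```

The in-memory variant pays the read cost once up front; the on-demand variant touches the source only for the requested range, which is preferable for large files.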

6. Add a new method to the DataSet that initializes its internal state from the underlying file(s).

Call this method in the provider constructor (it is recommended to disable autocommit before the call and restore it afterwards). If the DataSet is read-only, call the SetCompleteReadOnly method in the constructor after initialization.

The initialization method should create the variables (of the types added in step 5) and add them to the DataSet collection using the DataSet.AddVariableToCollection method. At the end, it should call Commit().
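The initialization order described above (disable autocommit, add variables, commit once, restore autocommit, then lock the DataSet) is sketched below with hypothetical stand-in classes. The member names echo those in the text, but DataSetSketch is not the SDS DataSet, and its commit logic is deliberately simplified. The try/finally block is a design choice of this sketch: it restores the autocommit setting even if reading the file throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the DataSet members named in step 6.
public class DataSetSketch
{
    private readonly List<string> variables = new List<string>();
    private readonly List<string> pending = new List<string>();

    public bool IsAutocommitEnabled { get; set; } = true;
    public bool IsReadOnly { get; private set; }
    public int CommitCount { get; private set; }
    public IReadOnlyList<string> Variables => variables;

    protected void AddVariableToCollection(string name)
    {
        if (IsReadOnly) throw new InvalidOperationException("DataSet is read-only");
        pending.Add(name);
        if (IsAutocommitEnabled) Commit(); // one commit per variable otherwise
    }

    public void Commit()
    {
        variables.AddRange(pending);
        pending.Clear();
        CommitCount++;
    }

    protected void SetCompleteReadOnly() { IsReadOnly = true; }
}

public sealed class ShapeDataSetSketch : DataSetSketch
{
    // columnsInFile is a placeholder for what the real provider reads from disk.
    public ShapeDataSetSketch(IEnumerable<string> columnsInFile)
    {
        bool autocommit = IsAutocommitEnabled;
        IsAutocommitEnabled = false;          // disable before initialization
        try
        {
            InitializeFromFile(columnsInFile);
        }
        finally
        {
            IsAutocommitEnabled = autocommit; // restore previous setting
        }
        SetCompleteReadOnly();                // read-only provider: lock last
    }

    // Creates the variables, adds them to the collection, then commits once.
    private void InitializeFromFile(IEnumerable<string> columnsInFile)
    {
        foreach (string column in columnsInFile)
            AddVariableToCollection(column);
        Commit();
    }
}
```

Constructing `new ShapeDataSetSketch(new[] { "x", "y", "area" })` leaves all three variables in the collection after a single commit, with autocommit restored and the DataSet marked read-only.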

7. Add attributes that specify the provider name and associate the provider with its DataSetUri type and file extensions.

For example,

    public class ShapeDataSet : DataSet
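The attribute lines of the example above are not reproduced on this page, so the sketch below does not show the real SDS attribute names. Instead it uses three hypothetical attributes to illustrate the idea: a provider class is decorated with its name, its URI type and its file extensions, and a factory can then discover it through reflection (CustomAttributeExtensions.GetCustomAttributes<T>).

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute types; the real SDS attribute names may differ.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ProviderNameSketchAttribute : Attribute
{
    public string Name { get; }
    public ProviderNameSketchAttribute(string name) { Name = name; }
}

[AttributeUsage(AttributeTargets.Class)]
public sealed class ProviderUriTypeSketchAttribute : Attribute
{
    public Type UriType { get; }
    public ProviderUriTypeSketchAttribute(Type uriType) { UriType = uriType; }
}

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class ProviderFileExtensionSketchAttribute : Attribute
{
    public string Extension { get; }
    public ProviderFileExtensionSketchAttribute(string extension) { Extension = extension; }
}

// Empty placeholders for ShapeDataSetUri and ShapeDataSet.
public sealed class ShapeUriStub { }

[ProviderNameSketch("shape")]
[ProviderUriTypeSketch(typeof(ShapeUriStub))]
[ProviderFileExtensionSketch(".shp")]
public sealed class ShapeProviderStub { }

public static class ProviderRegistrySketch
{
    // Maps a file extension to a provider name by reading the attributes.
    public static string FindProviderByExtension(Type[] candidates, string extension)
    {
        foreach (Type t in candidates)
        {
            bool matches = t.GetCustomAttributes<ProviderFileExtensionSketchAttribute>()
                .Any(a => string.Equals(a.Extension, extension, StringComparison.OrdinalIgnoreCase));
            if (matches)
                return t.GetCustomAttribute<ProviderNameSketchAttribute>()?.Name;
        }
        return null;
    }
}
```

Declaring this information as attributes keeps it on the provider type itself, so a factory can match a file such as `roads.shp` to a provider without instantiating any candidate class.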

8. Metadata.

In the variable constructor, once the base constructor has finished, the Variable.Metadata dictionary is available. Metadata attributes should be added to it:

Metadata[CsvDataSet.CsvColumnKeyName] = column.Index; // example from CsvVariable ctor

The variable.Metadata dictionary can also be modified in the DataSet initialization method (see step 6). Global metadata (dataset.Metadata) can be initialized in the same method.
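Both levels of metadata described in this step can be illustrated with plain dictionaries. MetaVariableSketch and MetaDataSetSketch are hypothetical stand-ins, and the key names (`column`, `units`, `source`) are assumptions rather than SDS conventions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variable exposing a Metadata dictionary.
public sealed class MetaVariableSketch
{
    public const string ColumnKey = "column"; // hypothetical key name

    public string Name { get; }
    public IDictionary<string, object> Metadata { get; } = new Dictionary<string, object>();

    public MetaVariableSketch(string name, int columnIndex)
    {
        Name = name;
        // Per-variable metadata set in the constructor (cf. the CsvVariable example).
        Metadata[ColumnKey] = columnIndex;
    }
}

// Hypothetical DataSet with global metadata and an initialization method.
public sealed class MetaDataSetSketch
{
    public IDictionary<string, object> Metadata { get; } = new Dictionary<string, object>();
    public List<MetaVariableSketch> Variables { get; } = new List<MetaVariableSketch>();

    // Counterpart of the step 6 initialization method.
    public void Initialize(string sourceFile, string[] columns, string[] units)
    {
        Metadata["source"] = sourceFile;             // global metadata
        for (int i = 0; i < columns.Length; i++)
        {
            var v = new MetaVariableSketch(columns[i], i);
            v.Metadata["units"] = units[i];          // altered after construction
            Variables.Add(v);
        }
    }
}
```

Metadata that only the variable knows (its column index) is set in its constructor; metadata that depends on the file as a whole is filled in by the DataSet during initialization.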

Last edited Nov 23, 2010 at 1:15 PM by dvoits, version 14

