The DataSet is a top-level class of the Scientific DataSet library that provides a single data model for different scientific datasets. Applications can store and retrieve data uniformly through an abstract view of various custom data storage formats. This makes an application less dependent on data formats and significantly eases data transfer between software components.

The DataSet bundles several related arrays and associated metadata in a single self-descriptive package and enforces certain constraints on the arrays' shapes to ensure data consistency. The underlying data model has much in common with Unidata’s Common Data Model, which has stood the test of time.

To create or open a dataset, use the DataSet.Open(String) method with a proper DataSet URI:

Example
DataSet dataSet = DataSet.Open("csv_for_autotypes.csv"); 
DataSet dataSet = DataSet.Open("msds:csv?file=csv_for_autotypes.csv"); 
DataSet dataSet = DataSet.Open(@"c:\data\ncfile.nc"); 
DataSet dataSet = DataSet.Open("msds:nc?file=ncfile.nc"); 
DataSet dataSet = DataSet.Open("msds:as?server=(local)&database=ActiveStorage" + 
                       "&integrated security=true&GroupName=mm5&UseNetcdfConventions=true");

The URI indicates the particular DataSet implementation, which provides access to the specified data storage format. Such an implementation is called a DataSet provider.
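The layout of the "msds:" URIs shown above (a provider name followed by "key=value" pairs separated by "&") can be illustrated with a small sketch. The UriSketch class below is a hypothetical helper written only for this illustration; it is not part of the library and makes no claim about how the library itself parses URIs, including escaping or case handling.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of the library.
public static class UriSketch
{
    // Splits "msds:<provider>?<key>=<value>&..." into the provider name and its parameters.
    public static KeyValuePair<string, Dictionary<string, string>> Split(string uri)
    {
        const string scheme = "msds:";
        if (!uri.StartsWith(scheme, StringComparison.Ordinal))
            throw new ArgumentException("Not an msds: URI", "uri");

        string rest = uri.Substring(scheme.Length);
        int q = rest.IndexOf('?');
        string provider = q < 0 ? rest : rest.Substring(0, q);

        var parameters = new Dictionary<string, string>();
        if (q >= 0)
        {
            string query = rest.Substring(q + 1);
            foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
            {
                // A pair without '=' is stored with an empty value.
                int eq = pair.IndexOf('=');
                string key = eq < 0 ? pair : pair.Substring(0, eq);
                string value = eq < 0 ? "" : pair.Substring(eq + 1);
                parameters[key] = value;
            }
        }
        return new KeyValuePair<string, Dictionary<string, string>>(provider, parameters);
    }

    public static void Main()
    {
        var parts = Split("msds:csv?file=csv_for_autotypes.csv");
        Console.WriteLine(parts.Key);            // provider name: csv
        Console.WriteLine(parts.Value["file"]);  // csv_for_autotypes.csv
    }
}
```

In real code, prefer the DataSetUri class over hand-written parsing.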

For a list of providers included in the current release, see Managed Libraries.

The URI can include additional provider-specific information, such as security credentials, behavioral properties, and so on. For more information, see the documentation for each provider. To customize the URI with which you open a DataSet, use the properties exposed by the DataSetUri class. For details, see the DataSetUri.Create(string) method in the complete reference included in the installation package.

The DataSet consists of a collection of Variable objects. Each Variable represents a single array with a collection of attributes attached. For information about attributes, see the Variable.Metadata property. If a provider works with large datasets, it typically does not load all data into memory, but provides on-demand data access through variables. See also Supported Types.

Example
The following code iterates through all variables of a DataSet loaded from the "sample.csv" file and displays information about the DataSet on the console:
using (DataSet ds = DataSet.Open("sample.csv"))
{
    // Print the DataSet name and its unique identifier
    Console.WriteLine("Scientific DataSet " + ds.Name + "(" + ds.DataSetGuid + ")" + " contents: ");

    // Print a description of each variable
    Console.WriteLine(" Variables:");
    foreach (Variable v in ds.Variables)
    {
        Console.WriteLine(v.ToString());
    }
}

Relationships between variables are expressed using shared dimensions. A DataSet dimension is an index space that has a distinct name. For example, if variable Observation shares a dimension with variable X, then Observation[i] is related to X[i] for every index i in the shared index space. Additionally, this introduces a constraint on the DataSet:
  • Two variables that share an index space must always be the same size along the shared dimension.
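The constraint can be sketched without the library. In the following illustration, SketchVariable and DimensionCheck are hypothetical stand-ins (they are not library types); each variable lists its dimension names and its length along each of them, and the check reports the first dimension on which two variables disagree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; the library's Variable class is richer.
public sealed class SketchVariable
{
    public string Name;
    public string[] Dimensions;  // dimension names, one per array axis
    public int[] Shape;          // array length along each axis
}

public static class DimensionCheck
{
    // Returns null if all shared dimensions agree; otherwise describes the first conflict.
    public static string FindConflict(IEnumerable<SketchVariable> variables)
    {
        var lengths = new Dictionary<string, int>();    // dimension name -> first length seen
        var owners = new Dictionary<string, string>();  // dimension name -> variable that set it
        foreach (var v in variables)
        {
            for (int axis = 0; axis < v.Dimensions.Length; axis++)
            {
                string dim = v.Dimensions[axis];
                int len = v.Shape[axis];
                int known;
                if (!lengths.TryGetValue(dim, out known))
                {
                    lengths[dim] = len;
                    owners[dim] = v.Name;
                }
                else if (known != len)
                {
                    return string.Format(
                        "Dimension '{0}': {1} has length {2}, but {3} has length {4}",
                        dim, owners[dim], known, v.Name, len);
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        var x = new SketchVariable { Name = "X", Dimensions = new[] { "i" }, Shape = new[] { 10 } };
        var obs = new SketchVariable { Name = "Observation", Dimensions = new[] { "i" }, Shape = new[] { 10 } };
        var bad = new SketchVariable { Name = "Model", Dimensions = new[] { "i" }, Shape = new[] { 9 } };

        Console.WriteLine(FindConflict(new[] { x, obs }) == null);  // True: sizes agree
        Console.WriteLine(FindConflict(new[] { x, obs, bad }));     // describes the mismatch on 'i'
    }
}
```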

In addition to attaching metadata to each Variable, you can attach global metadata to the DataSet. Metadata is a dictionary of string keys and typed values, represented by the MetadataDictionary class.

A DataSet instance can be read-only. Such a dataset cannot be modified. For details, see the DataSet.IsReadOnly property.

A new Variable can be added to a DataSet by using the AddVariable<DataType>() methods.

Example
The following code adds a column to the text file "Tutorial1.csv", which is in comma-separated values (CSV) format:
using(DataSet ds = DataSet.Open("Tutorial1.csv"))
{
    // read input data
    double[] x = (double[])ds["X"].GetData();
    double[] y = (double[])ds["Observation"].GetData();
    // compute
    var xm = x.Sum() / x.Length;
    var ym = y.Sum() / y.Length;
    var a = x.Zip(y, (xx, yy) => (xx - xm) * (yy - ym)).Sum()
     / x.Select(xx => (xx - xm) * (xx - xm)).Sum();
    var b = ym - a * xm;
    var model = x.Select(xx => a * xx + b).ToArray();
    // Adding new variable
    ds.AddVariable<double>("Model", model);
}

The extension methods of the DataSetExtensions class enable you to work with a DataSet in a way similar to a procedural API. You could rewrite the previous example using the extensions as follows:

Example
using(DataSet ds = sds.DataSet.Open("Tutorial1.csv"))
{
    // read input data 
    var x = ds.GetData<double[]>("X");
    var y = ds.GetData<double[]>("Observation");
    // compute
    var xm = x.Sum() / x.Length;
    var ym = y.Sum() / y.Length;
    var a = x.Zip(y, (xx, yy) => (xx - xm) * (yy - ym)).Sum()
      / x.Select(xx => (xx - xm) * (xx - xm)).Sum();
    var b = ym - a * xm;
    var model = x.Select(xx => a * xx + b).ToArray();
    // Adding new variable
    ds.Add<double[]>("Model");
    ds.PutData<double[]>("Model", model);
}
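The compute step shared by the two examples above is an ordinary least-squares line fit: the slope a is the sum of (x - xm)(y - ym) divided by the sum of (x - xm)^2, and the intercept is b = ym - a * xm. The following self-contained sketch isolates that step so it can be tried without a data file; the LineFit class and its sample data are introduced here purely for illustration.

```csharp
using System;
using System.Linq;

// Illustration only: the same arithmetic as the examples above, applied to in-memory arrays.
public static class LineFit
{
    // Returns the slope a (Item1) and intercept b (Item2) of the line y = a*x + b
    // that minimizes the squared error over the given points.
    public static Tuple<double, double> Fit(double[] x, double[] y)
    {
        double xm = x.Sum() / x.Length;  // mean of x
        double ym = y.Sum() / y.Length;  // mean of y
        double a = x.Zip(y, (xx, yy) => (xx - xm) * (yy - ym)).Sum()
                 / x.Select(xx => (xx - xm) * (xx - xm)).Sum();
        double b = ym - a * xm;
        return Tuple.Create(a, b);
    }

    public static void Main()
    {
        double[] x = { 0, 1, 2, 3 };
        double[] y = { 1, 3, 5, 7 };  // points lying exactly on y = 2x + 1
        var fit = Fit(x, y);
        Console.WriteLine("a = {0}, b = {1}", fit.Item1, fit.Item2);  // a = 2, b = 1
    }
}
```

Note that the slope is undefined when all x values are equal, because the denominator is then zero; neither this sketch nor the examples above guard against that case.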

The DataSet uses a two-phase transaction mechanism to commit changes. For details, see the DataSet.Commit() method and the DataSet.IsAutocommitEnabled property.
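As a rough sketch of explicit committing, the following code turns off automatic commits, adds a variable, and then commits once. It rests on several assumptions that you should check against the reference: that the library namespace is Microsoft.Research.Science.Data, that IsAutocommitEnabled can be assigned, and that "Tutorial1.csv" contains a one-dimensional variable "X" of type double. The variable name "Doubled" is made up for the illustration.

```csharp
using System;
using Microsoft.Research.Science.Data;  // assumed namespace of the Scientific DataSet library

public static class CommitSketch
{
    public static void Main()
    {
        using (DataSet ds = DataSet.Open("Tutorial1.csv"))
        {
            // Assumption: the property is writable. Switch off automatic commits
            // so that changes are applied together by an explicit call.
            ds.IsAutocommitEnabled = false;

            // Read existing data and derive a new array from it
            double[] x = (double[])ds["X"].GetData();
            double[] doubled = Array.ConvertAll(x, v => 2 * v);

            // Add the new variable; "Doubled" is a hypothetical name
            ds.AddVariable<double>("Doubled", doubled);

            // Request that the pending changes be committed
            ds.Commit();
        }
    }
}
```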

See also the How-To topic.

Last edited Jul 2, 2010 at 4:05 PM by dvoits, version 5
