The SDS library facilitates data integration into an application: the same API can be used to work with different data sources. However, if you try to run an application that uses SDS on a machine where the SDS package is not installed, it will fail. The reason is that the SDS infrastructure cannot find the core assemblies and the assemblies required to support a particular data source. The latter assemblies contain types called DataSet providers. Each provider implements the physical data access routines and has an associated name and, optionally, one or more file extensions. For example, the NetCDFDataSet class is a provider for the NetCDF data format; it is associated with the provider name "nc" and the file extension ".nc". This enables the DataSetFactory to create the proper DataSet instance when you call code such as the following:

// Full syntax:
DataSet ds = DataSet.Open("msds:nc?");
// Short syntax:
DataSet ds = DataSet.Open("");

Usually each DataSet provider type is defined in a separate assembly; for example, the NetCDFDataSet class is defined in Microsoft.Research.Science.Data.NetCDF4.dll. A number of assemblies with DataSet providers are shipped as part of the SDS installation package. This lets a programmer create DataSets for the supported data sources without worrying about the assemblies.

In some cases an application that uses SDS must run on a machine where SDS is not installed. For example, you may have no administrative rights to install it, or installing it may be impractical, as when the application must run on a thousand computational nodes of a cluster. In these cases you must help the SDS infrastructure find and register the required providers.

SDS allows you to specify providers in a configuration file. At a minimum, the name of a provider and the full name of the provider's type must be specified; it is also possible to associate one or several file extensions with the provider. To do this, you should (a) prepare an application configuration file, and then (b) ship all required assemblies with the application.

The following steps enable an application to use a certain DataSet provider on a machine without an SDS installation. We describe the process using the NetCDF provider as an example; the steps are the same for any other provider.

1. Prepare an application configuration file. A configuration file for a .NET application contains settings specific to that application, and its name is the name of the application executable with a .config extension appended. For example, the configuration file for myApp.exe is myApp.exe.config. You can add the configuration file to your Visual Studio project while developing the application, create the corresponding *.config file for an existing application, or modify an existing application configuration file.
  • Using Visual Studio to add a configuration file to your application. In the context menu of the project, select Add / New Item... and choose Application Configuration File. An App.config item appears in the project. Visual Studio opens this file in an XML editor, so you can add the required configuration as described below.
  • Creating a configuration file for an existing application. Create a file named myApp.exe.config, where myApp.exe is the name of the application executable file. This file must be a well-formed XML document. A sample file can be found here.
  • Modifying an existing configuration file. If the application already has a configuration file, you can edit it as described below.
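
For orientation, a minimal sketch of an empty, well-formed application configuration file is shown below. It contains only the XML declaration and the root <configuration> element; it registers nothing yet and only serves as a starting point for the edits described in the next step.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- Minimal starting point: an empty root element. -->
<!-- The SDS-specific sections described in step 2 go inside it. -->
<configuration>
</configuration>
```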

2. Edit the configuration file. Once you have created or located the application configuration file, open it in the Visual Studio editor or in any other XML or text editor. A sample configuration file is here.
  • Add the definition of the SDS configuration section. (This is done only once, when you add the first provider to this configuration file.)
    1. If there is no <configSections> element inside the root <configuration> element, add it.
    2. Add a <section> element to <configSections> to register a new configuration section, with the name attribute equal to ""; the provider specifications will be added later in a section with this name. The type attribute specifies the type that parses the section; here it must be the full type name of the FactoryConfigurationSection class defined in the core SDS assembly, Microsoft.Research.Science.Data. Check the SDS version number: it must match the version of SDS you actually use. The example uses the version number of the current CodePlex release, 1.2.6754.0. To find out which version you use, either run "sds info" on the command line or open the Help / About dialog in the DataSet Viewer on a machine where SDS is installed.
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <!-- Declare configuration section named Microsoft.Research.Science.Data -->
    <section name=""
             type="Microsoft.Research.Science.Data.Factory.FactoryConfigurationSection, Microsoft.Research.Science.Data,
 Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
    <!-- Declarations of other config sections -->
  </configSections>
</configuration>
  • Add the section "" to the <configuration> element. (This step, too, is performed only for the first provider.) The section must contain a child <factories> element, in which the providers are registered in the next steps.
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  . . .
      <factories>
         <!-- Here we will register providers -->
      </factories>
  . . .
</configuration>
  • Register the DataSet provider. To do this, add an <add> element inside the <factories> element of the SDS section. The name attribute is the name used in the DataSet URI when a dataset is opened, e.g. nc in "msds:nc". The type attribute specifies the assembly-qualified name of the provider type, which consists of the full type name with namespace (e.g. Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet); the assembly name without extension (e.g. Microsoft.Research.Science.Data.NetCDF4); and, for signed assemblies, the version number, culture and public key token (the public key token is the same for all assemblies of an SDS release).
<add name="nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4,
 Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
  • Associate file extension(s) with the provider, if required. For example, if *.nc is associated with the NetCDF provider, NetCDF files can be opened without the "msds:nc" prefix, using just the path to the file, e.g. "c:\". To associate an extension, add to <factories> a new <add> element with the attribute ext=".xxx", where ".xxx" is the extension to associate with the provider. The second attribute, type, is the same as in the previous step.
<add ext=".nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4,
 Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
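
Putting the pieces of step 2 together, a complete configuration file for the NetCDF provider might look like the sketch below. The section name Microsoft.Research.Science.Data is an assumption taken from the comment in the declaration snippet; the exact name and its letter case must match what your SDS release expects, so verify it against the sample configuration file. The version number is the one used throughout this page and must likewise be checked. Note that in .NET configuration files <configSections> must be the first child of <configuration>.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <!-- ASSUMPTION: section name copied from the comment in the declaration snippet; verify it for your release -->
    <section name="Microsoft.Research.Science.Data"
             type="Microsoft.Research.Science.Data.Factory.FactoryConfigurationSection, Microsoft.Research.Science.Data, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
  </configSections>

  <!-- The element name must equal the section name declared above (same assumption) -->
  <Microsoft.Research.Science.Data>
    <factories>
      <!-- Provider name: allows URIs that start with "msds:nc" -->
      <add name="nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
      <!-- File extension: allows opening *.nc files by path only -->
      <add ext=".nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
    </factories>
  </Microsoft.Research.Science.Data>
</configuration>
```

Additional providers are registered by adding further <add> elements inside the same <factories> element.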

Note: the <add> elements for the official SDS providers (such as NetCDF, CSV and WCF) are given in the following table. You can also use the sample file here. Do not forget to check the version number.
Provider name <add> elements
wcf <add name="wcf" type="Microsoft.Research.Science.Data.Proxy.WCF.WcfDataSetFactory, Microsoft.Research.Science.Data.ServiceModel, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
memory <add name="memory" type="Microsoft.Research.Science.Data.Memory.MemoryDataSet, Microsoft.Research.Science.Data.Memory, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
csv <add name="csv" type="Microsoft.Research.Science.Data.CSV.CsvDataSet, Microsoft.Research.Science.Data.CSV, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
.csv <add ext=".csv" type="Microsoft.Research.Science.Data.CSV.CsvDataSet, Microsoft.Research.Science.Data.CSV, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
nc <add name="nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
.nc <add ext=".nc" type="Microsoft.Research.Science.Data.NetCDF4.NetCDFDataSet, Microsoft.Research.Science.Data.NetCDF4, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
memory2 <add name="memory2" type="Microsoft.Research.Science.Data.Memory2.ChunkedMemoryDataSet, Microsoft.Research.Science.Data.Memory2, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>
as <add name="as" type="Microsoft.Research.Science.Data.ActiveStorage.ASGridDataSet, Microsoft.Research.Science.Data.ActiveStorage, Version=1.2.6754.0, Culture=neutral, PublicKeyToken=e550de0161496f12"/>

3. Ship the assemblies. To use SDS, an application needs the core SDS assemblies and the provider assemblies. When the application runs and starts opening a DataSet, the DataSet factory reads the configuration file and tries to load the specified providers. The configuration contains the names of the assemblies with providers; the core and provider assemblies must be located either side by side with the application executable file or in the GAC (or in locations given in the same configuration file using the <codeBase> or <probing> elements). In the simplest case, do the following:
  • Open %ProgramFiles%\Reference Assemblies\Microsoft\Research\Scientific DataSet 1.2 in Windows Explorer on a machine with SDS installed.
  • Copy the core assemblies listed below into the folder where the application executable file is located.
Assembly | When to copy
Microsoft.Research.Science.Data.dll | Always
Microsoft.Research.Science.Data.Imperative.dll | If the application uses the Imperative API
Microsoft.Research.Science.Data.ServiceModel.dll, Microsoft.Research.Science.Data.Pipeline.dll, Microsoft.Research.Science.Data.Utilities.dll, Microsoft.Ccr.Core.dll | If the application spawns a viewer, opens a shared dataset, or creates a proxy

  • For each provider that your application may use, copy the provider's assembly into the folder where the application executable file is located. If the provider assembly requires additional files, copy them to the same folder as well. Keep in mind that some libraries differ depending on the target platform, i.e. whether the machine is x86 or x64. The files required for the official providers are listed in the following table. (If a relative path is given, look in %ProgramFiles%\Reference Assemblies\Microsoft\Research\Scientific DataSet 1.2\).
Provider name | Files to copy | Comments
memory | Microsoft.Research.Science.Data.Memory.dll |
csv | Microsoft.Research.Science.Data.CSV.dll |
nc | Microsoft.Research.Science.Data.NetCDF4.dll, netcdf4.dll | There are two versions of the provider assemblies: for x86 and for x64. The appropriate files must be extracted from the GAC (see the description under the table).
memory2 | Microsoft.Research.Science.Data.Memory2.dll, SDSArrays.dll | There are two versions of the provider assemblies: for x86 and for x64. The appropriate files must be extracted from the GAC (see the description under the table).
wcf | Microsoft.Research.Science.Data.ServiceModel.dll, Microsoft.Research.Science.Data.Pipeline.dll, Microsoft.Ccr.Core.dll |
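
If you prefer to keep the SDS assemblies in a subfolder rather than next to the executable, step 3 mentions that their location can be given in the configuration file. A minimal sketch using the standard .NET <probing> element follows; the subfolder name sds is hypothetical. The privatePath attribute lists subdirectories of the application folder, separated by semicolons, in which the runtime looks for managed assemblies.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <!-- The <configSections> element and the SDS section are omitted here for brevity -->
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- "sds" is a hypothetical subfolder of the application folder -->
      <probing privatePath="sds"/>
    </assemblyBinding>
  </runtime>
</configuration>
```

Probing applies to managed assemblies. Native libraries such as netcdf4.dll are located by a different mechanism, so whether they are found in a subfolder should be tested; keeping them in the same folder as the provider assembly, as described above, is the conservative choice.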

Next we describe how to extract the required libraries from the GAC (this is needed to get the files for the NetCDF and Memory2 providers; the procedure is shown here using the NetCDF provider as an example).
  • Open a command prompt (cmd.exe).
  • Change the current directory to the GAC folder, like this (correct the path if Windows is installed in another folder):
cd c:\windows\assembly
  • Execute the command dir Microsoft.Research.Science.Data.NetCDF4.dll /s to find the provider assembly:
c:\Windows\assembly>dir Microsoft.Research.Science.Data.NetCDF4.dll /s
 Volume in drive C has no label.
 Volume Serial Number is 76BE-38C4

 Directory of c:\Windows\assembly\GAC_32\Microsoft.Research.Science.Data.NetCDF4\1.2.6754.0__e550de0161496f12

28.03.2011  22:09            64 512 Microsoft.Research.Science.Data.NetCDF4.dll
               1 File(s)         64 512 bytes
  • On an x86 machine only one file is found; on an x64 machine there are two files, in the GAC_32 and GAC_64 directories. Select the one that matches your target machine. Copy all files from the directory where the file was found into a target directory:
c:\Windows\assembly>copy c:\Windows\assembly\GAC_32\Microsoft.Research.Science.Data.NetCDF4\1.2.6754.0__e550de0161496f12 c:\temp

        2 file(s) copied.

Two required files must be copied: Microsoft.Research.Science.Data.NetCDF4.dll and netcdf4.dll.

That's all. When the application configuration file is prepared, and all required files are copied into the application folder, the application is ready to run and use SDS.

Last edited Jun 24, 2011 at 4:41 PM by dvoits, version 43

