Scientific DataSet (SDS) is a managed library for reading, writing and sharing array-oriented scientific data: time series and tables, vectors and matrices, satellite and medical imagery, and multidimensional numerical grids. SDS bundles several related arrays and their metadata into a single self-descriptive package and enforces constraints on the arrays' shapes to keep the data consistent. The SDS data model builds on long-term community experience: it has much in common with Unidata's Common Data Model (CDM), which was chosen as a basis because it has stood the test of time, is widely used by scientists working with data, and is quite simple.
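
The shape constraint described above can be illustrated with a small sketch. This is a toy model written in Python, not the SDS API (SDS itself is a managed library): the names Variable, ToyDataSet and shape_of are hypothetical, and the rule assumed here is the CDM-style one, namely that every variable sharing a named dimension must have the same length along it.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named array plus the names of the dimensions it spans."""
    name: str
    dims: tuple                      # e.g. ("time",) or ("time", "station")
    data: list                       # nested lists, one nesting level per dimension
    metadata: dict = field(default_factory=dict)


def shape_of(data, rank):
    """Return the shape of a rectangular nested list with `rank` levels."""
    if rank == 0:
        return ()
    if not isinstance(data, list):
        raise ValueError("array has fewer dimensions than declared")
    inner = {shape_of(item, rank - 1) for item in data}
    if len(inner) > 1:
        raise ValueError("array is ragged")
    return (len(data),) + (inner.pop() if inner else (0,) * (rank - 1))


class ToyDataSet:
    """Bundles variables and rejects any whose shape conflicts with a shared dimension."""

    def __init__(self):
        self.variables = {}
        self.dim_lengths = {}

    def add(self, var):
        shape = shape_of(var.data, len(var.dims))
        lengths = dict(self.dim_lengths)          # work on a copy until the check passes
        for dim, length in zip(var.dims, shape):
            if lengths.setdefault(dim, length) != length:
                raise ValueError(
                    f"dimension {dim!r} has length {lengths[dim]}, "
                    f"but {var.name!r} uses {length}"
                )
        self.dim_lengths = lengths
        self.variables[var.name] = var


ds = ToyDataSet()
ds.add(Variable("time", ("time",), [0.0, 1.0, 2.0], {"units": "s"}))
ds.add(Variable("temperature", ("time",), [280.1, 280.4, 279.9], {"units": "K"}))
ds.add(Variable("readings", ("time", "station"), [[1, 2], [3, 4], [5, 6]]))
# Adding a variable with only two values along "time" would now raise ValueError.
```

Keeping the dimension lengths in one place is what makes the package self-consistent: a reader can rely on every array along "time" lining up element by element.
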

The idea of the Scientific DataSet is to provide a single data model with implementations for multiple specific data formats. Applications store and retrieve data uniformly through an abstract view of the various underlying data stores. This makes an application less dependent on data formats and significantly eases data transfer between software components.

Scientific DataSet features:
  • Rich metadata to create self-descriptive data packages.
  • Support for multiple data formats that are popular in scientific computing, such as NetCDF.
  • The ability to scale out from simple text files to multi-terabyte Microsoft Azure archives.
  • Concurrent access to the data from multiple computing agents in multicore and distributed settings.
  • The ability to perform consistency checks and transactional updates.
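
As a rough illustration of what "consistency checks and transactional updates" can mean in practice, here is a minimal Python sketch. It is not the SDS implementation or API: ToyStore and its update method are hypothetical names, and the consistency rule (all columns must have equal length) is an assumption chosen for the example. Changes are staged on a copy, checked, and committed only if the check passes.

```python
import copy


class ToyStore:
    """Holds named columns and applies each update either completely or not at all."""

    def __init__(self):
        self.columns = {}

    def update(self, changes):
        # Stage the changes on a copy so a failed check leaves the store untouched.
        staged = copy.deepcopy(self.columns)
        for name, values in changes.items():
            staged.setdefault(name, []).extend(values)

        # Consistency check (assumed rule): every column must have the same length.
        if len({len(values) for values in staged.values()}) > 1:
            raise ValueError("columns would differ in length; nothing was committed")

        # Commit: replace the visible state in a single assignment.
        self.columns = staged


store = ToyStore()
store.update({"t": [0, 1], "x": [1.0, 2.0]})
# store.update({"t": [2]}) would raise ValueError and leave both columns unchanged.
```
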
Using SDS in your computational program gives you the following advantages:
  • Your program is more interoperable. It can import/export data in different formats.
  • Your program is more scalable. It can seamlessly switch from human-readable text files, which are useful in small-scale experiments and debugging, to high-performance binary data formats in production mode.
  • Your program can immediately become part of a sophisticated concurrent data flow system.
  • It is easy to visualize the results of your program using DataSet Viewer.
An extensible set of dynamically loadable providers allows you to choose from different storage formats and different data access mechanisms. For example, depending on the DataSet URI parameter supplied, different runs of the same program can read or write data differently, using text files in CSV format, binary NetCDF files, or another format or communication mechanism.
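
The provider idea can be sketched as a small registry keyed by the URI scheme. The Python below is an illustration only and does not reproduce the SDS provider API or its actual URI syntax: the "csv:" and "json:" schemes, the PROVIDERS registry and the render function are all hypothetical, and JSON stands in for a second format purely because it is available in the standard library.

```python
import csv
import io
import json
from urllib.parse import urlparse

PROVIDERS = {}  # hypothetical registry: URI scheme -> writer function


def provider(scheme):
    """Decorator that registers a writer function for a URI scheme."""
    def register(fn):
        PROVIDERS[scheme] = fn
        return fn
    return register


@provider("csv")
def write_csv(columns, out):
    names = list(columns)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)                                  # header row
    writer.writerows(zip(*(columns[n] for n in names)))     # one row per index


@provider("json")
def write_json(columns, out):
    json.dump(columns, out)


def render(uri, columns):
    """Serialize `columns` with whichever provider the URI scheme selects."""
    scheme = urlparse(uri).scheme
    if scheme not in PROVIDERS:
        raise ValueError(f"no provider registered for scheme {scheme!r}")
    out = io.StringIO()
    PROVIDERS[scheme](columns, out)
    return out.getvalue()


data = {"t": [0, 1], "x": [2.5, 3.5]}
as_csv = render("csv:results.csv", data)     # same call, CSV text
as_json = render("json:results.json", data)  # same call, JSON text
```

The calling code never names a format; changing only the URI string changes how the data is written, which is the property the paragraph above describes.
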

The Scientific DataSet package includes the following components:
If you need to run an application that uses SDS on a machine where SDS is not installed, see this topic.

Last edited Jun 24, 2011 at 2:13 PM by dvoits, version 18

Comments

ror Jun 4, 2013 at 9:46 PM 
Hi,

Let me help others by pointing out some misleading information presented above:

1. The package installed from here (CodePlex) is outdated and doesn't provide anything but a command-line tool. The documentation in the same package is misleading too. You'd better download it from here: http://research.microsoft.com/en-us/projects/sds/default.aspx

2. Also note that the download link points to a restricted FTP server with a download speed of 3 kbps! Yep, that's slow even for '90s dial-up. So if you don't want to wait a couple of hours for a 14 MB file, copy the link, put it into something like Internet Download Manager, and finish this unnecessarily obfuscated task.