SDS: appending data to a NetCDF file using C#

Feb 5, 2013 at 2:34 PM
Hi, I have a huge file that needs to be converted to a NetCDF file using SDS.
Instead of reading all the data into an array in RAM and then writing that array to the NetCDF file, I'd like to add the data record by record using the Append() function.
I have tried this with a few records. No error is raised, but the data does not seem to be added to the NetCDF file.
Below is the sample code I am working with.

Can anyone help?
Thanks.

//create the NetCDF file (verbatim string, so the backslashes in the path are not treated as escape sequences)
var ds_data = sds.DataSet.Open(@"X:\data\dsnc.nc?openMode=create");

//add "name" variable to ds_data
ds_data.Add<string[]>("name");

//declare an array, and put one record in
string[] a = new string[1];
a[0] = "hello";

//append the record to the "name" variable
ds_data["name"].Append(a);
Developer
Feb 6, 2013 at 6:52 AM
Hi,

If you just need to convert files from one format to another, I recommend using the sds.exe utility, which is part of the SDS installation package.

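See the documentation here, section Copying DataSet. All you need to do is run cmd.exe and type the following (the destination URI is quoted because cmd.exe otherwise treats & as a command separator):
sds copy originalfile.csv?openMode=open "destination.nc?openMode=create&enableRollback=false"
The utility performs the conversion in small chunks.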


If you need to perform some operation on large data while copying, you really do need to copy the file in chunks; otherwise the process will run out of memory. The general algorithm is implemented in sds.exe; see the sources here, in the file Main/src/sdsutil/DataSetCloning.cs.


If you need an implementation only for a particular, known data structure, you can do this in a simpler way: fetch the known variables and append them to the output dataset. You should be aware of dataset consistency, though: the dataset commits only when all variables have the same length along their shared dimensions (read more here). This might be the reason why the given sample doesn't work. Try printing ds_data.ToString(): uncommitted variables are marked with '*', and you can check their lengths.
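
As a rough illustration of that consistency rule, here is a toy model in plain C#. It is not SDS code and does not call the SDS API; the class name, the method name and the dictionary layout are made up for this sketch. Each variable is described by its current length along each of its dimensions, and the method reports the dimensions on which the variables disagree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only (hypothetical names, not part of SDS).
static class ConsistencyModel
{
    // variables: variable name -> (dimension name -> current length along that dimension).
    // Returns the names of dimensions whose length differs between the variables sharing them.
    public static List<string> InconsistentDimensions(
        Dictionary<string, Dictionary<string, int>> variables)
    {
        return variables
            .SelectMany(v => v.Value)                 // flatten to (dimension, length) pairs
            .GroupBy(d => d.Key, d => d.Value)        // collect the lengths seen per dimension
            .Where(g => g.Distinct().Count() > 1)     // more than one distinct length: inconsistent
            .Select(g => g.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

In this model, appending 2000 rows to one variable but not to another variable on the same dimension leaves that dimension in the returned list until the second variable catches up, which corresponds to the state in which the changes stay uncommitted.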


Regards,
Dmitry.
Feb 6, 2013 at 2:15 PM
Edited Feb 6, 2013 at 3:07 PM
Thanks so much for your reply, Dmitry.

I tried the sds copy command. It seems the command generates one variable for each column of the data in my csv file.
However, I need something more than that. Below is a sample of my genetic data.

RefSeq  Pos  A3   WI  92   P2  90
Gm01    8    G    G   .    G   G
Gm01    20   A    A   A    A   A
Gm01    70   G    G   G    G   G
Gm02    134  T    T   T    T   T
Gm02    195  C/T  T   T/C  T   T


For this data, I need 4 variables in my NetCDF file: three 1-dimensional variables (the "RefSeq" column, the "Pos" column, and the column names from "A3" to "90") and one 2-dimensional variable for the data from [1,3]="G" to [5,7]="T". As the actual data has over 2 million rows and over 200 columns, I can't read it all into memory and write the variables in one go. So I need to read the data chunk by chunk, say 2000 rows at a time, and append each chunk to the corresponding variables in the NetCDF file. I was planning to use the Append() function in SDS, but it hasn't worked for me so far. If you can point out the correct syntax for this call, that would be great.
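
For concreteness, here is a rough sketch of the chunked reading described above. It is plain C# with no SDS calls; the class and method names are made up for illustration, and it assumes whitespace-delimited input with a header line followed by rows that all have the same number of columns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of SDS: splits the text file into fixed-size row chunks.
static class ChunkedReader
{
    // One chunk of rows, already split into the pieces destined for separate variables.
    public sealed class Chunk
    {
        public string[] RefSeq;   // 1-D, one entry per row
        public int[] Pos;         // 1-D, one entry per row
        public string[,] Calls;   // 2-D, rows x samples
    }

    private static readonly char[] Separators = { ' ', '\t' };

    // Reads the header line and returns the sample names (every column after RefSeq and Pos).
    public static string[] ReadSampleNames(TextReader reader)
    {
        string header = reader.ReadLine();
        if (header == null) throw new InvalidDataException("Missing header line.");
        string[] cols = header.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (cols.Length < 3) throw new InvalidDataException("Header has no sample columns.");
        string[] names = new string[cols.Length - 2];
        Array.Copy(cols, 2, names, 0, names.Length);
        return names;
    }

    // Lazily yields chunks of at most chunkRows rows, so only one chunk is held in memory at a time.
    public static IEnumerable<Chunk> ReadChunks(TextReader reader, int sampleCount, int chunkRows)
    {
        var rows = new List<string[]>(chunkRows);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;   // skip blank lines
            rows.Add(line.Split(Separators, StringSplitOptions.RemoveEmptyEntries));
            if (rows.Count == chunkRows)
            {
                yield return ToChunk(rows, sampleCount);
                rows.Clear();
            }
        }
        if (rows.Count > 0) yield return ToChunk(rows, sampleCount);   // final partial chunk
    }

    private static Chunk ToChunk(List<string[]> rows, int sampleCount)
    {
        var chunk = new Chunk
        {
            RefSeq = new string[rows.Count],
            Pos = new int[rows.Count],
            Calls = new string[rows.Count, sampleCount]
        };
        for (int r = 0; r < rows.Count; r++)
        {
            string[] fields = rows[r];
            if (fields.Length != sampleCount + 2)
                throw new InvalidDataException("Unexpected number of columns in a data row.");
            chunk.RefSeq[r] = fields[0];
            chunk.Pos[r] = int.Parse(fields[1], CultureInfo.InvariantCulture);
            for (int c = 0; c < sampleCount; c++)
                chunk.Calls[r, c] = fields[c + 2];
        }
        return chunk;
    }
}
```

The idea would be to call ReadSampleNames once, write the sample names as their own variable, and then, for each chunk from ReadChunks(reader, names.Length, 2000), append RefSeq, Pos and Calls to the three remaining variables along the row dimension. The SDS calls for that step are deliberately left out here, since their exact form is what I am asking about.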

I have also tried PutData(); both calls give me an "Array has wrong rank" error. Can you explain why?

Thanks again.