Upload files

Transfer client

The ITransferClient is the entry point for operations related to files or existing datasets. You can construct it manually, but typically you will use a DI container to inject an instance of the class. In the following examples we will assume that the variable transferClient holds a reference to an instance of ITransferClient.
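
For illustration, a minimal sketch of consuming ITransferClient through constructor injection; the UploadService class is purely hypothetical:

// hypothetical consumer class - the DI container resolves ITransferClient
public class UploadService
{
    private readonly ITransferClient transferClient;

    public UploadService(ITransferClient transferClient)
    {
        this.transferClient = transferClient;
    }
}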

Transfer process

Each operation that imports or downloads data is represented by a transfer. A transfer is a process that runs in the cloud and may potentially run for a long time. Its lifecycle is therefore initiated by a single request to the cloud platform, and its status then has to be tracked by polling or other mechanisms. At the end of its run, the transfer provides its result - either a new dataset in the case of an import, or a url where the requested file can be downloaded.

How to upload plain files

The most basic operation is uploading a simple file, so that it resides in the cloud platform as-is (as a physical file).

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\EqD2.dfs2")
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

In this example the file under C:\Data\EqD2.dfs2 will be uploaded to the cloud, and then an import process will be triggered to place that uploaded file under a project (or a folder) represented by projectId. The ExecuteAndWaitAsync method will not only start the whole process, it will also wait until it finishes. As a consequence, the result object already contains the dataset id. At this point the file is imported in the cloud.

This approach works well for smaller files, but if the file is large, it may not be desirable to wait for the transfer to finish. In that case it may be easier to simply execute the transfer without waiting for it to finish. The transfer should still be monitored to see whether it succeeds or ends with an error.

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\EqD2.dfs2")
    .ExecuteAsync();

var transferId = transferResult.TransferId;

// poll until the transfer completes or fails
var transferOutput = await transferClient.GetTransferAsync(transferId);
while (transferOutput.Status != TransferOutputStatus.Completed
    && transferOutput.Status != TransferOutputStatus.Error)
{
    await Task.Delay(TimeSpan.FromSeconds(5));
    transferOutput = await transferClient.GetTransferAsync(transferId);
}

Staging big files before import

When dealing with big files, or with a large number of files, it might be beneficial to first physically upload the files to a remote location (a so-called staging area), and then invoke the transfer process using the file in that location. Separating the physical upload of the file from the invocation of the transfer process may help create a more robust solution.

The files can be staged in any location that is accessible from the internet; any storage that provides urls for downloading files will do. You can stage the files in such a location using your preferred methods (ftp upload, SDKs etc.). Additionally, the cloud platform provides its own staging storage that you can use. The files uploaded there are considered temporary and will be removed after a certain period of time, so when using this staging area, do invoke the transfer from the uploaded files within a reasonable time frame.

var url = await transferClient.StageFileAsync(@"c:\path\to\file.dfs2");

Console.WriteLine(url); // this url can be used as an input for the transfers
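
For instance, the staged url can be passed straight to a url-based import (described in the next section). A minimal sketch; the dataset and file names are illustrative:

var transferResult = await transferClient
    .CreateUrlImport(projectId, url, "my staged dataset", "file.dfs2")
    .ExecuteAndWaitAsync();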

You can modify the upload settings of the StageFileAsync method by providing an optional UploadDownloadOptions parameter, where you can specify the block size and parallelism used for the upload based on what your hosting environment allows.
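
A minimal sketch, assuming the options are passed as the second argument (the UploadDownloadOptions properties are described in the Modify upload settings section below):

var stagingOptions = new UploadDownloadOptions
{
    MaxParallelism = 4,           // use 4 threads for the upload
    BlockSize = 64 * 1024 * 1024  // 64MB blocks
};

// assumes StageFileAsync accepts the options as an optional second parameter
var url = await transferClient.StageFileAsync(@"c:\path\to\file.dfs2", stagingOptions);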

How to upload plain files from a remote source

In case your file is represented by a remote url, similar principles apply as for local files, but you need to initialize the transfer with a different source - a url rather than a local file.

var transferResult = await transferClient
    .CreateUrlImport(projectId, "http://dhi.org/sample.dfs2", "my sample dataset", "sample.dfs2")
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

With the CreateUrlImport method we need to specify the destination projectId as before and the url of the data, but also the name of the new dataset and the name representing the original "file". This is because these names may not be derivable from the url, and the transfer process needs them to function correctly.

Modify upload settings

The physical upload of a file or a stream to the cloud has settings that may influence the duration of the upload process. When there are enough resources available (internet connectivity, CPU time), it may be beneficial to leverage parallelization of the upload. When the internet connectivity is poor, it might be better to use lower values. The upload settings are an optional parameter of each method that initiates a file/stream based transfer. By default the settings are set to the moderate values of 2 worker threads and a 20MB block size. If your connectivity is strong, feel free to increase both values.

var uploadOptions = new UploadDownloadOptions
{
    MaxParallelism = 8,         // use 8 threads to upload
    BlockSize = 128*1024*1024   // block size per request
};

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\EqD2.dfs2", uploadOptions)
    .ExecuteAsync();

...

How to upload files with processing

The input file does not have to be stored in its original form - there are processing options applicable during the transfer process. They are generally unlocked by selecting an appropriate Reader and Writer (as mentioned here).

To select a reader or writer, the transfer process object created by CreateFileImport offers a fluent API with methods such as WithReader and WithWriter. There are also Transformations (described here), which can be applied using the WithTransformation method.

Upload a file into Multidimensional storage

Many files can be imported not as plain files, but as queryable data sources. Think of it as the difference between copying a .csv file to a different place in the file system (a plain file import) and importing the same file into MS Excel, which understands its structure and allows you to work with it.

One of the storages available in the platform is the Multidimensional storage (described here). In order to upload a file into Multidimensional storage, a reader has to be selected according to the file type, and an MDWriter has to be specified for the transfer, like this:

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\EqD2.dfs2")
    .WithReader(new Dfs2Reader())
    .WithWriter(new MDWriter())
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

In this case the file is a Dfs2, and because it is a well-known DHI file type, a strongly typed Dfs2Reader class is provided for convenience of use.

Not every available reader from the platform is represented in the SDK in the form of a typed class.

Construct a reader not available in the SDK

We can start by browsing the platform documentation with the available readers, or by calling a method in the SDK that returns the available readers:

var readers = await transferClient.GetReadersAsync();
foreach (var reader in readers)
{
    Console.WriteLine(reader.Name);
}
// prints:
// Dfs0Reader
// Dfs1Reader
// Dfs2Reader
// ...
// VtuReader
// ...

If we want to import a VTU file and we need to process the contents of the file (for example to project the data into a different coordinate system), we need to use the VtuReader. However, by inspecting the DHI.Platform.SDK.Clients.Transfers.Readers namespace, we find there is no VtuReader class there. The reader in the platform can still be used, but we need to construct the reader manually.

The GenericReader class in the SDK serves as a baseline reader that can be instantiated with a string name. This name should correspond to the name of the reader that you see as available in the platform API.

A similar class, GenericWriter, is available for writers.

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\bathymetry.vtu")
    .WithReader(new GenericReader("VtuReader")) 
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

Apply transformation

A transformation can be defined on the transfer process object using the WithTransformation method. There are also extensions in the namespace DHI.Platform.SDK.Clients.Transfers.Transformations that aim to simplify the usage of common transformations.

One of the frequent use cases is a change of coordinate system. The transformation that does this is called CrsTransformation. This code would read the input as a dfs2 file, change the coordinate system to Web Mercator (epsg:3857), and write the output as a VTU file.

Note that the dataset name is also changed to bathymetry.vtu, because otherwise it would assume the original file name (bathymetry.dfs2) as the dataset name, which might be confusing, since the data will be converted to VTU.

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\bathymetry.dfs2")
    .WithDatasetName("bathymetry.vtu")
    .WithReader(new Dfs2Reader())
    .WithWriter(new GenericWriter("VtuWriter")) 
    .WithTransformation(new CrsTransformation { OutputSrid = 3857 })
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

Note that the same can be achieved with a WithCoordinateSystemTransformation extension like this:

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\bathymetry.dfs2")
    .WithDatasetName("bathymetry.vtu")
    .WithReader(new Dfs2Reader())
    .WithWriter(new GenericWriter("VtuWriter")) 
    .WithCoordinateSystemTransformation(3857)
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

Sometimes the uploaded file does not include a spatial reference definition and it is necessary to explicitly specify which spatial reference is used. Suppose we want to upload a GeoJson file to the GIS service. Coordinates in the GeoJson file are in spatial reference 4329. During the import, we also want to transform the coordinates to spatial reference 3035. The following lines will do the trick. The example also shows how reader parameters can be specified.

var readerParameters = new ParameterInput[] { new ParameterInput{ Name = "SRID", Value = 4329 }};

var transferResult = await transferClient
    .CreateFileImport(projectId, "C:\\Data\\foo.json")
    .WithDatasetName("My Features")
    .WithReader(new GenericReader("GeoJsonReader", readerParameters))
    .WithWriter(new GenericWriter("GISWriter")) 
    .WithTransformation(new CrsTransformation { OutputSrid = 3035 })
    .ExecuteAndWaitAsync();

var newDatasetId = transferResult.DatasetId;

Update a dataset

Some datasets (such as Multidimensional datasets) allow updating the data with new information (timesteps). For these situations there are methods for initiating an update.

// first transfer to set up the dataset
var transferResult = await transferClient
    .CreateFileImport(projectId, "file-day-1.dfs2")
    .WithReader(new Dfs2Reader())
    .WithWriter(new GenericWriter("MDWriter"))
    .ExecuteAndWaitAsync();

// second transfer to add more timesteps
await transferClient
    .CreateDatasetUpdateFromFile(projectId, transferResult.DatasetId, "file-day-2.dfs2")
    .WithReader(new Dfs2Reader())
    .ExecuteAndWaitAsync();

There are similar methods for initiating an update from remote urls or in-process streams. Note that you do not have to specify a writer, because that information is inferred from the type of the dataset that is being updated.
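
For example, an update from a remote url would follow the same pattern. A minimal sketch, assuming the url-based counterpart is named CreateDatasetUpdateFromUrl (check the ITransferClient interface for the exact method name and signature):

// hypothetical url-based update mirroring CreateDatasetUpdateFromFile
await transferClient
    .CreateDatasetUpdateFromUrl(projectId, transferResult.DatasetId, "http://dhi.org/file-day-3.dfs2")
    .WithReader(new Dfs2Reader())
    .ExecuteAndWaitAsync();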