Developer guidelines for transfer and conversion
Overview
The role of the transfer pipeline is to provide the following operations:
- Import - imports a file uploaded to staging storage (blob storage/FTP) into Raw storage or a dedicated storage
- Export - exports a dataset from Raw storage or a dedicated storage into staging storage as a file (currently only blob storage)
- Copy - copies an existing dataset from Raw storage or a dedicated storage into Raw storage or a dedicated storage
- Append - appends new data from a file uploaded to staging storage (blob storage/FTP) to an existing dataset in a dedicated storage (replacing existing data is not allowed)
- Update - updates an existing dataset in a dedicated storage with new data from a file uploaded to staging storage (blob storage/FTP)
Each operation relies on the concept of a reader and a writer:
- Reader - reads a dataset of a given format and returns an object model.
- Writer - writes an object model to a dataset of a given format.

The transfer pipeline operates on two main kinds of data formats:
- File format - a format that has no specific data persistence mechanism of its own; it is handled by the transfer pipeline by default, which eventually hands the data over to the Raw service or staging storage.
- Dedicated format - any format that has its own data persistence mechanism and stores its data in its own dedicated storage (e.g. Postgres for GIS Vector storage, blob storage for Multidimensional storage).
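As a rough mental model, the reader and writer contracts can be pictured as follows. This is only a sketch inferred from the overview table and the GIS-SQL example further below; the exact generic constraints and member signatures in the codebase may differ:

```csharp
// Illustrative sketch only: member signatures are inferred from the overview table and
// the GIS-SQL example below and may differ from the real interfaces.
public interface IFileFormatReader<TParams, TModel> where TParams : IReaderParameters
{
    // Reads the staged/raw file described by the parameters into an object model.
    Task<TModel> Read(TParams parameters);
}

public interface IFileFormatWriter<TParams, TModel> where TParams : IWriterParameters
{
    // Writes the object model into a temporary file that the pipeline persists afterwards.
    Task<DatasetFile> Write(TParams parameters, TModel model);
}
```

The dedicated variants (`IDedicatedStorageFormatReader`, `IDedicatedStorageFormatWriter`) follow the same pattern, but they read from/write to their own storage and the writers return an implementation of `IConversionResult` instead of a `DatasetFile`.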
Workflow

Each operation runs the following workflow, which routes the user request to the implementations of format readers and format writers:
- A particular endpoint is invoked on the Metadata service.
- The Metadata service contract is mapped by AutoMapper to the corresponding Input/Output parameters of the Transfer batch, where the implementation of `InputParameters` defines the relation to a format reader (plus a storage in case of file formats) and the implementation of `OutputParameters` defines the relation to a format writer (plus a storage in case of file formats).
- Based on the value of `ReaderName`/`WriterName` from the Metadata service contract, the corresponding `FormatDefinition` is created on the Input/Output parameters. Values of `FormatDefinition` are given by the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. `FormatDefinition` specifies `PackageName` (the package from which the corresponding readers and writers are dynamically loaded) and `FormatName` (relates to the `TransferFormat` attribute of the loaded readers and writers).
- A new Transfer is run with the specified parameters.
- Based on the value of `FormatDefinition` from the `InputParameters` and the type of the `InputParameters` itself, Transfer dynamically loads the corresponding Reader implementation with a matching `TransferFormat` attribute (see the sketch after this list).
- Weak parameters from the Input parameters are mapped to the implementation of `IReaderParameters`.
- The Reader is run with the reader parameters and returns a representation of the dataset model.
- Based on the value of `FormatDefinition` from the `OutputParameters` and the type of the `OutputParameters` itself, Transfer dynamically loads the corresponding Writer implementation with a matching `TransferFormat` attribute.
- Weak parameters from the Output parameters are mapped to the implementation of `IWriterParameters`.
- The model returned from the Reader is passed, together with the writer parameters, to the Writer, which returns either: a) an implementation of `IConversionResult` in case of dedicated formats; or b) a `DatasetFile` representing the stored temporary file created by the writer in case of file formats. For file formats the conversion result is created internally by the transfer pipeline.
- The conversion result is sent back to the Metadata service, which, based on the operation, performs the final processing (creates a new dataset, updates a dataset, etc.).
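To make the dynamic-loading steps more concrete, resolving a reader type from a loaded package by its `TransferFormat` attribute could look roughly like the sketch below. This only illustrates the mechanism and is not the actual pipeline code; in particular, the attribute property name (`Value`) is an assumption:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Illustrative sketch only (not the actual Transfer pipeline code): resolve a reader
// from a dynamically loaded transfer package by matching its TransferFormat attribute
// against FormatDefinition.FormatName. The "Value" property name is an assumption.
public static class ReaderResolver
{
    public static Type ResolveReader(Assembly packageAssembly, string formatName)
    {
        return packageAssembly.GetTypes().Single(type =>
            !type.IsAbstract &&
            type.Name.EndsWith("Reader") &&                  // candidate reader implementations
            type.GetCustomAttributes().Any(attribute =>
                attribute.GetType().Name == "TransferFormatAttribute" &&
                Equals(attribute.GetType().GetProperty("Value")?.GetValue(attribute), formatName)));
    }
}
```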
The following table summarizes the contracts/interfaces used across the pipeline. (When an entry contains a `/`, the first value applies to file formats and the second to dedicated formats. X means not supported.)

| Operation | Endpoint | Metadata contracts | Transfer contracts - InputParameters | Transfer contracts - OutputParameters | Reader interface | Writer interface | IReaderParameters | IWriterParameters | Writer output |
|---|---|---|---|---|---|---|---|---|---|
| Import | `api/conversion/upload-convert` | `ConvertUploadInput` | `UploadStorageParameters` | `RawStorageOutputParameters` / `DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` / `IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters` | `IWriterParameters` / `IImportWriterParameters` | `DatasetFile` / `ImportOutputs` |
| Export | `api/metadata/dataset/{id}/download-convert` | `ConvertDownloadInput` | `RawStorageInputParameters` / `DedicatedStorageInputParameters` | `DownloadStorageParameters` | `IFileFormatReader<TParams, TModel>` / `IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` | `IReaderParameters` / `IExportReaderParameters` | `IWriterParameters` | `DatasetFile` / `ExportOutputs` |
| Copy | `api/metadata/dataset/{id}/convert` | `ConvertExistingInput` | `RawStorageInputParameters` / `DedicatedStorageInputParameters` | `RawStorageOutputParameters` / `DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>` / `IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` / `IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters` / `IExportReaderParameters` | `IWriterParameters` / `IImportWriterParameters` | `DatasetFile` / `ImportOutputs` |
| Update | `api/metadata/dataset/{id}/update` | `ConvertUpdateInput` | X / `UploadStorageParameters` | X / `DedicatedStorageUpdateOutputParameters` | X / `IFileFormatReader<TParams, TModel>` | X / `IDedicatedStorageFormatWriter<TParams, TModel>` | X / `IReaderParameters` | X / `IUpdateWriterParameters` | X / `UpdateConversionResult` |
| Append | `api/metadata/dataset/{id}/append` | `ConvertAppendInput` | X / `UploadStorageParameters` | X / `DedicatedStorageAppendOutputParameters` | X / `IFileFormatReader<TParams, TModel>` | X / `IDedicatedStorageFormatWriter<TParams, TModel>` | X / `IReaderParameters` | X / `IAppendWriterParameters` | X / `AppendConversionResult` |
Cross-cutting concerns

The transfer pipeline manages cross-cutting concerns by decorating readers/writers with common decorators. Currently the following decorators exist:
- `MergeFileFormatReader` - when reading files from a folder into a model implementing the `IMergeableModel` interface, it ensures that the multiple files are merged into one model.
- `MetadataFileFormatWriter` - extracts metadata from a model and adds it to the conversion result (file formats only).
- `MetadataDedicatedFormatWriter` - extracts metadata from a model and adds it to the conversion result (dedicated formats only).
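For illustration, a metadata-extracting writer decorator could be pictured roughly as follows. It builds on the assumed interface sketch from the overview above; the `ExtractMetadata` helper and the `Metadata` property on `DatasetFile` are hypothetical and only show where the cross-cutting concern fits:

```csharp
// Illustrative sketch only: builds on the assumed IFileFormatWriter signature above.
// ExtractMetadata and DatasetFile.Metadata are hypothetical, not real pipeline members.
public class MetadataExtractingWriter<TParams, TModel> : IFileFormatWriter<TParams, TModel>
    where TParams : IWriterParameters
{
    private readonly IFileFormatWriter<TParams, TModel> _inner;

    public MetadataExtractingWriter(IFileFormatWriter<TParams, TModel> inner) => _inner = inner;

    public async Task<DatasetFile> Write(TParams parameters, TModel model)
    {
        // Delegate the actual writing to the decorated writer...
        var file = await _inner.Write(parameters, model);

        // ...and handle the cross-cutting concern around it.
        file.Metadata = ExtractMetadata(model);              // hypothetical property
        return file;
    }

    private static Dictionary<string, object> ExtractMetadata(TModel model)
        => new Dictionary<string, object>();                 // placeholder
}
```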
Besides these decorators, the pipeline has another extension mechanism: implementing the `IConversionProcessor<TParameters, TSubject>` interface.
The following overview specifies the points in the pipeline workflow where the processors are plugged in (an illustrative sketch follows the two lists below):
File formats:
- `IConversionProcessor<IReaderParameters, DatasetFile>`: before reading a file into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a file into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a file
- `IConversionProcessor<IWriterParameters, DatasetFile>`: after writing a model into a file
Dedicated formats:
- `IConversionProcessor<IReaderParameters, DedicatedReaderEmptySubject>`: before reading a format into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a format into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a format
- `IConversionProcessor<IWriterParameters, IConversionResult>`: after writing a model into a format
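A processor plugged in at the "after reading" stage might look roughly like the sketch below. The `Process` signature is an assumption made for the example; consult the real `IConversionProcessor<TParameters, TSubject>` definition for the actual contract:

```csharp
// Illustrative sketch only: the Process signature is an assumption; the real
// IConversionProcessor<TParameters, TSubject> interface may differ.
public class ModelTypeLoggingProcessor<TModel> : IConversionProcessor<IReaderParameters, TModel>
{
    public Task<TModel> Process(IReaderParameters parameters, TModel subject)
    {
        // Runs after the reader has produced the model, so it can inspect or replace
        // the model without changing any reader or writer implementation.
        Console.WriteLine($"Reader produced a model of type {typeof(TModel).Name}.");
        return Task.FromResult(subject);
    }
}
```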
Currently the following processors are available:
- `LoggerProcessor`: logs the state of the pipeline process.
- `ModelConversionProcessor`: processes the model based on model transformations such as filtering, value transformation, etc.
- `ValidatorProcessor`: validates the reader/writer parameters.
- `UnzippingProcessor`: unzips the `DatasetFile` input representing a zip file and passes a new `DatasetFile` to the reader as the representation of the unzipped folder.
- `ZippingProcessor`: zips the `DatasetFile` output from the writer representing a folder and returns a new `DatasetFile` representing a zip file.
When a file format is read, the returned model is always wrapped in a `FileModel`, which provides access both to the original file and to the model that was read. The reason for this is the special `File` format/package: its writer writes the file in its original form, while at the same time taking advantage of extracting metadata from the dataset.
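The shape implied above can be sketched as follows; the member names are illustrative assumptions, not the actual `FileModel` API:

```csharp
// Illustrative sketch only: the member names are assumptions based on the description
// above, not the actual FileModel API.
public class FileModel<TModel>
{
    // The original file as it was uploaded, untouched by the reader.
    public DatasetFile File { get; set; }

    // The object model produced by the reader from that file.
    public TModel Model { get; set; }
}
```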
Quick start

If you want to create a new reader or writer, you should follow these guidelines:
- Create a new .csproj/.sln for the transfer package under the `.\Transfer\` folder, if it doesn't exist yet.
- Based on whether you want to read or write a format, and based on the required operation, implement the appropriate reader/writer interface and reader/writer parameters chosen from the overview table above.
- Mark these implementations with the `TransferFormat` attribute. Its value must be unique across the same operation: e.g. we can't have multiple readers inside the package marked with the same `TransferFormat` attribute, but we can have one reader, one writer, one update writer and one append writer marked with the same `TransferFormat` attribute. This is useful for operation inference, because once a format has been written the user no longer has to specify the format again during update, append or export; the proper reader or writer is chosen automatically based on the correlated value of the `TransferFormat` attribute.
- Implement the `IModule` interface, which configures the DI container.
- Register your reader/writer in the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. Neither the Metadata service nor the Transfer batch has a reference to the transfer packages; packages are loaded dynamically, therefore the `conversion.**.json` files gather the configuration that matches the `ReaderName`/`WriterName` from the user request to the corresponding transfer package and reader/writer.
- Create a new `.yaml` CI pipeline stored under `.\Scripts\Yaml\Transfer\`.
Example:

Let's say we come up with a new dedicated format reader for a format that stores the `FeatureClass` model representation in MS-SQL storage, alongside the already existing POSTGIS storage. We have to create the following implementations inside a new `.\Transfer\GIS-SQL` package:
[TransferFormat("GIS-SQL")]
public class GisSqlReader : IDedicatedStorageFormatReader<GisSqlReaderParameters, FeatureClass>
{
public Task<FeatureClass> Read(GisSqlReaderParameters parameters)
{
throw new NotImplementedException();
}
}
[TransferFormat("GIS-SQL")]
public class GisSqlReaderParameters : IExportReaderParameters
{
public Guid ExportedDatasetId { get; set; }
}
public class GisSqlModule : IModule
{
public void ConfigureServices(IServiceCollection services)
{
services.AddSingleton<GisSqlReader>();
}
}
`conversion.dedicated.formats.json`:

```json
{
  "transferFormat": "GIS-SQL",
  "transferPackage": "GIS-SQL"
}
```
`conversion.dedicated.readers.json`:

```json
{
  "name": "GISSQLReader",
  "description": "Reads data from MS-SQL GIS storage",
  "writers": [ "GISWriter", "ShpWriter", "GeoJsonWriter" ],
  "format": "GIS-SQL"
}
```