Developer guidelines for transfer and conversion
Overview
The role of the transfer pipeline is to provide the following operations:
- Import - imports a file uploaded to staging storage (blob storage/FTP) into Raw storage or a dedicated storage
- Export - exports a dataset from Raw storage or a dedicated storage into staging storage as a file (currently only blob storage)
- Copy - copies an existing dataset from Raw storage or a dedicated storage into Raw storage or a dedicated storage
- Append - appends new data from a file uploaded to staging storage (blob storage/FTP) to an existing dataset in a dedicated storage (replacing existing data is not allowed)
- Update - updates an existing dataset in a dedicated storage with new data from a file uploaded to staging storage (blob storage/FTP)
Each operation relies on the concept of a reader and a writer:
- Reader - reads a dataset of a given format and returns an object model.
- Writer - writes an object model to a dataset of a given format.

The transfer pipeline operates on two main kinds of data formats:
- File format - a format that has no specific data persistence mechanism of its own; it is handled by the transfer pipeline by default, which eventually hands the data over to the Raw service or staging storage.
- Dedicated format - any format that has its own data persistence mechanism and stores its data in its own dedicated storage (e.g. Postgres for GIS Vector storage, blob storage for Multidimensional storage).
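As a rough mental model, the reader and writer contracts can be pictured as follows. This is only a sketch inferred from the overview table and the GIS-SQL example further below; the exact generic constraints and member signatures in the codebase may differ:

```csharp
// Illustrative sketch only: member signatures are inferred from the overview table and
// the GIS-SQL example below and may differ from the real interfaces.
public interface IFileFormatReader<TParams, TModel> where TParams : IReaderParameters
{
    // Reads the staged/raw file described by the parameters into an object model.
    Task<TModel> Read(TParams parameters);
}

public interface IFileFormatWriter<TParams, TModel> where TParams : IWriterParameters
{
    // Writes the object model into a temporary file that the pipeline persists afterwards.
    Task<DatasetFile> Write(TParams parameters, TModel model);
}
```

The dedicated variants (`IDedicatedStorageFormatReader`, `IDedicatedStorageFormatWriter`) follow the same pattern, but they read from/write to their own storage and the writers return an implementation of `IConversionResult` instead of a `DatasetFile`.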
Workflow

Each operation runs the following workflow, which routes the user request to the implementations of format readers and format writers:
- A particular endpoint is invoked on the Metadata service.
- The Metadata service contract is mapped by AutoMapper to the corresponding Input/Output parameters of the Transfer batch, where the implementation of `InputParameters` defines the relation to a format reader (plus a storage in case of file formats) and the implementation of `OutputParameters` defines the relation to a format writer (plus a storage in case of file formats).
- Based on the value of `ReaderName`/`WriterName` from the Metadata service contract, the corresponding `FormatDefinition` is created on the Input/Output parameters. Values of `FormatDefinition` are given by the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. `FormatDefinition` specifies `PackageName` (the package from which the corresponding readers and writers are dynamically loaded) and `FormatName` (relates to the `TransferFormat` attribute of the loaded readers and writers).
- A new Transfer is run with the specified parameters.
- Based on the value of `FormatDefinition` from the `InputParameters` and the type of the `InputParameters` itself, Transfer dynamically loads the corresponding Reader implementation with a matching `TransferFormat` attribute (see the sketch after this list).
- Weak parameters from the Input parameters are mapped to the implementation of `IReaderParameters`.
- The Reader is run with the reader parameters and returns a representation of the dataset model.
- Based on the value of `FormatDefinition` from the `OutputParameters` and the type of the `OutputParameters` itself, Transfer dynamically loads the corresponding Writer implementation with a matching `TransferFormat` attribute.
- Weak parameters from the Output parameters are mapped to the implementation of `IWriterParameters`.
- The model returned from the Reader is passed, together with the writer parameters, to the Writer, which returns either: a) an implementation of `IConversionResult` in case of dedicated formats; or b) a `DatasetFile` representing the stored temporary file created by the writer in case of file formats. For file formats the conversion result is created internally by the transfer pipeline.
- The conversion result is sent back to the Metadata service, which, based on the operation, performs the final processing (creates a new dataset, updates a dataset, etc.).
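To make the dynamic-loading steps more concrete, resolving a reader type from a loaded package by its `TransferFormat` attribute could look roughly like the sketch below. This only illustrates the mechanism and is not the actual pipeline code; in particular, the attribute property name (`Value`) is an assumption:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Illustrative sketch only (not the actual Transfer pipeline code): resolve a reader
// from a dynamically loaded transfer package by matching its TransferFormat attribute
// against FormatDefinition.FormatName. The "Value" property name is an assumption.
public static class ReaderResolver
{
    public static Type ResolveReader(Assembly packageAssembly, string formatName)
    {
        return packageAssembly.GetTypes().Single(type =>
            !type.IsAbstract &&
            type.Name.EndsWith("Reader") &&                  // candidate reader implementations
            type.GetCustomAttributes().Any(attribute =>
                attribute.GetType().Name == "TransferFormatAttribute" &&
                Equals(attribute.GetType().GetProperty("Value")?.GetValue(attribute), formatName)));
    }
}
```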
The following table summarizes the contracts/interfaces used across the pipeline. (When an entry contains a `/`, the first value applies to file formats and the second to dedicated formats. X means not supported.)

| Operation | Endpoint | Metadata contracts | Transfer contracts - InputParameters | Transfer contracts - OutputParameters | Reader interface | Writer interface | IReaderParameters | IWriterParameters | Writer output |
|---|---|---|---|---|---|---|---|---|---|
| Import | `api/conversion/upload-convert` | `ConvertUploadInput` | `UploadStorageParameters` | `RawStorageOutputParameters` / `DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` / `IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters` | `IWriterParameters` / `IImportWriterParameters` | `DatasetFile` / `ImportOutputs` |
| Export | `api/metadata/dataset/{id}/download-convert` | `ConvertDownloadInput` | `RawStorageInputParameters` / `DedicatedStorageInputParameters` | `DownloadStorageParameters` | `IFileFormatReader<TParams, TModel>` / `IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` | `IReaderParameters` / `IExportReaderParameters` | `IWriterParameters` | `DatasetFile` / `ExportOutputs` |
| Copy | `api/metadata/dataset/{id}/convert` | `ConvertExistingInput` | `RawStorageInputParameters` / `DedicatedStorageInputParameters` | `RawStorageOutputParameters` / `DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>` / `IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` / `IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters` / `IExportReaderParameters` | `IWriterParameters` / `IImportWriterParameters` | `DatasetFile` / `ImportOutputs` |
| Update | `api/metadata/dataset/{id}/update` | `ConvertUpdateInput` | X / `UploadStorageParameters` | X / `DedicatedStorageUpdateOutputParameters` | X / `IFileFormatReader<TParams, TModel>` | X / `IDedicatedStorageFormatWriter<TParams, TModel>` | X / `IReaderParameters` | X / `IUpdateWriterParameters` | X / `UpdateConversionResult` |
| Append | `api/metadata/dataset/{id}/append` | `ConvertAppendInput` | X / `UploadStorageParameters` | X / `DedicatedStorageAppendOutputParameters` | X / `IFileFormatReader<TParams, TModel>` | X / `IDedicatedStorageFormatWriter<TParams, TModel>` | X / `IReaderParameters` | X / `IAppendWriterParameters` | X / `AppendConversionResult` |
Cross-cutting concerns

The transfer pipeline manages cross-cutting concerns by decorating readers/writers with common decorators. Currently the following decorators exist:
- `MergeFileFormatReader` - when reading files from a folder into a model implementing the `IMergeableModel` interface, it ensures that the multiple files are merged into one model.
- `MetadataFileFormatWriter` - extracts metadata from a model and adds it to the conversion result (file formats only).
- `MetadataDedicatedFormatWriter` - extracts metadata from a model and adds it to the conversion result (dedicated formats only).
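For illustration, a metadata-extracting writer decorator could be pictured roughly as follows. It builds on the assumed interface sketch from the overview above; the `ExtractMetadata` helper and the `Metadata` property on `DatasetFile` are hypothetical and only show where the cross-cutting concern fits:

```csharp
// Illustrative sketch only: builds on the assumed IFileFormatWriter signature above.
// ExtractMetadata and DatasetFile.Metadata are hypothetical, not real pipeline members.
public class MetadataExtractingWriter<TParams, TModel> : IFileFormatWriter<TParams, TModel>
    where TParams : IWriterParameters
{
    private readonly IFileFormatWriter<TParams, TModel> _inner;

    public MetadataExtractingWriter(IFileFormatWriter<TParams, TModel> inner) => _inner = inner;

    public async Task<DatasetFile> Write(TParams parameters, TModel model)
    {
        // Delegate the actual writing to the decorated writer...
        var file = await _inner.Write(parameters, model);

        // ...and handle the cross-cutting concern around it.
        file.Metadata = ExtractMetadata(model);              // hypothetical property
        return file;
    }

    private static Dictionary<string, object> ExtractMetadata(TModel model)
        => new Dictionary<string, object>();                 // placeholder
}
```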
Besides these decorators, the pipeline has another extension mechanism: implementing the `IConversionProcessor<TParameters, TSubject>` interface.
The following overview specifies the points in the pipeline workflow where the processors are plugged in (an illustrative sketch follows the two lists below):
File formats:
- `IConversionProcessor<IReaderParameters, DatasetFile>`: before reading a file into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a file into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a file
- `IConversionProcessor<IWriterParameters, DatasetFile>`: after writing a model into a file
Dedicated formats:
- `IConversionProcessor<IReaderParameters, DedicatedReaderEmptySubject>`: before reading a format into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a format into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a format
- `IConversionProcessor<IWriterParameters, IConversionResult>`: after writing a model into a format
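A processor plugged in at the "after reading" stage might look roughly like the sketch below. The `Process` signature is an assumption made for the example; consult the real `IConversionProcessor<TParameters, TSubject>` definition for the actual contract:

```csharp
// Illustrative sketch only: the Process signature is an assumption; the real
// IConversionProcessor<TParameters, TSubject> interface may differ.
public class ModelTypeLoggingProcessor<TModel> : IConversionProcessor<IReaderParameters, TModel>
{
    public Task<TModel> Process(IReaderParameters parameters, TModel subject)
    {
        // Runs after the reader has produced the model, so it can inspect or replace
        // the model without changing any reader or writer implementation.
        Console.WriteLine($"Reader produced a model of type {typeof(TModel).Name}.");
        return Task.FromResult(subject);
    }
}
```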
Currently the following processors are available:
- `LoggerProcessor`: logs the state of the pipeline process.
- `ModelConversionProcessor`: processes the model based on model transformations such as filtering, value transformation, etc.
- `ValidatorProcessor`: validates the reader/writer parameters.
- `UnzippingProcessor`: unzips the `DatasetFile` input representing a zip file and passes a new `DatasetFile` to the reader as the representation of the unzipped folder.
- `ZippingProcessor`: zips the `DatasetFile` output from the writer representing a folder and returns a new `DatasetFile` representing a zip file.
When a file format is read, the returned model is always wrapped in a `FileModel`, which provides access both to the original file and to the model that was read. The reason for this is the special `File` format/package: its writer writes the file in its original form, while at the same time taking advantage of extracting metadata from the dataset.
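The shape implied above can be sketched as follows; the member names are illustrative assumptions, not the actual `FileModel` API:

```csharp
// Illustrative sketch only: the member names are assumptions based on the description
// above, not the actual FileModel API.
public class FileModel<TModel>
{
    // The original file as it was uploaded, untouched by the reader.
    public DatasetFile File { get; set; }

    // The object model produced by the reader from that file.
    public TModel Model { get; set; }
}
```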
Quick start

If you want to create a new reader or writer, you should follow these guidelines:
- Create a new .csproj/.sln for the transfer package under the `.\Transfer\` folder, if it doesn't exist yet.
- Based on whether you want to read or write a format, and based on the required operation, implement the appropriate reader/writer interface and reader/writer parameters chosen from the overview table above.
- Mark these implementations with the `TransferFormat` attribute. Its value must be unique across the same operation: e.g. we can't have multiple readers inside the package marked with the same `TransferFormat` attribute, but we can have one reader, one writer, one update writer and one append writer marked with the same `TransferFormat` attribute. This is useful for operation inference, because once a format has been written the user no longer has to specify the format again during update, append or export; the proper reader or writer is chosen automatically based on the correlated value of the `TransferFormat` attribute.
- Implement the `IModule` interface, which configures the DI container.
- Register your reader/writer in the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. Neither the Metadata service nor the Transfer batch has a reference to the transfer packages; packages are loaded dynamically, therefore the `conversion.**.json` files gather the configuration that matches the `ReaderName`/`WriterName` from the user request to the corresponding transfer package and reader/writer.
- Create a new `.yaml` CI pipeline stored under `.\Scripts\Yaml\Transfer\`.
Example:

Let's say we come up with a new dedicated format reader for a format that stores the `FeatureClass` model representation in MS-SQL storage, alongside the already existing POSTGIS storage. We have to create the following implementations inside a new `.\Transfer\GIS-SQL` package:
[TransferFormat("GIS-SQL")]
public class GisSqlReader : IDedicatedStorageFormatReader<GisSqlReaderParameters, FeatureClass>
{
public Task<FeatureClass> Read(GisSqlReaderParameters parameters)
{
throw new NotImplementedException();
}
}
[TransferFormat("GIS-SQL")]
public class GisSqlReaderParameters : IExportReaderParameters
{
public Guid ExportedDatasetId { get; set; }
}
public class GisSqlModule : IModule
{
public void ConfigureServices(IServiceCollection services)
{
services.AddSingleton<GisSqlReader>();
}
}
`conversion.dedicated.formats.json`:

```json
{
  "transferFormat": "GIS-SQL",
  "transferPackage": "GIS-SQL"
}
```
`conversion.dedicated.readers.json`:

```json
{
  "name": "GISSQLReader",
  "description": "Reads data from MS-SQL GIS storage",
  "writers": [ "GISWriter", "ShpWriter", "GeoJsonWriter" ],
  "format": "GIS-SQL"
}
```