# Developer guidelines for transfer and conversion

## Overview
The role of the transfer pipeline is to provide the following operations:
- Import - imports a file uploaded into staging storage (blob storage/FTP) into Raw storage or a dedicated storage
- Export - exports a dataset from Raw storage or a dedicated storage into staging storage in the form of a file format (currently only blob storage)
- Copy - copies an existing dataset from Raw storage or a dedicated storage into Raw storage or a dedicated storage
- Append - appends new data from a file uploaded into staging storage (blob storage/FTP) to an existing dataset in a dedicated storage (replacing old data is not allowed)
- Update - updates an existing dataset in a dedicated storage with new data from a file uploaded into staging storage (blob storage/FTP)
Each operation relies on the concept of a reader and a writer:
- Reader - reads a dataset of a given format and returns an object model.
- Writer - writes an object model to a dataset of a given format.
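As a rough illustration of this split, a reader turns a stored dataset into a model and a writer turns a model back into a stored dataset. The interfaces below are hypothetical, simplified stand-ins; the real contracts (`IFileFormatReader<TParams, TModel>`, `IFileFormatWriter<TParams, TModel>`, and their dedicated-storage variants listed in the table below) carry storage- and operation-specific members.

```csharp
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the real reader/writer interfaces.
public interface ISketchReader<TParams, TModel>
{
    // Reads a dataset described by the parameters and returns an object model.
    Task<TModel> Read(TParams parameters);
}

public interface ISketchWriter<TParams, TModel>
{
    // Writes the object model to the target format described by the parameters.
    Task Write(TModel model, TParams parameters);
}
```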
The transfer pipeline operates on two main kinds of data formats:
- File format - a format with no specific data persistence mechanism; it is handled by the transfer pipeline by default, which delegates data storage to the Raw service or staging storage.
- Dedicated format - any format which has its own mechanism of data persistence and stores its data in its own dedicated storage (e.g. Postgres for GIS Vector storage, blob storage for Multidimensional storage).
## Workflow
Each operation runs the following workflow, which routes the user request to the implementations of format readers and format writers:
- A particular endpoint is invoked on the Metadata service.
- The Metadata service contract is mapped by AutoMapper to the corresponding input/output parameters of the Transfer batch, where the implementation of `InputParameters` defines a relation to a format reader (plus a storage in case of file formats) and the implementation of `OutputParameters` defines a relation to a format writer (plus a storage in case of file formats).
- Based on the value of `ReaderName`/`WriterName` from the Metadata service contract, the corresponding `FormatDefinition` is created on the input/output parameters. Values of `FormatDefinition` are given by the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. A `FormatDefinition` specifies `PackageName` (the package from which the corresponding readers and writers are dynamically loaded) and `FormatName` (relates to the `TransferFormat` attribute of the loaded readers and writers).
- A new Transfer is run with the specified parameters.
- Based on the value of `FormatDefinition` from the `InputParameters` and the type of `InputParameters` itself, Transfer dynamically loads the corresponding reader implementation with the matching `TransferFormat` attribute.
- Weak parameters from the input parameters are mapped to the implementation of `IReaderParameters`.
- The reader is run with the reader parameters and returns a representation of the dataset model.
- Based on the value of `FormatDefinition` from the `OutputParameters` and the type of `OutputParameters` itself, Transfer dynamically loads the corresponding writer implementation with the matching `TransferFormat` attribute.
- Weak parameters from the output parameters are mapped to the implementation of `IWriterParameters`.
- The model returned from the reader, together with the writer parameters, is passed to the writer, which returns either: a) an implementation of `IConversionResult` in case of dedicated formats; or b) a `DatasetFile` representing the stored temporary file created by the writer in case of file formats. For file formats the conversion result is created internally by the transfer pipeline.
- The conversion result is sent back to the Metadata service, which performs the final processing based on the operation (creates a new dataset, updates a dataset, etc.).
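The dynamic-loading step above can be pictured roughly as follows. This is a hypothetical sketch, not the actual Transfer batch code: the attribute and resolver here only illustrate matching a loaded package type against `FormatName` from the `FormatDefinition`.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the real TransferFormat attribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class TransferFormatAttribute : Attribute
{
    public string Name { get; }
    public TransferFormatAttribute(string name) => Name = name;
}

public static class FormatResolver
{
    // Finds the type in a dynamically loaded package assembly whose
    // TransferFormat attribute matches FormatName from the FormatDefinition.
    public static Type Resolve(Assembly package, string formatName) =>
        package.GetTypes().FirstOrDefault(t =>
            t.GetCustomAttribute<TransferFormatAttribute>()?.Name == formatName);
}
```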
The following table summarizes the contracts/interfaces across the pipeline. (When an entry is separated by /, the two values apply to file formats/dedicated formats respectively. X means not supported. The reader interface for Update and Append is given as `IFileFormatReader` here, since both operations read an uploaded file.)

| Operation | Endpoint | Metadata contracts | Transfer contracts - InputParameters | Transfer contracts - OutputParameters | Reader interface | Writer interface | IReaderParameters | IWriterParameters | Writer output |
|---|---|---|---|---|---|---|---|---|---|
| Import | `api/conversion/upload-convert` | `ConvertUploadInput` | `UploadStorageParameters` | `RawStorageOutputParameters`/`DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>`/`IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters` | `IWriterParameters`/`IImportWriterParameters` | `DatasetFile`/`ImportOutputs` |
| Export | `api/metadata/dataset/{id}/download-convert` | `ConvertDownloadInput` | `RawStorageInputParameters`/`DedicatedStorageInputParameters` | `DownloadStorageParameters` | `IFileFormatReader<TParams, TModel>`/`IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>` | `IReaderParameters`/`IExportReaderParameters` | `IWriterParameters` | `DatasetFile`/`ExportOutputs` |
| Copy | `api/metadata/dataset/{id}/convert` | `ConvertExistingInput` | `RawStorageInputParameters`/`DedicatedStorageInputParameters` | `RawStorageOutputParameters`/`DedicatedStorageImportOutputParameters` | `IFileFormatReader<TParams, TModel>`/`IDedicatedStorageFormatReader<TParams, TModel>` | `IFileFormatWriter<TParams, TModel>`/`IDedicatedStorageFormatWriter<TParams, TModel>` | `IReaderParameters`/`IExportReaderParameters` | `IWriterParameters`/`IImportWriterParameters` | `DatasetFile`/`ImportOutputs` |
| Update | `api/metadata/dataset/{id}/update` | `ConvertUpdateInput` | X/`UploadStorageParameters` | X/`DedicatedStorageUpdateOutputParameters` | X/`IFileFormatReader<TParams, TModel>` | X/`IDedicatedStorageFormatWriter<TParams, TModel>` | X/`IReaderParameters` | X/`IUpdateWriterParameters` | X/`UpdateConversionResult` |
| Append | `api/metadata/dataset/{id}/append` | `ConvertAppendInput` | X/`UploadStorageParameters` | X/`DedicatedStorageAppendOutputParameters` | X/`IFileFormatReader<TParams, TModel>` | X/`IDedicatedStorageFormatWriter<TParams, TModel>` | X/`IReaderParameters` | X/`IAppendWriterParameters` | X/`AppendConversionResult` |
## Cross-cutting concerns
The transfer pipeline provides a way of managing cross-cutting concerns by decorating a reader/writer with common decorators. Currently the following decorators exist:
- `MergeFileFormatReader` - when reading files from a folder into a model implementing the `IMergeableModel` interface, it ensures merging multiple files into one model.
- `MetadataFileFormatWriter` - extracts metadata from a model and adds it to the conversion result (file formats only).
- `MetadataDedicatedFormatWriter` - extracts metadata from a model and adds it to the conversion result (dedicated formats only).
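The decorator idea behind `MergeFileFormatReader` can be sketched as follows. The interfaces here are hypothetical, simplified stand-ins; the real `IMergeableModel` and reader contracts differ.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the real Transfer interfaces.
public interface IMergeableModel<TModel>
{
    TModel Merge(TModel other);
}

public interface IFileReader<TModel>
{
    Task<TModel> Read(string path);
}

// Sketch of the merging decorator: read every file in a folder with the
// inner reader, then fold the results into a single model.
public sealed class MergingReader<TModel> : IFileReader<TModel>
    where TModel : IMergeableModel<TModel>
{
    private readonly IFileReader<TModel> _inner;
    public MergingReader(IFileReader<TModel> inner) => _inner = inner;

    public async Task<TModel> Read(string folder)
    {
        var models = new List<TModel>();
        foreach (var file in System.IO.Directory.EnumerateFiles(folder))
            models.Add(await _inner.Read(file));
        return models.Aggregate((merged, next) => merged.Merge(next));
    }
}
```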
Besides these decorators, the pipeline has another extension mechanism in the form of the `IConversionProcessor<TParameters, TSubject>` interface.
The following overview specifies the points in the pipeline workflow where the processors are plugged in:
### File formats
- `IConversionProcessor<IReaderParameters, DatasetFile>`: before reading a file into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a file into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a file
- `IConversionProcessor<IWriterParameters, DatasetFile>`: after writing a model into a file
### Dedicated formats
- `IConversionProcessor<IReaderParameters, DedicatedReaderEmptySubject>`: before reading a format into a model
- `IConversionProcessor<IReaderParameters, TModel>`: after reading a format into a model
- `IConversionProcessor<IWriterParameters, TModel>`: before writing a model into a format
- `IConversionProcessor<IWriterParameters, IConversionResult>`: after writing a model into a format
Currently the following processors are available:
- `LoggerProcessor`: logs the state of the pipeline process.
- `ModelConversionProcessor`: processes the model based on model transformations such as filtering, value transformation, etc.
- `ValidatorProcessor`: validates the reader/writer parameters.
- `UnzippingProcessor`: unzips the `DatasetFile` input representing a zip file and passes a new `DatasetFile` to the reader as the representation of the unzipped folder.
- `ZippingProcessor`: zips the `DatasetFile` output from the writer representing a folder and returns a new `DatasetFile` representing a zip file.
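A custom processor in the spirit of `LoggerProcessor` could look roughly like this. `IConversionProcessor` is simplified here to a single `Process` method; the real interface in the Transfer pipeline may have a different shape.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the real processor interface.
public interface IConversionProcessor<TParameters, TSubject>
{
    Task Process(TParameters parameters, TSubject subject);
}

// Sketch of a logging processor: report which pipeline stage was reached,
// identified by the subject type (DatasetFile, TModel, IConversionResult, ...).
public sealed class ConsoleLoggerProcessor<TParameters, TSubject>
    : IConversionProcessor<TParameters, TSubject>
{
    public Task Process(TParameters parameters, TSubject subject)
    {
        Console.WriteLine(
            $"Pipeline stage reached: subject={typeof(TSubject).Name}, " +
            $"parameters={typeof(TParameters).Name}");
        return Task.CompletedTask;
    }
}
```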
When reading a file format, the returned model is always wrapped inside a `FileModel`, which provides access both to the original file and to the model that was read. The reason for this is the special File format/package: its writer writes the file in the original form, but at the same time takes advantage of extracting metadata from the dataset.
## Quick start
If you want to create a new reader or writer, follow these guidelines:
- Create a new .csproj/.sln for the transfer package under the `.\Transfer\` folder if it doesn't exist yet.
- Based on whether you want to read or write a format, and on the required operation, implement the appropriate reader/writer interface and reader/writer parameters chosen from the overview table above.
- Mark these implementations with the `TransferFormat` attribute. Its value must be unique across the same operation: we can't have multiple readers inside a package marked with the same `TransferFormat` attribute, but we can have one reader, one writer, one update writer and one append writer marked with the same `TransferFormat` attribute. This is useful for operation inference, because once a format is written, the user no longer has to specify the format again during update, append or export, as the proper reader or writer is automatically chosen based on the correlated value of the `TransferFormat` attribute.
- Implement the `IModule` interface, which configures the DI container.
- Register your reader/writer in the `conversion.**.json` metadata definitions stored under `DHI.WaterData.Metadata.Domain`. Neither the Metadata service nor the Transfer batch has a reference to the transfer packages. Packages are loaded dynamically, so the `conversion.**.json` files gather the configuration for matching `ReaderName`/`WriterName` from the user request to the corresponding transfer package and reader/writer.
- Create a new `.yaml` CI pipeline stored under `.\Scripts\Yaml\Transfer\`.
Example:
Let's say we come up with a new dedicated format reader that reads the `FeatureClass` model representation from MS-SQL storage, alongside the already existing POSTGIS storage. We have to create the following implementations inside a new `.\Transfer\GIS-SQL` package:
```csharp
[TransferFormat("GIS-SQL")]
public class GisSqlReader : IDedicatedStorageFormatReader<GisSqlReaderParameters, FeatureClass>
{
    public Task<FeatureClass> Read(GisSqlReaderParameters parameters)
    {
        throw new NotImplementedException();
    }
}

[TransferFormat("GIS-SQL")]
public class GisSqlReaderParameters : IExportReaderParameters
{
    public Guid ExportedDatasetId { get; set; }
}

public class GisSqlModule : IModule
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddSingleton<GisSqlReader>();
    }
}
```
conversion.dedicated.formats.json:

```json
{
  "transferFormat": "GIS-SQL",
  "transferPackage": "GIS-SQL"
}
```
conversion.dedicated.readers.json:

```json
{
  "name": "GISSQLReader",
  "description": "Reads data from MS-SQL GIS storage",
  "writers": [ "GISWriter", "ShpWriter", "GeoJsonWriter" ],
  "format": "GIS-SQL"
}
```