Harvesting
What is a Data Harvester?
Harvesters import (harvest) datasets from remote sources into a CKAN instance by mapping the data from the remote source onto CKAN fields.
CKAN uses the ckanext-harvest extension, which provides an interface for building custom harvesters. This interface has three stages:
- The gather stage compiles all the resource identifiers that need to be fetched in the next stage.
- The fetch stage gets the contents of the remote objects and stores them in the database.
- The import stage performs any necessary actions on the fetched resource.
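The three stages can be illustrated with a small standalone sketch. It does not use ckanext-harvest: `REMOTE_SOURCE`, `HarvestObject`, `SketchHarvester`, and `run` are hypothetical stand-ins, and in-memory dicts replace the remote source and the database. A real harvester implements the extension's interface and works with the job and object records the extension passes to each stage.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory stand-in for a remote source: identifier -> raw record.
REMOTE_SOURCE = {
    "ds-1": {"title": "Air quality", "description": "Hourly readings"},
    "ds-2": {"title": "Bus stops", "description": "Stop locations"},
}


@dataclass
class HarvestObject:
    """Simplified stand-in for the object handed from one stage to the next."""
    guid: str
    content: Optional[dict] = None


@dataclass
class SketchHarvester:
    """Illustrative only; not the ckanext-harvest base class."""
    database: dict = field(default_factory=dict)  # stands in for stored harvest objects
    packages: dict = field(default_factory=dict)  # stands in for CKAN datasets

    def gather_stage(self) -> list:
        # Compile the identifiers that the fetch stage should retrieve.
        return [HarvestObject(guid=guid) for guid in sorted(REMOTE_SOURCE)]

    def fetch_stage(self, obj: HarvestObject) -> bool:
        # Get the remote content and store it.
        record = REMOTE_SOURCE.get(obj.guid)
        if record is None:
            return False
        obj.content = dict(record)
        self.database[obj.guid] = obj
        return True

    def import_stage(self, obj: HarvestObject) -> bool:
        # Act on the fetched content; here, build a dataset-like dict.
        if obj.content is None:
            return False
        self.packages[obj.guid] = {
            "name": obj.guid,
            "title": obj.content["title"],
            "notes": obj.content["description"],
        }
        return True


def run(harvester: SketchHarvester) -> dict:
    """Run gather, fetch, and import in order and return the imported datasets."""
    for obj in harvester.gather_stage():
        if harvester.fetch_stage(obj):
            harvester.import_stage(obj)
    return harvester.packages
```

Calling `run(SketchHarvester())` returns one dataset-like dict per identifier found in the gather stage.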
Specific documentation about the CKAN Harvester can be found in the CKAN Remote harvesting extension documentation.
Types of Harvesters
Dataplatform provides the following options to harvest data:
- CKAN Harvester
- JSON DCAT Harvester
- Generic DCAT RDF Harvester
CKAN Harvester
- The CKAN Harvester is an example of a custom harvester. This harvester lets you import data from a remote CKAN instance into your own CKAN instance.
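A remote CKAN instance exposes its datasets through the Action API under `/api/3/action/`. As a rough illustration of how a dataset on a remote instance can be addressed, the sketch below builds such a request URL. The helper name `action_url` and the example host are assumptions made for this example, and the code does not claim to reproduce how the CKAN Harvester itself queries the remote instance.

```python
from urllib.parse import urlencode, urljoin


def action_url(base_url: str, action: str, **params: str) -> str:
    """Build a CKAN Action API URL (hypothetical helper for illustration)."""
    # Ensure exactly one trailing slash so urljoin keeps any path prefix.
    base = base_url.rstrip("/") + "/"
    url = urljoin(base, f"api/3/action/{action}")
    if params:
        url += "?" + urlencode(params)
    return url


# Example: address one dataset on a (hypothetical) remote instance.
url = action_url("https://example.org", "package_show", id="bus-stops")
```

The resulting URL can be fetched with any HTTP client; the response is a JSON document describing the dataset.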
JSON DCAT Harvester
The JSON DCAT Harvester is a plugin that is part of the DCAT extension, ckanext-dcat. This harvester imports JSON objects that follow the DCAT metadata fields and maps those fields to the corresponding CKAN fields.
See the CKAN documentation for the JSON DCAT Harvester.
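To make the field mapping concrete, here is a minimal sketch of turning a DCAT-style JSON dataset into a CKAN-style dataset dict. The function `dcat_to_ckan_sketch` is hypothetical and covers only a handful of fields; the exact mapping applied by ckanext-dcat is defined by the extension and may differ from what is assumed here.

```python
def dcat_to_ckan_sketch(dcat: dict) -> dict:
    """Map a DCAT-style JSON dataset to a CKAN-style dict (simplified, assumed mapping)."""
    package = {
        "title": dcat.get("title", ""),
        "notes": dcat.get("description", ""),    # DCAT description -> CKAN notes
        "url": dcat.get("landingPage", ""),
        "tags": [{"name": kw} for kw in dcat.get("keyword", [])],
        "resources": [],
    }
    # Each DCAT distribution becomes one CKAN resource.
    for dist in dcat.get("distribution", []):
        package["resources"].append({
            "name": dist.get("title", ""),
            # Prefer the direct download link, fall back to the access link.
            "url": dist.get("downloadURL") or dist.get("accessURL", ""),
            "format": dist.get("format", ""),
        })
    return package


# Example input with made-up values.
example = dcat_to_ckan_sketch({
    "title": "Bus stops",
    "description": "Stop locations",
    "keyword": ["transport"],
    "distribution": [
        {"title": "CSV", "accessURL": "https://example.org/stops.csv", "format": "CSV"},
    ],
})
```

Fields missing from the input fall back to empty values, so a partial DCAT record still yields a complete dict.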
Generic DCAT RDF Harvester
- The Generic DCAT RDF Harvester is another custom harvester that is part of the DCAT extension (ckanext-dcat).