Version: 4.10.1

Getting Started

Note: This tutorial assumes you already know which types of sources are compatible with each of the three harvester options in CKAN.

This step-by-step tutorial shows you how to set up one of the three types of harvester available in CKAN.

We will demonstrate the setup using the JSON DCAT Harvester. Once it is set up, this harvester lets you harvest data from JSON objects based on DCAT metadata fields into your CKAN instance.
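As a rough illustration of what "JSON objects based on DCAT metadata fields" can look like, the Python sketch below builds a small catalogue with two datasets, each with several distributions in different file formats. The field names (identifier, title, description, keyword, distribution, downloadURL, format) follow the DCAT vocabulary, but the exact layout a given harvester accepts is an assumption here. All titles, identifiers, and example.org URLs are made up.

```python
import json


def make_dataset(identifier, title, formats):
    """Build one hypothetical DCAT-style dataset with one distribution per format."""
    return {
        "identifier": identifier,
        "title": title,
        "description": f"Illustrative description for {title}.",
        "keyword": ["example"],
        "distribution": [
            {
                "title": f"{title} ({fmt})",
                # Made-up download location, for illustration only.
                "downloadURL": f"https://example.org/files/{identifier}.{fmt.lower()}",
                "format": fmt,
            }
            for fmt in formats
        ],
    }


# Assumed layout: a top-level object holding a list of datasets.
catalog = {
    "dataset": [
        make_dataset("dataset-1", "First example dataset", ["CSV", "JSON"]),
        make_dataset("dataset-2", "Second example dataset", ["CSV", "XLSX", "PDF"]),
    ]
}

# Serialise to the JSON text that would be published at the source URL.
catalog_json = json.dumps(catalog, indent=2)
```

Printing `catalog_json` gives a file you can compare against a real source before pointing a harvester at it.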

Some examples of JSON DCAT sources:

Example: Where to start

To begin setting up a harvester in CKAN, append /harvest to the URL of your CKAN instance.

For example:
 https://YOUR-CKAN-PORTAL-NAME/harvest

The /harvest page shows an overview of the harvest sources that are currently set up. From this page you can also add new harvesters.
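If you manage several portals, the same rule can be written as a tiny helper. This is only a sketch: the function name `harvest_page_url` is hypothetical, and the portal address is the placeholder used above.

```python
def harvest_page_url(portal_url: str) -> str:
    """Return the harvest overview URL for a CKAN portal address."""
    # Drop any trailing slash so the result never contains "//harvest".
    return portal_url.rstrip("/") + "/harvest"


overview = harvest_page_url("https://YOUR-CKAN-PORTAL-NAME/")
```

Plain string handling is used instead of `urllib.parse.urljoin`, because `urljoin` replaces the last path segment when the base address has no trailing slash.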

1. Adding a Harvester

Click on the Add Harvest Resource button.

2. Configure a Harvesting Source

After clicking the Add Harvest Resource button, you need to complete the following fields:

  • URL (mandatory)

    The URL points to the actual harvesting source. In our example we use a JSON file that contains DCAT metadata for two datasets, each of which includes several different file types. The example JSON file can be viewed here: JSON DCAT file.

  • Title (mandatory)

    The title you provide is used to generate the URL name of the harvester.

  • Description

    Provide an appropriate description for your harvester.

  • Update frequency

    Update frequency options are: always, weekly, biweekly, or monthly.

    When the frequency is set to always, the harvester reharvests as often as possible. In other words, it runs at the smallest time interval available.

  • Custom configuration

    Custom configuration lets you provide additional requirements or fields for the harvester to harvest. It accepts only JSON objects. You may find an example of additional harvester configuration here.

  • Select the appropriate organisation from the dropdown list

  • Save your configuration

    Click the blue Save button. Your harvester setup is now complete.
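The rules for the fields above can be checked before you fill in the form. The sketch below is a hypothetical pre-flight check, not part of CKAN: `HarvestSourceForm` and `validate` are made-up names, and the configuration key `example_option` is invented for illustration. It encodes only what this tutorial states: URL and title are mandatory, the frequency is one of the four listed options, and the custom configuration must be a JSON object.

```python
import json
from dataclasses import dataclass

# The four update frequency options listed in this tutorial.
ALLOWED_FREQUENCIES = ("always", "weekly", "biweekly", "monthly")


@dataclass
class HarvestSourceForm:
    """Hypothetical mirror of the form fields described above."""
    url: str
    title: str
    frequency: str
    description: str = ""
    config: str = ""        # raw text typed into the custom configuration box
    organisation: str = ""


def validate(form: HarvestSourceForm) -> list:
    """Return a list of problems; an empty list means the form looks fine."""
    problems = []
    if not form.url.strip():
        problems.append("URL is mandatory")
    if not form.title.strip():
        problems.append("Title is mandatory")
    if form.frequency not in ALLOWED_FREQUENCIES:
        problems.append("Unknown update frequency: " + form.frequency)
    if form.config.strip():
        try:
            parsed = json.loads(form.config)
        except json.JSONDecodeError:
            problems.append("Custom configuration is not valid JSON")
        else:
            # A JSON array, string, or number parses fine but is not an object.
            if not isinstance(parsed, dict):
                problems.append("Custom configuration must be a JSON object")
    return problems


good = HarvestSourceForm(
    url="https://example.org/dcat.json",
    title="Example DCAT source",
    frequency="weekly",
    config='{"example_option": true}',  # hypothetical key
)
bad = HarvestSourceForm(url="", title="No URL", frequency="hourly", config="[1, 2]")

good_problems = validate(good)
bad_problems = validate(bad)
```

Here `good_problems` is empty, while `bad_problems` reports the missing URL, the unknown frequency, and the configuration that is a JSON array rather than an object.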

3. Running Your Harvester

  1. Click the Admin button

  2. Click the Reharvest button to initialize the harvester

  3. Click the Confirm button to begin the harvesting process
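The steps above use the web interface. If your portal's harvest extension also exposes a `harvest_job_create` action through the CKAN action API, which is an assumption you should verify for your installation, a harvest job could be requested from a script. The sketch below only builds the request and does not send it. The source name and token are placeholders, and whether a created job starts immediately depends on how your portal runs its harvest queue.

```python
import json
import urllib.request


def build_harvest_job_request(portal_url, source_id, api_token):
    """Prepare (but do not send) a POST asking CKAN to create a harvest job."""
    # Assumed endpoint: CKAN action API plus the harvest extension's action name.
    endpoint = portal_url.rstrip("/") + "/api/3/action/harvest_job_create"
    body = json.dumps({"source_id": source_id}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )


request = build_harvest_job_request(
    "https://YOUR-CKAN-PORTAL-NAME",
    "my-harvest-source",   # placeholder harvest source name or id
    "YOUR-API-TOKEN",      # placeholder token of a user allowed to manage the source
)
# To actually send it, call urllib.request.urlopen(request).
```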

4. The Harvesting Process

Depending on the source, the harvesting process takes from one to several minutes to finish.

On the harvester's Admin page, you will see a tab labeled Dashboard. This tab shows information about the harvester's current or last completed process.

Last harvest job dashboard - process running

In our example, there are no notifications underneath Last Harvest Job yet, because the process is still running. To check whether it has completed, click the Dashboard tab again to refresh the page.

Last harvest job dashboard - process finished

Here we can see that the process has finished. In our example, the page shows that two datasets were added.
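Refreshing the Dashboard tab until the job finishes is a simple polling loop. The sketch below shows that pattern in Python under stated assumptions: `wait_for_harvest` is a hypothetical helper, `fetch_status` stands in for whatever you use to read the job state, and the `finished` and `added` keys are an invented status shape, not a CKAN response format.

```python
import time


def wait_for_harvest(fetch_status, interval_seconds=30.0, max_checks=20, sleep=time.sleep):
    """Call fetch_status() repeatedly until it reports a finished job."""
    for attempt in range(max_checks):
        status = fetch_status()
        if status.get("finished"):
            return status
        # Wait before the next check, but not after the final one.
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    raise TimeoutError("harvest job did not finish in time")


# Simulated status readings: two while running, then one after completion.
_readings = iter([
    {"finished": False},
    {"finished": False},
    {"finished": True, "added": 2},
])
result = wait_for_harvest(lambda: next(_readings), sleep=lambda seconds: None)
```

The simulated run stops on the third reading, matching the example above in which two datasets were added. The `sleep` parameter is injectable so the loop can be tried without waiting.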

5. View harvested datasets

  • Click the View Harvesting Source button to display the datasets that were added during the harvesting process.

  • The harvested datasets are now displayed.