Custom Importers

Importer Types

The RecordM system comes with default importers for each definition. These importers allow users to import data via an Excel file, where each line is mapped to an instance and each column is mapped to a field. No additional processing is performed, and the columns must match the field names exactly.

These default importers are enough in many cases. However, custom importers become necessary once this simple mapping is no longer sufficient and we need extra intermediary steps, either to change how our data is mapped and processed, or simply to set up one of the alternative data sources - for example, HTTP endpoints. Custom importers are particularly useful in scenarios where:

  • Additional fields need to be filled that are not present in the source data.
  • Complex calculations or logic are required during the import process.
  • Data needs to be pulled from multiple sources or external systems.
  • We want to map data to a different field than the one defined in the imported Excel file.
  • Dates need to be remapped to a different format.

Custom importers can be divided based on the source of the data being imported - Excel files, databases, HTTP endpoints, and CSV files.

Besides importing, importers can also be used to update existing data. If there is an identifier in the data, and the importer is set to allow updates, it will first try to find an instance with that identifier and, if one exists, update it.

Data Sources

As stated previously, there are a few possible data sources which dictate the way the importers behave:

  • Excel Files:

    • Excel Importers expect to receive an Excel file. This file is parsed and computed, such that any formulas are calculated before the data is imported.
  • Databases:

    • Database Importers perform an SQL query against a database, and the returned list of entries is imported.
  • HTTP Endpoints:

    • HTTP Importers perform GET requests to specific URIs, and the list of results in the response of each request is imported.

Configuration Files

Custom importers extend the capabilities of regular importers by introducing additional logic and processing into the import process. This custom logic is specified in different files depending on when it happens and what transformations are applied.

In order to set up a custom importer, a directory and one or more additional files are necessary. All the described steps need to be done in the recordm-importer/ directory:

  • create a new folder with the name of your importer. For example, an importer for invoices might be called invoice-importer. This directory will hold all the configuration files for the importer.

  • create a recordm-importer.properties file (mandatory)

    • In this file we will specify the details and configurations of our importer. The most efficient way is to copy the template and uncomment/fill the necessary lines.
    • For scenarios such as remapping a field in our Excel file to another field in the RecordM system, or automatically changing the date format of the imported file, this file alone is sufficient.
  • create a body.groovy file (optional)

    • in this file we can create fields not present in the original data (but that exist in the definition to import), perform additional processing on the imported data - such as queries to RecordM or a database, calculations, etc. - or even import instances into more than one definition.
  • create a uri.groovy file (optional, for http importers)

    • in this file we can customize which URIs the importer will use to perform the GET operation. This file should return a list of strings (the URIs) and may use any logic. This file is optional for HTTP importers, but if it does not exist, the property http.source-uri (in the recordm-importer.properties file) must be filled!
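
With these files in place, the importer directory would look like this (using the invoice-importer example name from above):

```
recordm-importer/
└── invoice-importer/
    ├── recordm-importer.properties   (mandatory)
    ├── body.groovy                   (optional)
    └── uri.groovy                    (optional, HTTP importers only)
```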

See the final chapter of use cases and examples to know whether just a properties file, or more, is needed for your case.

Complex Processing

The body.groovy file is where the most complex calculations can happen. It has the following properties:

  • It is a script that executes for each record that is being imported.
  • A variable request (of type Message) is injected into the script, which represents the record being processed.
  • It must return a dictionary for the instance to create (with the format [ fieldName : value ]) or a list of dictionaries (in which case, each dictionary will create a new instance).

The returned dictionary (or dictionaries) can contain a specific field called COB_DefinitionToImport. This field allows us to import instances into different definitions - it can differ from the definition specified in the recordm-importer.properties file. You may set this value to __DISCARD__ to discard the returned instance.
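
As a minimal sketch of these return options (the status and name fields and the Archive definition are hypothetical names, and request is injected by the importer at runtime), a body.groovy could look like:

```groovy
// body.groovy - minimal sketch; "request" is injected by the importer at runtime.
// "status", "name" and the "Archive" definition are hypothetical names.
if (request.body.get("status") == "obsolete") {
    // drop this record entirely
    return [ "COB_DefinitionToImport" : "__DISCARD__" ]
}

// import a single instance into a definition other than the one in the properties file
return [ "COB_DefinitionToImport" : "Archive",
         "Name"                   : request.body.get("name") ]
```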

The request object has two main fields:

  • The most relevant one is body (accessed as request.body), which contains the actual data being imported. So if the record being imported has a field called cost, it can be accessed in the script via request.body.get("cost"). The values here have the same typing as the imported values - so numbers are numbers and strings are strings.
  • In addition, there is the field headers, which is only useful in very specific situations. In HTTP Importers, it contains the following fields: 1) CamelHttpUri, which holds the whole URI from which the record being processed was GET'd; 2) CamelHttpQuery, which holds the query parameters of the URI.

TIP

When there are not a lot of differences between the received request.body and the final result, it is common to directly alter this dictionary with request.body.put and return it at the end of the script. This works because 'extra fields' present in the body but not in the definition are ignored, so as long as the fields we want are correct, we can update it directly.
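
A sketch of this in-place pattern (the cost and Total field names are hypothetical):

```groovy
// body.groovy - sketch of altering request.body in place; field names are hypothetical
request.body.put("Total", request.body.get("cost") * 1.23) // e.g. derive a field with VAT
return request.body // extra fields in the body that the definition lacks are simply ignored
```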

Order of Execution

To better visualize how these files work together in the custom importer scenario, here is the common flow of an imported file through the custom importer's configuration files:

TIP

The uri.groovy and query.sql files are specific to HTTP and database importers, respectively; therefore, they will never be simultaneously present in the same import.

When a file is not present, the data skips to the next step. For example, when only the recordm-importer.properties file exists, the records to import (from whichever source) go directly to it, undergoing any re-mappings/transformations it might apply.

Updating Entries

Importers can also be used to update instances instead of just creating them. This is useful when re-importing data that may have changed.

When a new record has a primary key - either explicitly defined in the properties file (more below in the examples) or via an ID column in the data (an ID column is implicitly assumed to be a primary key, unless stated otherwise in the properties) - the importer will try to find whether that record already exists in the system. It does this via an Elasticsearch query. If it finds 0 matches, it creates a new instance. If it finds 1, it updates it. If it finds 2 or more instances with the same primary key, it aborts.

If we are sure that the data is not meant to change, we can turn off updates via the following property. This means that if the system finds an instance already present in RecordM, it will do nothing and move on.

#default is false 
ignoreUpdates=true

If the primary key is not set in any way (and there are no secondary keys), then the importer will create a new instance for every record.

Use Cases / Examples

Using only the recordm-importer.properties file, we are able to achieve a few simple use cases:

  • modifying datetime values to a different format automatically.
  • changing the behaviour of $ref fields.
  • updating which field the data mapped in the Excel file actually goes to.
  • updating primary and secondary keys for updating data.

However, some use cases are more complex, such as:

  • additional logic for calculation of values based on imported data.
  • queries to additional RecordM definitions, or external databases.
  • creating instances of multiple different definitions.
  • creating instances of different definitions that reference each other.

For these cases, we will need a body.groovy file to be set up, since these situations with additional complexity cannot be achieved through the use of the properties file alone.

recordm-importer.properties Examples

Example - Custom Importer for modifying date format

In the recordm-importer.properties file:

##OPTIONAL: The format of the date fields with hours in the .xlsx files

datetime.format=MM-dd-yyyy hh:mm:ss

date.format=MM-dd-yyyy

In order to modify the final format of the date fields in the data to import (regardless of the importer type), we would modify these lines to match our desired format. This transformation is applied when a field with the $datetime or $date keyword receives a string value - it is ignored if the value is already a valid timestamp. This formatting applies to all fields of this kind in the imported record - we cannot specify different formats for different fields.

Example - Custom Importer for mapping values to field with a different name

In the recordm-importer.properties file:

##OPTIONAL: The field names that correspond to the given xlsx file column names, for each Definition on the property "definitions.withKeyFields"

#definition.Definition_1.fieldsTranslation=Column 1->Field 1,Column 2->Field 2,...

#definition.Definition_2.fieldsTranslation=Column X->Field X,Column Y->Field Y,...

In order to map values to a field with a different name in the RecordM system, we would add a line to this section of the file with the described format. This takes the name of the field in the imported record and maps it to the described field in the final imported instance. For example, if the definition Countries has a field Long Name and the imported record only has a Name, we would add the following line:

definition.Countries.fieldsTranslation=Name->Long Name

Example - Custom Importer for changing $ref field behaviour

In the recordm-importer.properties file:

##OPTIONAL: mapping of what referenced field should be used to search for a given $ref field. Defaults to $instanceLabel

definition.Definition.refFieldsSearchTarget=User->Nome

Normally, the $ref field expects to be filled with the $instanceLabel of the referenced definition. However, when importing data, we might not know this label, but have other possible means of referencing. For example, suppose we were importing the following records into the Countries Series definition, which references the definition Countries:

| Short Name             | Year     | Indicator Name                                         | Value        |
|------------------------|----------|--------------------------------------------------------|--------------|
| Arab World             | 1/1/2014 | Alternative and nuclear energy (% of total energy use) | 0.41         |
| Caribbean small states | 1/1/2014 | Surface area (sq. km)                                  | 434,990      |
| East Asia & Pacific    | 1/1/2014 | Surface area (sq. km)                                  | 24,825,074.5 |

We do not know the Country Code - the instance label of the Countries definition - so to properly fill the $ref in the generated instance, we need to add the following line, telling the importer to use the Country's Short Name when resolving the reference.

definition.Countries_Series.refFieldsSearchTarget=Country Code->Short Name 
# Country Code = $ref field in the Countries Series definition
# Short Name = Field in the Countries definition to use to find the reference

TIP

If the column we are using as the new search value for a $ref field is NOT UNIQUE, the importer will abort if it finds more than one instance for that referenced field.

Example - Custom Importer for changing primary and secondary keys

In the recordm-importer.properties file:

##OPTIONAL: The names of the key fields (primary and secondary) for each Definition on the property "definitions.withKeyFields"

#definition.Definition_1.primaryKeyFields=Field 1,Field 2,...

#definition.Definition_1.secondaryKeyFields=Field 3,Field 4,...

Primary and secondary keys serve to identify whether the record being imported already exists in RecordM. It will first use the fields specified in the primary key fields to try and find an instance. If those fields are not filled, it will instead try to use the secondary key fields.

For example, if we had a field in the definition Countries that stored its current population and from time to time we imported data from an external API to update it, that API would not identify its countries with our IDs. Let's suppose it only identified them with the country code. The most efficient way of making sure we updated the correct instance would be to indicate the country code as the correct primary key.

definition.Countries.primaryKeyFields=Country Code

body.groovy Examples

Example - Custom Importer performing calculations for existing fields

Sometimes the data returned is not enough to fill the instance, or simple re-mapping is not enough. Such transformations usually require a body.groovy script. These can range in complexity, from simple math to multiple complex calculations. For example, in the definition Countries, suppose the value for "Alternative and nuclear energy (% of total energy use)" was stored as a percentage from 0 to 1, instead of 0 to 100, in the original data. In that case, we would want to multiply it by 100 when the country series is of this kind.

For this, we could have the following body.groovy:

if (request.body.get("Indicator Code") == "EG.USE.COMM.CL.ZS") { // code for alternative energy percentage
    def percentage = request.body.get("Value") * 100
    request.body.put("Value", percentage)
}

return request.body

Example - Custom Importer performing a fetch for extra details

It is not uncommon for listing APIs to return only part of the information - such as an overview with the most important fields - while the detailed view of the data must be obtained through a specific access to the API. For example, the listing might return only the ids of countries, and not the whole name or its details. To accomplish this, we would need to perform a request for each object, and there are two ways to go about it.

  1. We perform the first query to obtain the different ids in the uri.groovy file, and return the necessary IDs as URIs to fetch.
  2. We perform the API request to get the listing, and treat each id as an instance to import, using the body.groovy script to fetch the detailed data.

TIP

We'll be using the Javax (JAX-RS) Client to perform the requests, as this package is available in the uri.groovy and body.groovy scripts. We are assuming the response is returned as a list of objects such as { "code" : <code> }.

Option 1.
// uri.groovy

import groovy.json.JsonParserType
import groovy.json.JsonSlurper
import javax.ws.rs.client.ClientBuilder

def jsonSlurper = new JsonSlurper(type: JsonParserType.INDEX_OVERLAY)

def response = ClientBuilder.newClient()
        .target("https://url-to-API/countries")
        .request()
        .buildGet()
        .submit()
        .get()

def countries = jsonSlurper.parseText(response.readEntity(String.class))
def urls = countries.collect { country -> "https://url-to-API/countries/${country.code}" }
return urls
Option 2.
// body.groovy

import groovy.json.JsonParserType
import groovy.json.JsonSlurper
import javax.ws.rs.client.ClientBuilder

def jsonSlurper = new JsonSlurper(type: JsonParserType.INDEX_OVERLAY)
def code = request.body.get("code")

def response = ClientBuilder.newClient()
        .target("https://url-to-API/countries/${code}")
        .request()
        .buildGet()
        .submit()
        .get()

def country = jsonSlurper.parseText(response.readEntity(String.class))

// insert whatever calculations you want

return country

Example - Custom Importer creating multiple definitions of different instances referencing each other

Sometimes, a single imported record generates more than one instance. For example, take the definitions Countries and Countries Series. Suppose we were importing records for the first time, and the records looked like this:

{
    "Code" : <code>,
    "Name" : <name>,
    "Alternative and nuclear energy" : <0-100>,
    "Population" : <number>,
    "Area" : <number>,
    "Last Update" : "2015-1-1"
}

Following our definition structure, this would generate 4 instances: 1 Country and 3 Country Series (one for each kind). As such, we need to return these 4 instances as the result of body.groovy, using COB_DefinitionToImport to specify where to import each of them, and their respective instance labels to reference each other.

// body.groovy

def country = [ "COB_DefinitionToImport" : "Countries",
                "Country Code"           : request.body.get("Code"),
                "Short Name"             : request.body.get("Name") ]

def population = [ "COB_DefinitionToImport" : "Countries Series",
                   "Country Code"           : request.body.get("Code"),
                   "Value"                  : request.body.get("Population"),
                   "Year"                   : request.body.get("Last Update"),
                   "Indicator Name"         : "Population, total" ]

def area = [ "COB_DefinitionToImport" : "Countries Series",
             "Country Code"           : request.body.get("Code"),
             "Value"                  : request.body.get("Area"),
             "Year"                   : request.body.get("Last Update"),
             "Indicator Name"         : "Surface area (sq. km)" ]

def energy = [ "COB_DefinitionToImport" : "Countries Series",
               "Country Code"           : request.body.get("Code"),
               "Value"                  : request.body.get("Alternative and nuclear energy"),
               "Year"                   : request.body.get("Last Update"),
               "Indicator Name"         : "Alternative and nuclear energy (% of total energy use)" ]

return [ country, population, area, energy ]

This works because Country Code is the $instanceLabel of the Countries definition, which means we can use it in the Countries Series instances to reference it. The order of the results is also important - the country must be the first element, so that it is created first and the importer can find it when importing the other instances. The order alone is not enough, however: because the importer uses Elasticsearch to find the referenced instance, we must also add the following property (in recordm-importer.properties) to guarantee that it waits for created instances to be indexed before importing more.

es.sync-refresh.on-create = true

We also need to specify the date format:

date.format=yyyy-MM-dd