Menu Close

Tips for Extracting Data Using the Qlik REST Connection

The Qlik REST connector is one of the more powerful tools when it comes to connecting to any Web resource. Pretty much any web database supports REST connections. As a Qlik developer, what are the tools at our disposal to develop these types of connections? Here are the steps I take to develop a Qlik REST connection.

Documentation

Every website that supports REST connections has documentation on how to connect. It will outline authentication (which sometimes requires secret generation), headers, whether a cookie/token must be passed, etc. Before attempting to connect to a website, read up on the documentation & find what endpoints need to be hit and what must be conserved in order to establish trust.

Tools

  • Web search
  • Site Developer’s documentation

Authentication

After reading up on the documentation it is time to make the connection. Depending on the security protocols in place, connecting will be more or less difficult. Sometimes a connection can be made using the Qlik REST connector & we can jump right into extracting without further hassle. However, even after reading the documentation, the site can offer challenges to connecting. Here are some things I have seen:

  • The site requires dynamic parameters
  • The site requires dynamic headers
  • The site requires passing a session-specific cookie or Token
  • A hashing algorithm is required

Except for the last point, all of these requirements are easily handled within Qlik.

My process for establishing a connection is to first try to execute it within Qlik. When that does not work, I move over to Postman so I can test quicker & save my requests (Postman does this automatically). Postman is also easier to inspect my request & the website’s response without having to go into the viewer as I would in Qlik. It also stores cookies in a place I can easily inspect them and even copy them into Qlik if desired.

Once authenticated, assign to variables any part of the reply that is needed for future requests using the Peek() function.

Tools

  • Postman
  • Qlik REST connector
  • Qlik Variables

Extracting Data

Time to get to the data! Using the Qlik REST connector, inspect the query response, and insert the script. This will work as the script outline.

Challenges to data extraction using Qlik REST connections:

  • Data required is in several different REST endpoints
  • Some items require unique REST requests passing item ID to get all associated data
  • Requests require pagination
  • Requests require throttling due to server REST API request limits
  • There is a required field that does not appear in the Qlik-generated script

We will break down each of these in the section below.

REST Connection Changes

When making a REST connection, I first thought that I would have to make a different Qlik connector for each endpoint I needed to hit. It was to my great relief that Qlik provides a command to change the connection on the fly with the following command:

Tool

  • “WITH CONNECTION”

Using this command, everything from Headers & Body to the URL of the request can be changed. When I use this tool in development, I will often configure the Qlik REST connection directly for the new endpoint I need to hit to generate the load script. Once I get the system-generated load script, I specify my connection endpoint using WITH CONNECTION and move on to the next table. This solution responds to the first 2 bullet points above.

Example

Let’s apply this to a real-world scenario. I have a website that I am using a REST connection with. I authenticate my connection using the default connection string in my connector (pointing to the authentication URL) and it returns my token. I then use Peek() to assign the token to my variable

Let vSiteToken = Peek('siteToken',0,'myAuthTable');
Drop Table myAuthTable;

Now I need to hit the endpoint for my data table. Edit the connection we are using and change the URL to the desired endpoint. I will often use a trace to print out the token on the screen so I can copy & drop it in as a header at this point. Then test the connection, close, and select data. Insert the script below the authentication request. Now we can adjust the very end of the request as follows:

FROM JSON (wrap on) "root" PK "__KEY_root"
WITH CONNECTION (
Url "https://myWebsite.com/rest/NewURL" 
HTTPHEADER "tokenHeaderName" "$(vSiteToken)"
);

Now we can use the same connection to hit multiple endpoints. This can be used using a “For Loop” to loop through a list of IDs and concatenating the results together on the same table.

Pagination

While Pagination is supported automatically through the Qlik REST connector setting, it is sometimes necessary to control this manually. This is especially necessary if the requests need to be throttled.

What is throttling? Some REST API connections have a limit of requests that can be made per minute. This is to ensure that one user cannot take more than a reasonable amount of server resources to the detriment of other users.

Tools

  • Response headers
  • Sleep command
  • For loop
  • Response sorting
Example

Let’s say we have a site that returns the data of the REST request in pagination. So we configure the pagination on the connector and execute. After about 15 seconds an error shows up that the server refused the request. We try re-connecting but find that the server is refusing all requests now. Thankfully we have read the documentation though, and recall that there was something about making too many requests and a time-out period if that request limit is breached. So we consult the documentation again and find that the limit is 150 requests/minute. We need to throttle.

While we are waiting for our time-out period to expire, we start setting up variables.

Set vThrottleThreshold = 150; //amount of requests per minute
Let vThrottleTime = 60000 / vThrottleThreshold; //1000*60 milliseconds per minute. Sleep works in milliseconds. I sometimes subtract about ~30 milliseconds as well since the request takes time. This can be set after some trial & error.

We will also need 2 more variables, 1 for page number, and another for the number of pages. If a request has pagination, often the first request will return with headers outlining what page number it is and how many pages there are. These headers need to be set to those variables for the parameters of the “For Loop”.

Now let’s go over how this will work: We will set our throttle limit, make our first request outside the for loop, set the variables with page info, then jump into our for loop. Once inside, the first action performed is the sleep for our required time, then we will make our request and set the parameter for page 2, and so on. It is important to note here that sending a parameter for sorting may be necessary so that each request returns new data.

Here is what the code may look like:

Set vThrottleThreshold = 150; //amount of requests per minute
Let vThrottleTime = (60000 / vThrottleThreshold) - 30;

myRESTTable:
SQL SELECT ...
FROM JSON (wrap on) ...
WITH CONNECTION(URL "https://mySite.com/rest/item?orderBy=Name");

Let vPageNum = Peek(...);
Let vPageTotal = Peek(...);

For vPageNum to vPageTotal //may need to put variables in $()

Sleep $(vThrottleTime);

Concatentate (myRESTTable)
SQL SELECT ...
FROM JSON (wrap on) ...
WITH CONNECTION(URL "https://mySite.com/rest/item?orderBy=Name&PageNo=$(vPageNum)");

Next vPageNum
//autogenerated script follows here

You will also need to add the page number to keys in the extraction. This will keep keys unique and avoid join errors down the road.

Field Not Found

I ran into this when working with a REST connector for JIRA. Sometimes the fields returned are not intuitively named or the JSON objects never have that field defined in the sample set that is first returned and generates the script. This will involve troubleshooting.

Some Tools

  • Targeted REST extraction
  • Browser object inspector found in browser Developer tools
  • Site REST documentation

These two tools approach the problem differently. The first thing to do is get an example of the sought-for property from the end-user. Consult the REST documentation and see if there an endpoint to extract that item all by itself. Often performing that and inspecting the table in Qlik Data Model Viewer will give us the answer we need.

However, there are times when the response has hundreds of fields and it is like looking for a needle in a haystack. Here I have often changed my approach to looking at the ticket in the web-browser and inspecting the desired field. This sometimes has the field name or some identifier that we can then find in the data model.

Once the field is located, adjust the automatically generated Qlik script to extract the field.

I would like to hear of a case where either of these do not work. I am sure there are examples out there, but I have not seen the example yet.

I hope that you have found these tips useful. REST connections are sometimes challenging to work with, but that is what makes them so much fun. There is nothing more satisfying than getting through a complex REST authentication process and arriving at the data. Good luck in your endeavors, and may you be met with success every step of the way. Happy Qlik-ing.

Posted in Analytics, BI Dashboards, Blog

Leave a Reply

Your email address will not be published. Required fields are marked *