Menu Close

Tips for Extracting Data Using the Qlik REST Connection

The Qlik REST connector is one of the more powerful tools when it comes to connecting to any Web resource. Pretty much any web database supports REST connections. As a Qlik developer, what are the tools at our disposal to develop these types of connections? Here are the steps I take to develop a Qlik REST connection.

Documentation

Every website that supports REST connections has documentation on how to connect. It will outline authentication (which sometimes requires secret generation), headers, whether a cookie/token must be passed, etc. Before attempting to connect to a website, read up on the documentation & find what endpoints need to be hit and what must be conserved in order to establish trust.

Tools

  • Web search
  • Site Developer’s documentation

Authentication

After reading up on the documentation it is time to make the connection. Depending on the security protocols in place, connecting will be more or less difficult. Sometimes a connection can be made using the Qlik REST connector & we can jump right into extracting without further hassle. However, even after reading the documentation, the site can offer challenges to connecting. Here are some things I have seen:

  • The site requires dynamic parameters
  • The site requires dynamic headers
  • The site requires passing a session-specific cookie or Token
  • A hashing algorithm is required

Except for the last point, all of these requirements are easily handled within Qlik.

My process for establishing a connection is to first try to execute it within Qlik. When that does not work, I move over to Postman so I can test quicker & save my requests (Postman does this automatically). Postman is also easier to inspect my request & the website’s response without having to go into the viewer as I would in Qlik. It also stores cookies in a place I can easily inspect them and even copy them into Qlik if desired.

Once authenticated, assign to variables any part of the reply that is needed for future requests using the Peek() function.

Tools

  • Postman
  • Qlik REST connector
  • Qlik Variables

Extracting Data

Time to get to the data! Using the Qlik REST connector, inspect the query response, and insert the script. This will work as the script outline.

Challenges to data extraction using Qlik REST connections:

  • Data required is in several different REST endpoints
  • Some items require unique REST requests passing item ID to get all associated data
  • Requests require pagination
  • Requests require throttling due to server REST API request limits
  • There is a required field that does not appear in the Qlik-generated script

We will break down each of these in the section below.

REST Connection Changes

When making a REST connection, I first thought that I would have to make a different Qlik connector for each endpoint I needed to hit. It was to my great relief that Qlik provides a command to change the connection on the fly with the following command:

Tool

  • “WITH CONNECTION”

Using this command, everything from Headers & Body to the URL of the request can be changed. When I use this tool in development, I will often configure the Qlik REST connection directly for the new endpoint I need to hit to generate the load script. Once I get the system-generated load script, I specify my connection endpoint using WITH CONNECTION and move on to the next table. This solution responds to the first 2 bullet points above.

Example

Let’s apply this to a real-world scenario. I have a website that I am using a REST connection with. I authenticate my connection using the default connection string in my connector (pointing to the authentication URL) and it returns my token. I then use Peek() to assign the token to my variable

Let vSiteToken = Peek('siteToken',0,'myAuthTable');
Drop Table myAuthTable;

Now I need to hit the endpoint for my data table. Edit the connection we are using and change the URL to the desired endpoint. I will often use a trace to print out the token on the screen so I can copy & drop it in as a header at this point. Then test the connection, close, and select data. Insert the script below the authentication request. Now we can adjust the very end of the request as follows:

FROM JSON (wrap on) "root" PK "__KEY_root"
WITH CONNECTION (
Url "https://myWebsite.com/rest/NewURL" 
HTTPHEADER "tokenHeaderName" "$(vSiteToken)"
);

Now we can use the same connection to hit multiple endpoints. This can be used using a “For Loop” to loop through a list of IDs and concatenating the results together on the same table.

Pagination

While Pagination is supported automatically through the Qlik REST connector setting, it is sometimes necessary to control this manually. This is especially necessary if the requests need to be throttled.

What is throttling? Some REST API connections have a limit of requests that can be made per minute. This is to ensure that one user cannot take more than a reasonable amount of server resources to the detriment of other users.

Tools

  • Response headers
  • Sleep command
  • For loop
  • Response sorting
Example

Let’s say we have a site that returns the data of the REST request in pagination. So we configure the pagination on the connector and execute. After about 15 seconds an error shows up that the server refused the request. We try re-connecting but find that the server is refusing all requests now. Thankfully we have read the documentation though, and recall that there was something about making too many requests and a time-out period if that request limit is breached. So we consult the documentation again and find that the limit is 150 requests/minute. We need to throttle.

While we are waiting for our time-out period to expire, we start setting up variables.

Set vThrottleThreshold = 150; //amount of requests per minute
Let vThrottleTime = 60000 / vThrottleThreshold; //1000*60 milliseconds per minute. Sleep works in milliseconds. I sometimes subtract about ~30 milliseconds as well since the request takes time. This can be set after some trial & error.

We will also need 2 more variables, 1 for page number, and another for the number of pages. If a request has pagination, often the first request will return with headers outlining what page number it is and how many pages there are. These headers need to be set to those variables for the parameters of the “For Loop”.

Now let’s go over how this will work: We will set our throttle limit, make our first request outside the for loop, set the variables with page info, then jump into our for loop. Once inside, the first action performed is the sleep for our required time, then we will make our request and set the parameter for page 2, and so on. It is important to note here that sending a parameter for sorting may be necessary so that each request returns new data.

Here is what the code may look like:

Set vThrottleThreshold = 150; //amount of requests per minute
Let vThrottleTime = (60000 / vThrottleThreshold) - 30;

myRESTTable:
SQL SELECT ...
FROM JSON (wrap on) ...
WITH CONNECTION(URL "https://mySite.com/rest/item?orderBy=Name");

Let vPageNum = Peek(...);
Let vPageTotal = Peek(...);

For vPageNum to vPageTotal //may need to put variables in $()

Sleep $(vThrottleTime);

Concatentate (myRESTTable)
SQL SELECT ...
FROM JSON (wrap on) ...
WITH CONNECTION(URL "https://mySite.com/rest/item?orderBy=Name&PageNo=$(vPageNum)");

Next vPageNum
//autogenerated script follows here

You will also need to add the page number to keys in the extraction. This will keep keys unique and avoid join errors down the road.

Field Not Found

I ran into this when working with a REST connector for JIRA. Sometimes the fields returned are not intuitively named or the JSON objects never have that field defined in the sample set that is first returned and generates the script. This will involve troubleshooting.

Some Tools

  • Targeted REST extraction
  • Browser object inspector found in browser Developer tools
  • Site REST documentation

These two tools approach the problem differently. The first thing to do is get an example of the sought-for property from the end-user. Consult the REST documentation and see if there an endpoint to extract that item all by itself. Often performing that and inspecting the table in Qlik Data Model Viewer will give us the answer we need.

However, there are times when the response has hundreds of fields and it is like looking for a needle in a haystack. Here I have often changed my approach to looking at the ticket in the web-browser and inspecting the desired field. This sometimes has the field name or some identifier that we can then find in the data model.

Once the field is located, adjust the automatically generated Qlik script to extract the field.

I would like to hear of a case where either of these do not work. I am sure there are examples out there, but I have not seen the example yet.

I hope that you have found these tips useful. REST connections are sometimes challenging to work with, but that is what makes them so much fun. There is nothing more satisfying than getting through a complex REST authentication process and arriving at the data. Good luck in your endeavors, and may you be met with success every step of the way. Happy Qlik-ing.

Posted in Analytics, BI Dashboards, Blog

5 Comments

  1. Mihai C

    Hi Paul,

    I hope you are well.

    I was reading your article about Qlik REST connector and i’ve decided to ask you about an issue i’m facing ref. to this topic:

    I have a simple Qlikview dashboard using a REST connection to pull data from an API.

    Manual reload on Qlik Desktop works just fine but fails when using a scheduled reload configured on the Qlik Server.

    Error:

    20210125T152052.970+0000 Error: QVX_UNEXPECTED_END_OF_DATA: Failed to connect to server.
    20210125T152052.970+0000 Execution Failed
    20210125T152052.972+0000 Execution finished.

    QVrestconnector is installed on all QDS nodes and i’m testing it with same local user which is same as the service account running Qlik services.

    I did try some of suggested solutions for previous questions without any help.

    My questions to you are, have you experienced something similar and do you have a solution?

    Many Thanks,
    Mihai

    • Paul Pirolli

      Hi Mihai,

      I don’t have any experience with the QlikView REST connector, but I’ll try responding as best I can with the information you outlined above. The error that the Qlik server is throwing you is very generic; it does not look like a Web API error since there is no error code. This fails before retrieving any data, correct? I found one similar problem on the Qlik Community here and I agree that it sounds like a permission issue with your service account. In the case outlined in the link they had to give the service account Internet Explorer permission to hit an external URL. You can also test this by adding a fake JSON structure for your request to see if it throws the same error, that would definitely point that Qlik never makes it to the URL with the request.

      Hope that helps; and good luck.

  2. Sergey Makushynski

    Hi Paul,
    Thanks for wonderful article. It looks great, but as I understand Qlik Rest connector requires to specify the fields list for each request.
    Is there a way to use SQL SELECT * approach (I know it doesn’t work) or something similar to say: load all fields.

    The approach you describing suggests that it’s probably possible.

    Best regards,
    Sergey

    • Paul Pirolli

      Hi Sergey,

      Unfortunately, there is no option of using “SQL Select *” to unpack the JSON response of the REST request. If I need to hit multiple endpoints in a site to retrieve different data objects (tables), I edit the REST connector in Qlik and add my new endpoint for each request. Once I have the extraction automatically generated in the loadscript I revert the connector to its original endpoint (some websites require hitting an authentication endpoint first) and add the “WITH CONNECTION” parameter to the request.

      If you do find a way to use “SQL SELECT *” let me know – although I suspect Qlik would not unpack the object and you would end up with a JSON string value. It may be possible to only reference the top-most object in a generic way and thus get such a string, but I am not sure how to pull that off.

      Cheers,
      Paul

      • Sergey Makushynski

        Hi Paul,

        Thanks for a quick reply.
        I’ve ended up storing list of fields for each endpoint as a variable and reading variable name from the table for each endpoint. However, I had to connect to each endpoint to get the list of fields first.

        At least I was able to use For loop and make the whole script a bit more manageable.

        Best regards,
        Sergey

Leave a Reply

Your email address will not be published. Required fields are marked *