Why you may want to have authenticated users

There are a few features of web applications that we, as users, have grown accustomed to, such as bookmarking items on our favourite marketplaces. For a Specialised Information Service, this feature is especially appealing, since it fits the workflow of a good number of scientists and humanists. As a researcher, you can search a catalogue and bookmark anything that catches your interest for further review. You can also export your bookmarks into the reference system of your choice.

While a bookmarking system could well be implemented via cookies or a session-based key-value store, we wanted the bookmarks to be persistent, which necessitates a login. With authenticated users, the bookmarks themselves can be linked to a user by way of a parental key. Additionally, having authenticated users makes it possible to count them, which provides a metric for funding bodies.

On the flip side, whenever you have authenticated users, you store personal information: at least a name and an email address, the latter often serving as a user name. This is a potential liability, especially if your project enters long-term maintenance without supervision. If your site gets hacked, you may leak personal information and, due to the lack of supervision, not even notice it. Storing passwords alongside user names is therefore not a viable option. The problem is not the storage per se, but that users tend to reuse passwords. If you lose password hashes, the underlying passwords can be recovered with standard dictionary attacks and rainbow tables. It takes a while, but a known (or guessable) password can be matched to a hash.

So if you don’t want to store passwords, what options remain? For researchers, there’s a convenient option in the Open Researcher and Contributor ID (ORCID). You can use ORCID as a single sign-on method for other services. ORCID provides a three-legged (see below) OAuth API that allows you to have authenticated users on your own platform without having to manage their passwords. Thus you only store their names and email addresses (if any), and neither is strictly necessary for authentication purposes. Storing a name and email address can be considered a comfort feature that lets you greet users by name in their profile and send them automated messages on certain occasions.

In this article, I will (very) briefly describe a minimal setup for authenticating users via OAuth and ORCID. It’s a simple enough process that spares you – as a developer – much headache. And for the prospective users, it simplifies the login process.

The general principle of ORCID’s OAuth implementation

As indicated earlier, ORCID provides an API for a three-legged OAuth that allows a client application access to the user data and can act as a login method. It’s “three-legged” because it’s not a single request, but the client needs to complete three steps to properly authenticate a user and access their data. The image below shows the general data flow of such a process.

An Image showing three rectangular shapes and a stylised human icon, connected with several annotated arrows. On the left a rose coloured rectangle is labeled with the word “Client”. From it an arrow runs to the stylised human icon bearing the description “Authorization Request”. From the stylised human figure another arrow runs back to the rose coloured rectangle bearing the inscription “Authorization grant”. Other arrows run from the rose coloured rectangle to two other rectangles, situated on the right, below the stylised human. One is green and has the words “Authorization Server” written on it. The other, below the green one, is blue and bears the words “Resource server”. The two arrows running from the rose coloured rectangle to the others bear the names “Authorization grant” and “Access token”, respectively. An arrow running from the green to the rose colour rectangle is annotated with “Access token”, and an arrow running from the blue to the rose colour rectangle bears the inscription “Protected resource”. All arrows are numbered from one to six.

(Fig. 1: Abstract data flow of a three legged OAuth request. Source: Wikimedia Commons, Author: Davensvd, License: CC-BY-SA, No changes were made to the original resource.)

The workflow is simple: Your client application (the rose coloured rectangle) contains a login button. When a user clicks the button, they are redirected to, in this case, ORCID to log in with their credentials. The URL they follow contains your Client ID and the callback URL they should be sent back to. After the user logs in, ORCID sends a grant token back to your client. The grant token will only be sent to a specific URL that you have to register with ORCID. This prevents an attacker from diverting the token elsewhere, since your Client ID is sent via URL and is visible in your site’s source code. After receiving the grant token, your server exchanges it, together with your Client ID and secret, with ORCID’s authorisation server (the green rectangle) for an access token. The secret is only used in this server-side request and should never appear in a URL or in your site’s source code. The access token is needed to access a user’s data.
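To make the first leg more concrete, here is a minimal sketch of how the link behind the login button could be assembled. It assumes ORCID’s public authorisation endpoint and the “/authenticate” scope. The Client ID and callback URL are placeholders, and the function name is purely illustrative. Note that in the standard authorisation code flow only the Client ID goes into this URL.

```python
from urllib.parse import urlencode

# Assumed ORCID production endpoint; the sandbox uses a different host.
ORCID_AUTHORIZE_URL = "https://orcid.org/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope="/authenticate"):
    """Return the URL the login button should send the user to."""
    params = {
        "client_id": client_id,
        "response_type": "code",       # ask for a grant token ("code")
        "scope": scope,                # "/authenticate" is enough for a login
        "redirect_uri": redirect_uri,  # must match the URL registered with ORCID
    }
    return f"{ORCID_AUTHORIZE_URL}?{urlencode(params)}"


# Placeholder values, not real credentials.
login_url = build_authorize_url("APP-XXXXXXXXXXXXXXXX", "https://example.org/auth")
print(login_url)
```

After a successful login, ORCID redirects the browser to the callback URL and appends the grant token as the “code” query parameter.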

ORCID sends the basic user data together with the access token, so we get the ORCID itself, a name and an email address, if the user provided any. The access token needs to be stored in case the client application needs access to the user data later. The token expires after some time (20 years in ORCID’s case), so it might need to be renewed when a user logs in again.
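The exchange of the grant token for an access token can be sketched as follows. This is not the implementation used in our portal, just an illustration built on Python’s standard library. The token endpoint and the response fields are assumptions based on ORCID’s public API, the helper names are hypothetical, and the sample response is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed ORCID production endpoint; the sandbox uses a different host.
ORCID_TOKEN_URL = "https://orcid.org/oauth/token"


def build_token_request(code, client_id, client_secret, redirect_uri):
    """Build the server-side POST request that trades the grant token for an access token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,  # only ever sent from the server
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    return Request(ORCID_TOKEN_URL, data=body,
                   headers={"Accept": "application/json"}, method="POST")


def parse_token_response(raw):
    """Pick the fields needed for a login from the JSON response."""
    data = json.loads(raw)
    return {
        "orcid": data["orcid"],
        "name": data.get("name"),
        "access_token": data["access_token"],
        "expires_in": data.get("expires_in"),
    }


# Placeholder credentials; sending would be urllib.request.urlopen(request).
request = build_token_request("grant-token", "APP-XXXXXXXXXXXXXXXX",
                              "client-secret", "https://example.org/auth")

# Made-up response for illustration only.
sample = ('{"access_token": "hypothetical-token", "token_type": "bearer", '
          '"expires_in": 631138518, "scope": "/authenticate", '
          '"name": "Jane Doe", "orcid": "0000-0000-0000-0000"}')
user_data = parse_token_response(sample)
```

If ORCID rejects the grant token, the real request fails with an HTTP error instead of returning this JSON, which is what makes the exchange usable as a validity check.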

With the access token, we can read and, if the user agrees, modify the data in their ORCID profile. This would, for instance, offer the possibility to manage a user’s publication data via the client application.

Since we currently only need to authenticate users, the simple act of retrieving data serves as authentication, proving to us that the correct user has indeed logged into ORCID. So we don’t necessarily need to store the access token at all. It’s just there for further use.

Implementing ORCID in Django

As seen in the previous section, to make use of OAuth, we need to implement the handshake and data exchange to have users log in with their ORCID credentials. To summarise: We need a Client ID from ORCID along with a secret. We need a login button, which is handily provided by ORCID as a code snippet, and an authentication URL. This URL will receive the grant token (see fig. 1). Once we have that, we can process the token in our authentication chain and authenticate a user.

Since we, the UBlabs team, use Django to develop our respective portals, I implemented the authentication as an extension to Django’s user model. For the sake of brevity, I will not go into too great detail on the actual implementation, but the complete code can be found here.

First, we need a view that serves the URL we specify as callback with ORCID. It’s very simple, so I will include the code below.

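from django.contrib.auth import authenticate, login as auth_login
from django.shortcuts import render


def auth(request):
    # ORCID appends the grant token as the "code" query parameter.
    token = request.GET.get('code')
    # Hands the token to the configured authentication backends.
    user = authenticate(request, token=token)
    if user is not None:
        # Attach the authenticated user to the current session.
        auth_login(request, user)
        # Render to a success page.
        return render(request, 'auth.html')
    else:
        # Return an 'invalid login' error message.
        return render(request, 'auth_failed.html')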

Since the grant token (i.e. the “code” parameter) will be verified during authentication by exchanging it for an access token (as described above), no checks for bogus tokens are necessary in the view. The verification is triggered by the “authenticate” function, which is part of Django’s authentication framework and hands the token to the configured authentication backends.

That’s all there is to it. Since we simply extend the existing user model, we can inject our own login method into Django’s authentication chain. We do so by writing a new authentication backend that accepts the token we get from ORCID and processes it. The algorithm for processing this token is very simple and is shown below as a flow chart.

A Flowchart illustrating a simple login algorithm. It starts at the top with a rectangle labeled “User?”. From there an arrow leads to a diamond shape with the label “Token?”. From here two arrows lead away. One, pointing to the right, reads “No”. It points to a rectangle labeled “Reject login”. The other arrow reads “Yes” and leads to another diamond shape with the label “Token valid?”. From here an arrow labeled “No” leads back to “Reject Login” and an arrow labeled “yes” points to another diamond shape with the label “User exists?”. From here, again, two arrows lead away reading “yes” and “no”. The yes-arrow leads to a rectangle reading “return user”, while the no-arrow leads to a rectangle reading “create and return user”

(Fig. 2: Flowchart of the login algorithm)

As we can see, all we really need to check is whether ORCID sends a token back and whether it’s a valid token. We validate the token by exchanging it for an access token, as shown in fig. 1. We can then use the access token to pull the user data. In practice, though, we don’t need to do that, since ORCID sends all the basic user data we need with the access token. An additional benefit is that we don’t have to prompt the user to create an account. We can do that just by verifying the grant token and creating a user on the fly.
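The flow chart translates into very little code. The following is a framework-neutral sketch rather than our actual backend: the user store is a plain dictionary, and the token exchange is passed in as a function so that the example runs without network access. All names are hypothetical. In Django, the same logic would live in the “authenticate” method of a custom authentication backend and use the user model instead of a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class User:
    orcid: str
    name: str = ""


class InvalidGrant(Exception):
    """Raised by the exchange function when the grant token is rejected."""


def login_with_orcid(code: Optional[str],
                     exchange: Callable[[str], dict],
                     users: Dict[str, User]) -> Optional[User]:
    # "Token?": no grant token in the callback means no login.
    if not code:
        return None
    # "Token valid?": a failed exchange means the token was bogus.
    try:
        data = exchange(code)
    except InvalidGrant:
        return None
    # "User exists?": return the known user or create one on the fly.
    orcid = data["orcid"]
    user = users.get(orcid)
    if user is None:
        user = User(orcid=orcid, name=data.get("name") or "")
        users[orcid] = user
    return user


def fake_exchange(code: str) -> dict:
    """Stand-in for the request to ORCID; accepts exactly one made-up token."""
    if code != "good-code":
        raise InvalidGrant(code)
    return {"orcid": "0000-0000-0000-0000", "name": "Jane Doe"}


users: Dict[str, User] = {}
first = login_with_orcid("good-code", fake_exchange, users)   # creates the user
second = login_with_orcid("good-code", fake_exchange, users)  # finds the same user
rejected = login_with_orcid("bad-code", fake_exchange, users)
missing = login_with_orcid(None, fake_exchange, users)
```

Keying users by their ORCID rather than by name or email address keeps the lookup stable even if a user changes the visible parts of their profile.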

And that really is all there is to it. We can now have users log into our service and provide them with user-bound services like a persistent list of bookmarks, all without having to manage users and their data ourselves.

Benefits of using ORCID specifically

OAuth is not specific to ORCID. In fact, many other large services use it to allow you to log in with your Google or Facebook credentials. So why ORCID then? The answer is quite simple: Our service is aimed at researchers. Most researchers already have an ORCID, or will need one for online publications, so it’s an account they have to be familiar with anyway. As a further benefit, having ORCID integrated into our service means we could, in the future, offer the opportunity to access and modify a user’s publication data, or even allow them to “claim” their publications via our catalogue.

But for now, it’s a very handy feature to avoid the hassle of account management and potential password leaks. A neat side effect of this approach is that we can automatically purge users whose access token has expired, since we can safely assume they have lost interest in our services and haven’t logged in for a long time.
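Such a purge could be as simple as the following sketch. It assumes that we store, for every user, the time the access token was issued and the lifetime in seconds that came with it. The record layout and function names are hypothetical, and the dates are made up.

```python
from datetime import datetime, timedelta, timezone


def token_expired(issued_at, expires_in, now=None):
    """Return True if a token issued at `issued_at` with a lifetime of `expires_in` seconds has run out."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= issued_at + timedelta(seconds=expires_in)


def users_to_purge(records, now=None):
    """Return the ORCIDs of all users whose stored token has expired."""
    return [orcid for orcid, (issued_at, expires_in) in records.items()
            if token_expired(issued_at, expires_in, now)]


# Made-up records: ORCID -> (time of issue, lifetime in seconds).
records = {
    "0000-0000-0000-0001": (datetime(2000, 1, 1, tzinfo=timezone.utc), 3600),
    "0000-0000-0000-0002": (datetime(2024, 1, 1, tzinfo=timezone.utc), 631138518),
}
stale = users_to_purge(records, now=datetime(2024, 6, 1, tzinfo=timezone.utc))
```

With a token lifetime of roughly 20 years, such a job would rarely find anything to delete, so a shorter, self-imposed inactivity period may be the more practical criterion.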