[Dev Tip] Stack Overflow Architecture Update (2011)

A lot has happened since my first article on the Stack Overflow Architecture. Contrary to the theme of that last article, which lavished attention on Stack Overflow’s dedication to a scale-up strategy, Stack Overflow has both grown up and out in the last few years.

Stack Overflow has grown up by more than doubling in size to over 16 million users and multiplying its page views nearly 6 times to 95 million page views a month.

Stack Overflow has grown out by expanding into the Stack Exchange Network, which includes Stack Overflow, Server Fault, and Super User for a grand total of 43 different sites. That’s a lot of fruitful multiplying going on.

What hasn’t changed is Stack Overflow’s openness about what they are doing. And that’s what prompted this update. A recent series of posts talks a lot about how they’ve been handling their growth:

  • Stack Exchange’s Architecture in Bullet Points
  • Stack Overflow’s New York Data Center
  • Designing For Scalability of Management and Fault Tolerance
  • Stack Overflow Search — Now 81% Less
  • Stack Overflow Network Configuration
  • Does StackOverflow use caching and if so, how?
  • Which tools and technologies build the Stack Exchange Network?

Some of the more obvious differences across time are:

  • Just More. More users, more page views, more datacenters, more sites, more developers, more operating systems, more databases, more machines. Just a lot more of more.
  • Linux. Stack Overflow was known for their Windows stack; now they are using a lot more Linux machines for HAProxy, Redis, Bacula, Nagios, logs, and routers. All support functions seem to be handled by Linux, which has required the development of parallel release processes.
  • Fault Tolerance. Stack Overflow is now being served by two different switches on two different internet connections, they’ve added redundant machines, and some functions have moved to a second datacenter.
  • NoSQL. Redis is now used as a caching layer for the entire network. There wasn’t a separate caching tier before, so this is a big change, as is using a NoSQL database on Linux.

Unfortunately, I couldn’t find any coverage on some of the open questions I had last time, like how they were going to deal with multi-tenancy across so many different properties, but there’s still plenty to learn from. Here’s a roll-up of a few different sources:

The Stats

  • 95 Million Page Views a Month
  • 800 HTTP requests a second
  • 180 DNS requests a second
  • 55 Megabits per second
  • 16 Million Users  – Traffic to Stack Overflow grew 131% in 2010, to 16.6 million global monthly uniques.

Data Centers

  • 1 Rack with Peak Internet in OR (Hosts our chat and Data Explorer)
  • 2 Racks with Peer 1 in NY (Hosts the rest of the Stack Exchange Network)

Hardware

  • 10 Dell R610 IIS web servers (3 dedicated to Stack Overflow):
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz Quad Core with 8 threads
    • 16 GB RAM
    • Windows Server 2008 R2
  • 2 Dell R710 database servers:
    • 2x Intel Xeon Processor X5680 @ 3.33 GHz
    • 64 GB RAM
    • 8 spindles
    • SQL Server 2008 R2
  • 2 Dell R610 HAProxy servers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 4 GB RAM
    • Ubuntu Server
  • 2 Dell R610 Redis servers:
    • 2x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
    • CentOS
  • 1 Dell R610 Linux backup server running Bacula:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 1 Dell R610 Linux management server for Nagios and logs:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 2 Dell R610 VMWare ESXi domain controllers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
  • 2 Linux routers
  • 5 Dell Power Connect switches

Dev Tools

  • C#: Language
  • Visual Studio 2010 Team Suite: IDE
  • Microsoft ASP.NET (version 4.0): Framework
  • ASP.NET MVC 3: Web Framework
  • Razor: View Engine
  • jQuery 1.4.2: Browser Framework
  • LINQ to SQL, some raw SQL: Data Access Layer
  • Mercurial and Kiln: Source Control
  • Beyond Compare 3: Compare Tool

Software And Technologies Used

  • Stack Overflow uses a WISC stack via BizSpark
  • Windows Server 2008 R2 x64: Operating System
  • SQL Server 2008 R2 running Microsoft Windows Server 2008 Enterprise Edition x64: Database
  • Ubuntu Server
  • CentOS
  • IIS 7.0: Web Server
  • HAProxy: for load balancing
  • Redis: used as the distributed caching layer.
  • CruiseControl.NET: for builds and automated deployment
  • Lucene.NET:  for search
  • Bacula: for backups
  • Nagios: (with n2rrd and drraw plugins) for monitoring
  • Splunk: for logs
  • SQL Monitor: from Red Gate – for SQL Server monitoring
  • Bind: for DNS
  • Rovio:  a little robot (a real robot) allowing remote developers to visit the office “virtually.”
  • Pingdom:  an external monitor and alert service.

External Bits

Code that is not included as part of the development tools:

  • reCAPTCHA
  • DotNetOpenId
  • WMD – Now developed as open source. See github network graph
  • Prettify
  • Google Analytics
  • Cruise Control .NET
  • HAProxy
  • Cacti
  • MarkdownSharp
  • Flot
  • Nginx
  • Kiln
  • CDN: none, all static content is served off sstatic.net, which is a fast, cookieless domain intended for static content delivered to the Stack Exchange family of websites.

Developers And System Administrators

  • 14 Developers
  • 2 System Administrators

Content

  • License: Creative Commons Attribution-Share Alike 2.5 Generic
  • Standards: OpenSearch, Atom
  • Host: PEAK Internet

More Architecture And Lessons Learned

  • HAProxy is used instead of Windows NLB because HAProxy is cheap, easy, free, and works great as a 512MB VM “device” on the network via Hyper-V. It also sits in front of the boxes so it’s completely transparent to them, and it’s easier to troubleshoot as a separate networking layer instead of being intermixed with all your Windows configuration.
  • A CDN is not used because even “cheap” CDNs like Amazon’s are very expensive relative to the bandwidth bundled into their existing hosting plan. Based on Amazon’s CDN rates and their bandwidth usage, the least they could pay is $1K/month.
  • Backup is to disk for fast retrieval and to tape for historical archiving.
  • Full Text Search in SQL Server is very badly integrated, buggy, and deeply incompetent, so they went to Lucene.
  • Mostly interested in peak HTTP request figures as this is what they need to make sure they can handle.
  • All properties now run on the same Stack Exchange platform. That means Stack Overflow, Super User, Server Fault, Meta, WebApps, and Meta Web Apps are all running on the same software.
  • There are separate StackExchange sites because people have different sets of expertise that shouldn’t cross over to different topic sites. You can be the greatest chef in the world, but that doesn’t qualify you for fixing a server.
  • They aggressively cache everything.
  • All pages accessed by (and subsequently served to) anonymous users are cached via Output Caching.
  • Each site has 3 distinct caches: local, site, and global (a C# sketch of the lookup path follows this list).
  • local cache: can only be accessed from 1 server/site pair
    • To limit network latency they use a local “L1” cache, basically HttpRuntime.Cache, of recently set/read values on a server. This would reduce the cache lookup overhead to 0 bytes on the network.
    • Contains things like user sessions, and pending view count updates.
    • This resides purely in memory, no network or DB access.
  • site cache:  can be accessed by any instance (on any server) of a single site
    • Most cached values go here, things like hot question id lists and user acceptance rates are good examples
    • This resides in Redis (in a distinct DB, purely for easier debugging)
    • Redis is so fast that the slowest part of a cache lookup is the time spent reading and writing bytes to the network.
    • Values are compressed before sending them to Redis. They have plenty of CPU and most of their data are strings so they get a great compression ratio.
    • The CPU usage on their Redis machines is 0%.
  • global cache: which is shared amongst all sites and servers
    • Inboxes, API usage quotas, and a few other truly global things live here
    • This resides in Redis (in DB 0, likewise for easier debugging)
  • Most items in the cache expire after a timeout period (a few minutes usually) and are never explicitly removed. When a specific cache invalidation is required they use Redis messaging to publish removal notices to the “L1” caches.
  • Joel Spolsky is not a Microsoft loyalist, he doesn’t make the technical decisions for Stack Overflow, and he considers Microsoft licensing a rounding error. Consider yourself corrected, Hacker News commenter.
  • For their IO system they selected a RAID 10 array of Intel X25 solid state drives. The RAID array eased any concerns about reliability, and the SSDs performed really well in comparison to FusionIO at a much cheaper price.
  • The full-boat cost for their Microsoft licenses would be approximately $242K. Since Stack Overflow is using BizSpark they are not paying anywhere near the full sticker price, but that’s the maximum they could pay.
  • Intel NICs are replacing Broadcom NICs on their primary production servers. This solved problems they were having with connectivity loss, packet loss, and corrupted ARP tables.
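To make the cache hierarchy above concrete, here is a minimal C# sketch of how a per-server “L1” cache in front of a Redis-backed site cache could be wired up. This is not Stack Overflow’s actual code: the IRedisStore interface, the class and key names, and the 30-second local lifetime are illustrative stand-ins, and the real system also compresses values before writing them to Redis.

    using System;
    using System.Web;
    using System.Web.Caching;

    // Hypothetical stand-in for whatever Redis client is actually used.
    public interface IRedisStore
    {
        object Get(string key);                               // null on miss
        void Set(string key, object value, TimeSpan expiry);
        void Publish(string channel, string message);         // Redis pub/sub
    }

    public class TieredCache
    {
        private readonly IRedisStore _redis;
        private static readonly TimeSpan L1Lifetime = TimeSpan.FromSeconds(30);

        public TieredCache(IRedisStore redis) { _redis = redis; }

        public T Get<T>(string key, TimeSpan expiry, Func<T> loadFromDb) where T : class
        {
            // 1. Local "L1" cache (HttpRuntime.Cache): no network hop, 0 bytes on the wire.
            var local = HttpRuntime.Cache.Get(key) as T;
            if (local != null) return local;

            // 2. Site cache in Redis, shared by every instance of the site.
            var shared = _redis.Get(key) as T;
            if (shared != null)
            {
                SetLocal(key, shared);
                return shared;
            }

            // 3. Miss everywhere: load from the database and populate both levels.
            var fresh = loadFromDb();
            _redis.Set(key, fresh, expiry);
            SetLocal(key, fresh);
            return fresh;
        }

        public void Invalidate(string key)
        {
            // Most entries simply expire; explicit removals are broadcast over Redis
            // messaging so every web server can evict its own "L1" copy.
            _redis.Publish("cache-invalidation", key);
        }

        private static void SetLocal(string key, object value)
        {
            HttpRuntime.Cache.Insert(key, value, null,
                DateTime.UtcNow.Add(L1Lifetime), Cache.NoSlidingExpiration);
        }
    }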

Related Articles

[Dev Tip] Stack Overflow Architecture

Startup – ASP.NET MVC, Cloud Scale & Deployment shows an interesting alternative approach for a Windows stack using ServerPath/GoGrid for a dedicated database machine, elastic VMs for the front end, and a free load balancer.

Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky. In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good.

I fell in deep like with Stack Overflow for purely selfish reasons, it helped me solve a few difficult problems that were jabbing my eyes out with pain. I also appreciate their no-apologies anthropologically based design philosophy. Use design to engineer in the behaviours you want to encourage and minimize the responses you want to discourage. It’s the conscious awareness of the mechanisms that creates such a satisfying synergy.

What is key about the Stack Overflow story for me is the strong case they make for scale up as a viable solution for a certain potentially large class of problems. The publicity these days all goes to scaling out using NoSQL databases.

If you need to Google scale then you really have no choice but to go the NoSQL direction. But Stack Overflow is not Google and neither are most sites. When thinking about your design options keep Stack Overflow in mind. In this era of multi-core, large RAM machines and advances in parallel programming techniques, scale up is still a viable strategy and shouldn’t be tossed aside just because it’s not cool anymore. Maybe someday we’ll have the best of both worlds, but for now there’s a big painful choice to be made and that choice decides your fate.

Joel boasts that for 1/10 the hardware they have performance comparable to similarly sized sites. He wonders if these other sites have good programmers. Let’s see how they did it and you be the judge.

Site: http://stackoverflow.com

The Stats

  • 16 million page views a month
  • 3 million unique visitors a month (Facebook reaches 77 million unique visitors a month)
  • 6 million visits a month
  • 86% of traffic comes from Google
  • 9 million active programmers in the world and 30% have used Stack Overflow.
  • Cheaper licensing was attained through Microsoft’s BizSpark program. My impression is they pay about $11K for OS and SQL licensing.
  • Monetization strategy: unobtrusive ads, job placement ads, DevDays conferences, extending the software to target other related niches (Server Fault, Super User), developing StackExchange as a white label and self-hosted version of Stack Overflow, and perhaps developing some sort of programmer rating system.

    Platform

  • Microsoft ASP.NET MVC
  • SQL Server 2008
  • C#
  • Visual Studio 2008 Team Suite
  • JQuery
  • LINQ to SQL
  • Subversion
  • Beyond Compare 3
  • VisualSVN 1.5
  • Web Tier
    – 2 x Lenovo ThinkServer RS110 1U
    – 4 cores, 2.83 Ghz, 12 MB L2 cache
    – 500 GB datacenter hard drives, mirrored
    – 8 GB RAM
    – 500 GB RAID 1 mirror array
  • Database Tier
    – 1 x Lenovo ThinkServer RD120 2U
    – 8 cores, 2.5 Ghz, 24 MB L2 cache
    – 48 GB RAM
  • A fourth server was added to run superuser.com. All together the servers also run Stack Overflow, Server Fault, and Super User.
  • QNAP TS-409U NAS for backups. Decided not to use a cloud solution because the bandwidth costs of transferring 5 GB of data per day becomes prohibitive.
  • Hosting at http://www.peakinternet.com/. Impressed with their detailed technical responses and reasonable hosting rates.
  • SQL Server’s full text search is used extensively for the site search and detecting if a question has already been asked. Lucene.net is considered an attractive alternative.

    Lessons Learned

    This is a mix of lessons taken from Jeff and Joel and comments from their posts.

  • If you’re comfortable managing servers then buy them. The two biggest problems with renting costs were: 1) the insane cost of memory and disk upgrades 2) the fact that they [hosting providers] really couldn’t manage anything.
  • Make larger one time up front investments to avoid recurring monthly costs which are more expensive in the long term.
  • Update all network drivers. Performance went from 2x slower to 2x faster.
  • Upgrading to 48GB RAM required upgrading MS Enterprise edition.
  • Memory is incredibly cheap. Max it out for almost free performance. At Dell, for example, upgrading from 4G memory to 128G is $4378.
  • Stack Overflow copied a key part of the Wikipedia database design. This turned out to be a mistake which will need massive and painful database refactoring to fix. The refactorings will be to avoid excessive joins in a lot of key queries. This is the key lesson from giant multi-terabyte table schemas (like Google’s BigTable) which are completely join-free. This is significant because Stack Overflow’s database is almost completely in RAM and the joins still exact too high a cost.
  • CPU speed is surprisingly important to the database server. Going from 1.86 GHz, to 2.5 GHz, to 3.5 GHz CPUs causes an almost linear improvement in typical query times. The exception is queries which don’t fit in memory.
  • When renting hardware nobody pays list price for RAM upgrades unless you are on a month-to-month contract.
  • The bottleneck is the database 90% of the time.
  • At low server volume, the key cost driver is not rackspace, power, bandwidth, servers, or software; it is NETWORKING EQUIPMENT. You need a gigabit network between your DB and Web tiers. Between the cloud and your web server, you need firewall, routing, and VPN devices. The moment you add a second web server, you also need a load balancing appliance. The upfront cost of these devices can easily be 2x the cost of a handful of servers.
  • EC2 is for scaling horizontally, that is you can split up your work across many machines (a good idea if you want to be able to scale). It makes even more sense if you need to be able to scale on demand (add and remove machines as load increases / decreases).
  • Scaling out is only frictionless when you use open source software. Otherwise scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.
  • RAID-10 is awesome in a heavy read/write database workload.
  • Separate application and database duties so each can scale independently of the other. Databases scale up and the applications scale out.
  • Applications should keep state in the database so they scale horizontally by adding more servers.
  • The problem with a scale up strategy is a lack of redundancy. A cluster adds more reliability, but is very expensive when the individual machines are expensive.
  • Few applications can scale linearly with the number of processors. Locks will be taken which serializes processing and ends up reducing the effectiveness of your Big Iron.
  • With larger form factors like 7U power and cooling become critical issues. Using something between 1U and 7U might be easier to make work in your data center.
  • As you add more and more database servers the SQL Server license costs can be outrageous. So by starting with scale up and gradually going scale out on non-open-source software you can end up in a world of financial hurt.

    It’s true there’s not much about their architecture here. We know about their machines, their tool chain, and that they use a two-tier architecture where they access the database directly from the web server code. We don’t know how they implement tags, etc. If interested you’ll be able to glean some of this information from an explanation of their schema.

    Discussion

    As an architecture profile candidate Stack Overflow has earned two important HighScalability badges: the Microsoft Stack Badge and the Scale Up Badge. Both are controversial and interesting topics of discussion.

    Microsoft Stack Badge

    The Microsoft Stack Badge was earned because Stack Overflow uses the entire Microsoft Stack: OS, database, C#, Visual Studio, and ASP .NET. People are always interested in how MS compares to LAMP, but I don’t have many case studies to show them.

    Markus Frind of Plenty of Fish fame is often used as a Microsoft stack poster child, but since he explicitly uses as little of the stack as possible he’s not really a good example. Stack Overflow on the other hand is brash in proclaiming their love for MS, even when that love is occasionally spurned.

    It’s hard to separate out the Microsoft stack and the scale up approach because for licensing reasons they tend to go together. If you find yourself in the position of transitioning from scale up to scale out by adding dozens of cores, MS licensing will bite you.

    Licensing aside I personally find C#, Visual Studio, and .Net a very productive environment. C#/.Net is at least as good as Java/JVM. ASP .NET has always been a confusing mess to me. The knock against SQL Server is you have to pay for it and if that doesn’t bother you then it’s a solid choice. The Windows OS may not be as solid as other alternatives but it works well enough.

    So for a scale up solution a Microsoft stack works, especially if you are already Windows centric.

    Scale Up Badge

    This won’t be a reenactment of the scale out vs scale up vs rent vs buy wars. For a thorough discussion of these issues please take a look at Scaling Up vs. Scaling Out and Server Hosting — Rent vs. Buy?. If you aren’t confused and your head doesn’t hurt after reading all that then you haven’t properly understood the material 🙂

    The Scale Up Badge was awarded because Stack Overflow uses a scale up strategy to meet their scaling requirements. When they reach a limit they scale vertically by buying a bigger machine and adding more memory.

    Stack Overflow is in the sweet spot for scale up. It’s not too large, but with an Alexa ranking of 1,666 and 16 million page views a month it’s still a substantial site. Not Google scale, and probably will never have to be, but those are numbers many sites would be thrilled to have. Yet they aren’t uploading large amounts of media. They aren’t dealing with billions of tweets across complex social networks with millions of users. Their number of users is self limiting. And there are still directions they can take if they need to scale (caching, more web servers, faster disks, more denormalization, more memory, some partitioning, etc). All-in-all it’s a well done and very useful two-tier CRUD application.

    NoSQL Is Hard

    So should Stack Overflow have scaled out instead of up, just in case?

    What some don’t realize is NoSQL is hard. Relational databases have many many faults, but they make a lot of common tasks simple while hiding both the cost and complexity. If you want to know how many black Prius cars are in inventory, for example, then that’s pretty easy to do.

    Not so with most NoSQL databases (I’ll speak generally here; some NoSQL databases have more features than others). You would have to program a counter of black Prius cars yourself, up front, in code. There are no aggregate operators. You must maintain secondary indexes. There’s no searching. There are no distributed queries across partitions. There’s no Group By or Order By. There are no cursors for easy paging through result sets. Returning even 100 large records at a time may time out. There may be quotas that are very restrictive because they must limit the amount of IO for any one operation. Query languages may lack expressive power.

    The biggest problem of all is that transactions can not span arbitrary boundaries. There are no ACID guarantees beyond a single record or small entity group. Once you wrap your head around what this means for the programmer it’s not a pleasant prospect at all. References must be manually maintained. Relationships must be manually maintained. There are no cascading deletes that act correctly during a failure. Every copy of denormalized data must be manually tracked and updated taking into account the possibility of partial failures and externally visible inconsistency.

    All this functionality must be written manually by you in your code. While flexibility to write your own code is great in an OLAP/map-reduce situation, declarative approaches still cover a lot of ground and make for much less brittle code.

    What you gain is the ability to write huge quantities of data. What you lose is complacency. The programmer must be very aware at all times that they are dealing with a system where it costs a lot to perform distributed operations and failure can occur at any time.

    All this may be the price of building a truly scalable and distributed system, but is this really the price you want to pay?

    The Multitenancy Problem

    With StackExchange Stack Overflow has gone into the multi-tenancy business. They are offering StackExchange either self-hosted or as a hosted white label application.

    It will be interesting to see if their architecture can scale to handle a large number of sites. Salesforce is the king of multitenancy, and although it’s true they use Oracle as their database, they basically use very little of Oracle and have written their own table structure, indexing and query processor on top of Oracle. All in order to support multitenancy.

    Salesforce went extreme because supporting a lot of different customers is way more difficult than it seems, especially once you allow customization and support versioning.

    Clearly all customers can’t run in one server for security, customization, and scaling reasons.

    You may think just create a database for each customer, share a server for a certain number of customers, and then add more servers as needed. As long as a customer doesn’t need more than one server you are golden.

    This doesn’t seem to work well in practice. Oddly database managers aren’t optimized for adding or updating databases. Creating databases is a heavyweight operation and can degrade performance for existing customers as system locks are taken. Upgrade issues are also problematic. Adding columns locks tables which causes problems in high traffic situations. Adding new indexes can also take a very long time and degrade performance. Plus each customer will likely have specializations that makes upgrading even more complicated.

    To get around these problems Salesforce’s Craig Weissman, Chief Architect, created an innovative approach where tables are not created for each customer. All data from all customers is mapped into the same data table, including indexes. The schema for that table looks something like orgid, oid, value0, value1…value500. “orgid” is the organization ID and is how data is never mixed up. It’s a very wide and sparse table, which Oracle seems to handle well. Hundreds and hundreds of “tables” and custom fields are mapped into the data table.

    With this approach Salesforce has no option other than to build their own infrastructure to interpret what’s in that table. Oracle is left to handle transactions, concurrency, and deadlock detection. The advantage is that because there’s an interpreted layer, handling versions and upgrades is relatively simple, since the handling logic can be baked in. Strange but true.

    Related Articles

    This list includes a number of posts by Jeff as he chronicles their journey with Stack Overflow. Jeff is wonderful about being open about what they are doing and why. The comment threads are often tremendous. There’s a lot to learn.

  • Learning from StackOverflow.com by Joel Spolsky
  • Scaling Up vs. Scaling Out: Hidden Costs by Jeff Atwood
  • What Was Stack Overflow Built With?
  • New Stack Overflow Server Glamour Shots
  • New Stack Overflow Servers Ready
  • Server Hosting — Rent vs. Buy? – this is a very informative discussion of the pros and cons of renting vs. buying.
  • Rent vs. Buy (or EC2 vs. building your own iron) by Michael Friis
  • Oh, You Wanted “Awesome” Edition – We recently upgraded our database server to 48 GB of memory — because hardware is cheap, and programmers are expensive.
  • Our Backup Strategy – Inexpensive NAS
  • The Economics of Bandwidth
  • Understanding the StackOverflow Database Schema by Brent Ozar
  • Server Speed Tests – new hardware 2x slower – it was the network.
  • ASP.NET MVC: A New Framework for Building Web Applications
  • Three key things to know about moving MySQL into the cloud by morgan
  • NoSQL Conference
  • Decline of the Enterprise Data Warehouse by Bradford Stephens
  • Webinar: Multitenant Magic – Under the Covers of the Force.com Data Architecture by Craig Weissman, Chief Architect, salesforce.com.

[Dev Tip] JSON Web Token in ASP.NET Web API 2 using Owin

In the previous post, Decouple OWIN Authorization Server from Resource Server, we saw how we can separate the Authorization Server and the Resource Server by unifying the “decryptionKey” and “validationKey” values in the machineKey node of the web.config file for both the Authorization and the Resource server. Once the user requests an access token from the Authorization server, the Authorization server uses this unified key to encrypt the access token, and at the other end, when the token is sent to the Resource server, the same key is used to decrypt the access token and extract the authentication ticket from it.

The source code for this tutorial is available on GitHub.

This approach works well if you have control over the Resource servers (Audience) which rely on your Authorization server (Token Issuer) to obtain access tokens; in other words, you fully trust those Resource servers, so you share the same “decryptionKey” and “validationKey” values with them.

But in some situations you might have a large number of Resource servers relying on your Authorization server, so sharing the same “decryptionKey” and “validationKey” values with all those parties becomes an inefficient process as well as an insecure one: you are using the same keys for multiple Resource servers, so if a key is compromised, all the other Resource servers will be affected.

To overcome this issue we need to configure the Authorization server to issue access tokens using the JSON Web Token (JWT) format instead of the default access token format, and configure the Resource server to consume these new JWT access tokens. As you will see throughout this post, there is no need to unify the “decryptionKey” and “validationKey” values anymore if we use JWT.

What is JSON Web Token (JWT)?

JSON Web Token is a security token which acts as a container for claims about the user. It can be transmitted easily between the Authorization server (Token Issuer) and the Resource server (Audience), and the claims in the JWT are encoded using JSON, which makes it easy to use, especially in applications built using JavaScript.

JSON Web Tokens can be signed following the JSON Web Signature (JWS) specification, and they can be encrypted following the JSON Web Encryption (JWE) specification. In our case we will not transmit any sensitive data in the JWT payload, so we’ll only sign the JWT to protect it from tampering during transmission between the parties.

JSON Web Token (JWT) Format

Basically the JWT is a string which consists of three parts separated by a dot (.). The JWT parts are: <header>.<payload>.<signature>.

The header part is a JSON object which always contains 2 nodes and looks like the following: { “typ”: “JWT”, “alg”: “HS256” }. The “typ” node always has the value “JWT”, and the “alg” node contains the algorithm used to sign the token, in our case “HMAC-SHA256”.

The payload part is also a JSON object which contains all the claims inside this token; check the example shown in the snippet below:
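The original snippet is not reproduced in this copy of the post, but an illustrative payload carrying the set of claims described below might look like this (the user name, roles, audience id, and timestamps are example values only):

    {
      "unique_name": "SuperUser",
      "sub": "SuperUser",
      "role": [ "Manager", "Supervisor" ],
      "iss": "http://jwtauthzsrv.azurewebsites.net",
      "aud": "099153c2625149bc8ecb3e85e03f0022",
      "exp": 1414283602,
      "nbf": 1414281802
    }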

None of those claims are mandatory in order to build a JWT; you can read more about JWT claims here. In our case we’ll always use this set of claims in the JWTs we are going to issue; those claims represent the following:

  • The “sub” (subject) and “unique_name” claims represent the user name this token is issued for.
  • The “role” claim represents the roles for the user.
  • The “iss” (issuer) claim represents the Authorization server (Token Issuer) party.
  • The “aud” (audience) claim represents the recipients that the JWT is intended for (Relying Party – Resource Server). More on this unique string later in this post.
  • The “exp” (expiration time) claim represents the expiration time of the JWT; this claim contains a UNIX time value.
  • The “nbf” (not before) claim represents the time before which this JWT must not be accepted; this claim contains a UNIX time value.

Lastly, the signature part of the JWT is created by taking the header and payload parts, base64 URL encoding them, concatenating them with “.”, and then using the “alg” defined in the <header> part, in our case “HMAC-SHA256”, to generate the signature. The result of this signing process is a byte array which should be base64 URL encoded and then concatenated with <header>.<payload> to produce a complete JWT.
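As a rough illustration of that signing step (a standalone sketch, not the library-based implementation used later in this post), the signature could be computed like this:

    using System;
    using System.Security.Cryptography;
    using System.Text;

    public static class JwtSigningSketch
    {
        // base64url = base64 with '+' -> '-', '/' -> '_' and the '=' padding removed.
        private static string Base64UrlEncode(byte[] bytes)
        {
            return Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');
        }

        // Produces <header>.<payload>.<signature>, signed with HMAC-SHA256.
        public static string CreateSignedToken(string headerJson, string payloadJson, byte[] key)
        {
            string signingInput = Base64UrlEncode(Encoding.UTF8.GetBytes(headerJson))
                                  + "." +
                                  Base64UrlEncode(Encoding.UTF8.GetBytes(payloadJson));

            using (var hmac = new HMACSHA256(key))
            {
                byte[] signature = hmac.ComputeHash(Encoding.UTF8.GetBytes(signingInput));
                return signingInput + "." + Base64UrlEncode(signature);
            }
        }
    }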

JSON Web Tokens support in ASP.NET Web API and Owin middleware.

There is no direct support for issuing JWTs in ASP.NET Web API, nor a ready-made Owin middleware responsible for doing this, so in order to start issuing JWTs we need to implement this manually by implementing the interface “ISecureDataFormat” and its “Protect” method. More on this later. But for consuming JWTs in a Resource server there is a ready-made middleware named “Microsoft.Owin.Security.Jwt” which understands, validates, and de-serializes JWT tokens with a minimal number of lines of code.

So most of the heavy lifting we’ll do now will be in implementing the Authorization Server.

What we’ll build in this tutorial?

We’ll build a single Authorization server which issues JWT using ASP.NET Web API 2 on top of Owin middleware, the Authorization server is hosted on Azure (http://JwtAuthZSrv.azurewebsites.net) so you can test it out by adding new Resource servers. Then we’ll build a single Resource server (audience) which will process JWTs issued by our Authorization server only.

I’ll split this post into two sections, the first section will be for creating the Authorization server, and the second section will cover creating Resource server.

The source code for this tutorial is available on GitHub.

Section 1: Building the Authorization Server (Token Issuer)

Step 1.1: Create the Authorization Server Web API Project

Create an empty solution and name it “JsonWebTokensWebApi” then add a new ASP.NET Web application named “AuthorizationServer.Api”, the selected template for the project will be “Empty” template with no core dependencies. Notice that the authentication is set to “No Authentication”.

Step 1.2: Install the needed NuGet Packages:

Open the package manager console and install the NuGet packages below:

You can refer to the previous post if you want to know what each package is responsible for. The package named “System.IdentityModel.Tokens.Jwt” is responsible for validating, parsing and generating JWT tokens. We’ve also used the package “Thinktecture.IdentityModel.Core”, which contains a class named “HmacSigningCredentials” that will be used to facilitate creating signing keys.

Step 1.3: Add Owin “Startup” Class:

Right click on your project then add a new class named “Startup”. It will contain the code below:
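The original listing isn’t included in this copy of the post; the sketch below shows what such a Startup class could look like based on the bullet points that follow (the namespaces, the AllowInsecureHttp flag, and the Web API wiring are assumptions on my part):

    using System;
    using System.Web.Http;
    using Microsoft.Owin;
    using Microsoft.Owin.Security.OAuth;
    using Owin;
    using AuthorizationServer.Api.Formats;
    using AuthorizationServer.Api.Providers;

    [assembly: OwinStartup(typeof(AuthorizationServer.Api.Startup))]

    namespace AuthorizationServer.Api
    {
        public class Startup
        {
            public void Configuration(IAppBuilder app)
            {
                var config = new HttpConfiguration();
                config.MapHttpAttributeRoutes();

                ConfigureOAuth(app);

                app.UseWebApi(config);
            }

            private void ConfigureOAuth(IAppBuilder app)
            {
                var options = new OAuthAuthorizationServerOptions
                {
                    AllowInsecureHttp = true,
                    // Token endpoint: http://jwtauthzsrv.azurewebsites.net/oauth2/token
                    TokenEndpointPath = new PathString("/oauth2/token"),
                    AccessTokenExpireTimeSpan = TimeSpan.FromMinutes(30),
                    // Validates the audience (client_id) and the resource owner credentials.
                    Provider = new CustomOAuthProvider(),
                    // Emits a signed JWT instead of the default DPAPI-protected token.
                    AccessTokenFormat = new CustomJwtFormat("http://jwtauthzsrv.azurewebsites.net")
                };

                app.UseOAuthAuthorizationServer(options);
            }
        }
    }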

Here we’ve created a new instance of the class “OAuthAuthorizationServerOptions” and set its options as below:

  • The path for generating JWTs will be: “http://jwtauthzsrv.azurewebsites.net/oauth2/token”.
  • We’ve specified the token expiry to be 30 minutes.
  • We’ve specified the implementation of how to validate the client and the resource owner user credentials in a custom class named “CustomOAuthProvider”.
  • We’ve specified the implementation of how to generate the access token in JWT format; this custom class named “CustomJwtFormat” will be responsible for generating the JWT instead of the default access token (protected using DPAPI). Note that both use the bearer scheme.

We’ll come to the implementation of both classes in the next steps.

Step 1.4: Resource Server (Audience) Registration:

Now we need to configure our Authorization server to allow registering Resource server(s) (Audience). This step is very important because we need a way to identify which Resource server (Audience) is requesting the JWT, so the Authorization server will be able to issue a JWT for that audience only.

The minimal information needed to register a Resource server with an Authorization server is a unique client id and a shared symmetric key. For the sake of keeping this tutorial simple I’m persisting this information in a volatile dictionary, so the values for those Audiences will be removed from memory if an IIS reset takes place; for a production scenario you need to store those values permanently in a database.

Now add a new folder named “Entities”, then add a new class named “Audience”; inside this class paste the code below:
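The class body is missing from this copy; a plausible shape, assuming the three pieces of data used later in the post (client id, base64 URL encoded shared key, and a friendly name), would be:

    using System.ComponentModel.DataAnnotations;

    namespace AuthorizationServer.Api.Entities
    {
        // Hypothetical audience entity: identifier, shared symmetric key, and a display name.
        public class Audience
        {
            [Key]
            [MaxLength(32)]
            public string ClientId { get; set; }

            [Required]
            [MaxLength(80)]
            public string Base64Secret { get; set; }

            [Required]
            [MaxLength(100)]
            public string Name { get; set; }
        }
    }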

Then add a new folder named “Models”, then add a new class named “AudienceModel”; inside this class paste the code below:
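Again the listing is missing; a minimal model matching the registration flow (the caller supplies only a name, everything else is generated server-side) might be:

    using System.ComponentModel.DataAnnotations;

    namespace AuthorizationServer.Api.Models
    {
        // Hypothetical request model for registering a new audience.
        public class AudienceModel
        {
            [Required]
            [MaxLength(100)]
            public string Name { get; set; }
        }
    }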

Now add a new class named “AudiencesStore” and paste the code below:
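The original store isn’t reproduced here; the following sketch implements the behaviour described in the bullets below (the demo client id and secret, and the use of a ConcurrentDictionary, are illustrative assumptions):

    using System;
    using System.Collections.Concurrent;
    using System.Security.Cryptography;
    using AuthorizationServer.Api.Entities;
    using Microsoft.Owin.Security.DataHandler.Encoder;

    namespace AuthorizationServer.Api
    {
        // In-memory audience store; a real implementation would persist to a database.
        public static class AudiencesStore
        {
            public static ConcurrentDictionary<string, Audience> AudiencesList =
                new ConcurrentDictionary<string, Audience>();

            static AudiencesStore()
            {
                // Fixed audience for demo purposes (client id and key are illustrative values).
                AudiencesList.TryAdd("099153c2625149bc8ecb3e85e03f0022",
                    new Audience
                    {
                        ClientId = "099153c2625149bc8ecb3e85e03f0022",
                        Base64Secret = "IxrAjDoa2FqElO7IhrSrUJELhUckePEPVpaePlS_Xaw",
                        Name = "ResourceServer.Api 1"
                    });
            }

            public static Audience AddAudience(string name)
            {
                // Random 32-character identifier for the audience (client id).
                var clientId = Guid.NewGuid().ToString("N");

                // 256-bit random key, base64 URL encoded; shared only with this audience.
                var key = new byte[32];
                using (var rng = new RNGCryptoServiceProvider())
                {
                    rng.GetBytes(key);
                }

                var audience = new Audience
                {
                    ClientId = clientId,
                    Base64Secret = TextEncodings.Base64Url.Encode(key),
                    Name = name
                };

                AudiencesList.TryAdd(clientId, audience);
                return audience;
            }

            public static Audience FindAudience(string clientId)
            {
                Audience audience;
                return AudiencesList.TryGetValue(clientId, out audience) ? audience : null;
            }
        }
    }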

Basically this class acts like a repository for the Resource servers (Audiences); it is responsible for two things: adding a new audience and finding an existing one.

Now if you take a look at the method “AddAudience” you will notice that we’ve implemented the following:

  • Generating a random string of 32 characters as an identifier for the audience (client id).
  • Generating a 256-bit random key using the “RNGCryptoServiceProvider” class and then base64 URL encoding it; this key will be shared between the Authorization server and the Resource server only.
  • Adding the newly generated audience to the in-memory “AudiencesList”.
  • The “FindAudience” method is responsible for finding an audience based on the client id.
  • The constructor of the class contains a fixed audience for demo purposes.

Lastly we need to add an endpoint to our Authorization server which allows registering new Resource servers (Audiences), so add a new folder named “Controllers”, then add a new class named “AudienceController” and paste the code below:
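The controller body is missing from this copy; a minimal version matching the described endpoint (attribute routing and the route prefix are assumptions) could look like:

    using System.Web.Http;
    using AuthorizationServer.Api.Models;

    namespace AuthorizationServer.Api.Controllers
    {
        [RoutePrefix("api/audience")]
        public class AudienceController : ApiController
        {
            // POST http://jwtauthzsrv.azurewebsites.net/api/audience
            [Route("")]
            public IHttpActionResult Post(AudienceModel audienceModel)
            {
                if (!ModelState.IsValid)
                {
                    return BadRequest(ModelState);
                }

                // The Authorization server generates the client id and the shared symmetric key.
                var newAudience = AudiencesStore.AddAudience(audienceModel.Name);

                return Ok(newAudience);
            }
        }
    }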

This endpoint can be accessed by issuing an HTTP POST request to the URI http://jwtauthzsrv.azurewebsites.net/api/audience as in the image below; notice that the Authorization server is responsible for generating the client id and the shared symmetric key. This symmetric key should not be shared with any party except the Resource server (Audience) that requested it.

Note: In a real-world scenario the Resource server (Audience) registration process won’t be this trivial; you might go through a workflow approval. Sharing the key should take place using a secure admin portal, and you might need to provide the audience with the ability to regenerate the key in case it gets compromised.

Register Audience

Step 1.5: Implement the “CustomOAuthProvider” Class

Now we need to implement the code responsible for issuing the JSON Web Token when the requester issues an HTTP POST request to the URI http://jwtauthzsrv.azurewebsites.net/oauth2/token. The request will look as in the image below; notice how we are setting the client id for the registered resource (audience) from the previous step using the key (client_id).

Issue JWT

To implement this we need to add a new folder named “Providers”, then add a new class named “CustomOAuthProvider” and paste the code snippet below:
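The snippet itself is not reproduced here; the sketch below follows the behaviour described in the bullets that come next (the error messages, the demo role claim, and the exact way the client_id is read are assumptions):

    using System.Collections.Generic;
    using System.Security.Claims;
    using System.Threading.Tasks;
    using Microsoft.Owin.Security;
    using Microsoft.Owin.Security.OAuth;

    namespace AuthorizationServer.Api.Providers
    {
        public class CustomOAuthProvider : OAuthAuthorizationServerProvider
        {
            public override Task ValidateClientAuthentication(OAuthValidateClientAuthenticationContext context)
            {
                string clientId;
                string clientSecret;

                // The request carries only the client_id; the shared key is never sent.
                if (!context.TryGetBasicCredentials(out clientId, out clientSecret))
                {
                    context.TryGetFormCredentials(out clientId, out clientSecret);
                }

                if (context.ClientId == null)
                {
                    context.SetError("invalid_clientId", "client_id is not set");
                    return Task.FromResult<object>(null);
                }

                if (AudiencesStore.FindAudience(context.ClientId) == null)
                {
                    context.SetError("invalid_clientId", string.Format("Invalid client_id '{0}'", context.ClientId));
                    return Task.FromResult<object>(null);
                }

                // Audience check passed; move on to validating the resource owner credentials.
                context.Validated();
                return Task.FromResult<object>(null);
            }

            public override Task GrantResourceOwnerCredentials(OAuthGrantResourceOwnerCredentialsContext context)
            {
                // Demo-only check: any identical username/password pair is accepted.
                if (context.UserName != context.Password)
                {
                    context.SetError("invalid_grant", "The user name or password is incorrect");
                    return Task.FromResult<object>(null);
                }

                var identity = new ClaimsIdentity("JWT");
                identity.AddClaim(new Claim(ClaimTypes.Name, context.UserName));
                identity.AddClaim(new Claim("sub", context.UserName));
                identity.AddClaim(new Claim(ClaimTypes.Role, "Manager"));

                // Carry the audience (client id) forward so CustomJwtFormat can pick it up.
                var props = new AuthenticationProperties(new Dictionary<string, string>
                {
                    { "audience", context.ClientId ?? string.Empty }
                });

                var ticket = new AuthenticationTicket(identity, props);
                context.Validated(ticket);
                return Task.FromResult<object>(null);
            }
        }
    }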

As you can see, this class inherits from the class “OAuthAuthorizationServerProvider” and we’ve overridden two methods: “ValidateClientAuthentication” and “GrantResourceOwnerCredentials”.

  • The first method “ValidateClientAuthentication” will be responsible for validating whether the Resource server (audience) is already registered in our Authorization server by reading the client_id value from the request; notice that the request will contain only the client_id, without the shared symmetric key. If we take the happy scenario and the audience is registered, we’ll mark the context as valid, which means the audience check has passed and the code flow can proceed to the next step: validating the resource owner credentials (the user who is requesting the token).
  • The second method “GrantResourceOwnerCredentials” will be responsible for validating the resource owner (user) credentials. For the sake of keeping this tutorial simple I’m considering any identical username and password combination to be valid; in a real-world scenario you would do your database checks against a membership system such as ASP.NET Identity, as shown in the previous post.
  • Notice that we are setting the authentication type for those claims to “JWT”, and we are passing the audience client id as a property of the “AuthenticationProperties”; we’ll use the audience client id in the next step.
  • Now the JWT access token will be generated when we call “context.Validated(ticket)”, but we still need to implement the class “CustomJwtFormat” we talked about in step 1.3.

Step 1.6: Implement the “CustomJwtFormat” Class

This class will be responsible for generating the JWT access token, so what we need to do is add a new folder named “Formats”, then add a new class named “CustomJwtFormat” and paste the code below:
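The original listing is missing from this copy; a sketch that follows the four steps described below, built on “System.IdentityModel.Tokens.Jwt” and Thinktecture’s “HmacSigningCredentials” (the namespaces and the fallback expiry are assumptions), could look like:

    using System;
    using System.IdentityModel.Tokens;
    using Microsoft.Owin.Security;
    using Microsoft.Owin.Security.DataHandler.Encoder;
    using Thinktecture.IdentityModel.Tokens;

    namespace AuthorizationServer.Api.Formats
    {
        public class CustomJwtFormat : ISecureDataFormat<AuthenticationTicket>
        {
            private const string AudiencePropertyKey = "audience";
            private readonly string _issuer;

            public CustomJwtFormat(string issuer)
            {
                _issuer = issuer;
            }

            public string Protect(AuthenticationTicket data)
            {
                if (data == null) throw new ArgumentNullException("data");

                // 1. The audience (client id) was stashed in the ticket properties by CustomOAuthProvider.
                string audienceId = data.Properties.Dictionary.ContainsKey(AudiencePropertyKey)
                    ? data.Properties.Dictionary[AudiencePropertyKey]
                    : null;
                if (string.IsNullOrWhiteSpace(audienceId))
                    throw new InvalidOperationException("AuthenticationTicket.Properties does not include audience");

                var audience = AudiencesStore.FindAudience(audienceId);

                // 2. Base64 URL decode the shared symmetric key and build an HMAC-SHA256 signing key.
                var keyBytes = TextEncodings.Base64Url.Decode(audience.Base64Secret);
                var signingCredentials = new HmacSigningCredentials(keyBytes);

                // 3. Assemble the token: issuer, audience, the user's claims, validity window, signing key.
                var issued = data.Properties.IssuedUtc ?? DateTimeOffset.UtcNow;
                var expires = data.Properties.ExpiresUtc ?? issued.AddMinutes(30);

                var token = new JwtSecurityToken(
                    _issuer, audienceId, data.Identity.Claims,
                    issued.UtcDateTime, expires.UtcDateTime, signingCredentials);

                // 4. Serialize the JWT to its compact <header>.<payload>.<signature> form.
                return new JwtSecurityTokenHandler().WriteToken(token);
            }

            public AuthenticationTicket Unprotect(string protectedText)
            {
                // Consuming tokens is handled by the Resource server middleware, not here.
                throw new NotImplementedException();
            }
        }
    }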

What we’ve implemented in this class is the following:

  • The class “CustomJwtFormat” implements the interface “ISecureDataFormat<AuthenticationTicket>”; the JWT generation takes place inside the “Protect” method.
  • The constructor of this class accepts the “Issuer” of the JWT, which will be our Authorization server. This can be a string or a URI; in our case we’ll fix it to the URI with the value “http://jwtauthzsrv.azurewebsites.net”.
  • Inside the “Protect” method we are doing the following:
    1. Reading the audience (client id) from the Authentication Ticket properties, then getting this audience from the in-memory store.
    2. Reading the symmetric key for this audience and base64 decoding it to a byte array which will be used to create an HMAC-SHA256 signing key.
    3. Preparing the raw data for the JSON Web Token which will be issued to the requester by providing the issuer, audience, user claims, issue date, expiry date, and the signing key which will sign the JWT payload.
    4. Lastly we serialize the JSON Web Token to a string and return it to the requester.
  • By doing this, a requester for an access token from our Authorization server will receive a signed token which contains claims for a certain resource owner (user), and this token is intended for a certain Resource server (audience) as well.

So if we need to request a JWT from our Authorization server for a user named “SuperUser” that needs to access the Resource server (audience) “099153c2625149bc8ecb3e85e03f0022”, all we need to do is issue an HTTP POST to the token endpoint (http://jwtauthzsrv.azurewebsites.net/oauth2/token) as in the image below:

Issue JWT2

The result for this will be the below JSON Web Token:

There is an online JWT debugger tool named jwt.io that allows you to paste the encoded JWT and decode it so you can interpret the claims inside it. Open the tool and paste the JWT above and you should receive a response as in the image below; notice that all the claims are set properly, including iss, aud, sub, role, etc.

One thing to notice here is that there is a red label which states that the signature is invalid. This is true because the tool doesn’t know about the shared symmetric key issued for the audience (099153c2625149bc8ecb3e85e03f0022).

So if we decided to share this symmetric key with the tool and paste the key in the secret text box, we should receive a green label stating that the signature is valid, and this is identical to the implementation we’ll see in the Resource server when it receives a request containing a JWT.

jwtio

Now the Authorization server (Token issuer) is able to register audiences and issue JWT tokens, so let’s move to adding a Resource server which will consume the JWT tokens.

Section 2: Building the Resource Server (Audience)

Step 2.1: Creating the Resource Server Web API Project

Add a new ASP.NET Web application named “ResourceServer.Api”, the selected template for the project will be “Empty” template with no core dependencies. Notice that the authentication is set to “No Authentication”.

Step 2.2: Installing the needed NuGet Packages:

Open the package manager console and install the NuGet packages below:

The package “Microsoft.Owin.Security.Jwt” is responsible for protecting the Resource server resources using JWT; it only validates and de-serializes JWT tokens.

Step 2.3: Add Owin “Startup” Class:

Right click on your project then add a new class named “Startup”. It will contain the code below:
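The listing is missing here as well; below is a sketch using the “Microsoft.Owin.Security.Jwt” middleware of that era. The appSettings key names are hypothetical; the real values are the issuer URI plus the client id and shared symmetric key obtained when this audience was registered:

    using System.Configuration;
    using System.Web.Http;
    using Microsoft.Owin;
    using Microsoft.Owin.Security;
    using Microsoft.Owin.Security.DataHandler.Encoder;
    using Microsoft.Owin.Security.Jwt;
    using Owin;

    [assembly: OwinStartup(typeof(ResourceServer.Api.Startup))]

    namespace ResourceServer.Api
    {
        public class Startup
        {
            public void Configuration(IAppBuilder app)
            {
                var config = new HttpConfiguration();
                config.MapHttpAttributeRoutes();

                ConfigureOAuth(app);

                app.UseWebApi(config);
            }

            private void ConfigureOAuth(IAppBuilder app)
            {
                var issuer = "http://jwtauthzsrv.azurewebsites.net";
                // Hypothetical appSettings keys; store the real values in web.config, not in code.
                var audienceId = ConfigurationManager.AppSettings["as:AudienceId"];
                var audienceSecret = TextEncodings.Base64Url.Decode(
                    ConfigurationManager.AppSettings["as:AudienceSecret"]);

                // Accept only JWTs issued by the trusted Authorization server for this audience.
                app.UseJwtBearerAuthentication(new JwtBearerAuthenticationOptions
                {
                    AuthenticationMode = AuthenticationMode.Active,
                    AllowedAudiences = new[] { audienceId },
                    IssuerSecurityTokenProviders = new IIssuerSecurityTokenProvider[]
                    {
                        new SymmetricKeyIssuerSecurityTokenProvider(issuer, audienceSecret)
                    }
                });
            }
        }
    }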

This is the most important step in configuring the Resource server to trust tokens issued by our Authorization server (http://jwtauthzsrv.azurewebsites.net); notice how we are providing the values for the issuer, the audience (client id), and the shared symmetric secret we obtained once we registered the resource with the authorization server.

By providing those values to JwtBearerAuthentication middleware, this Resource server will be able to consume only JWT tokens issued by the trusted Authorization server and issued for this audience only.

Note: Always store keys in config files not directly in source code.

Step 2.4: Add new protected controller

Now we want to add a controller which will serve as our protected resource; this controller will return a list of claims for the authorized user, and those claims are of course encoded within the JWT we’ve obtained from the Authorization Server. So add a new controller named “ProtectedController” under the “Controllers” folder and paste the code below:
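The controller code isn’t included in this copy; a minimal sketch that simply echoes back the caller’s claims (the route names are assumptions) could be:

    using System.Linq;
    using System.Security.Claims;
    using System.Web.Http;

    namespace ResourceServer.Api.Controllers
    {
        [Authorize]
        [RoutePrefix("api/protected")]
        public class ProtectedController : ApiController
        {
            // GET api/protected: returns the claims extracted from the validated JWT.
            [Route("")]
            public IHttpActionResult Get()
            {
                var identity = User.Identity as ClaimsIdentity;

                return Ok(identity.Claims.Select(c => new
                {
                    c.Type,
                    c.Value
                }));
            }
        }
    }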

Notice how we decorate the controller with the [Authorize] attribute, which will protect this resource and only authenticate HTTP GET requests containing a valid JWT access token; by valid I mean a token issued by the trusted Authorization server, intended for this audience, signed with the shared symmetric key, and not yet expired.

Conclusion

Using signed JSON Web Tokens facilitates and standardizes the way of separating the Authorization server and the Resource server; there is no more need for unifying machineKey values, nor the risk of sharing the “decryptionKey” and “validationKey” values among different Resource servers.

Keep in mind that the JSON Web Token we’ve created in this tutorial is signed only, so do not put any sensitive information in it :)

The source code for this tutorial is available on GitHub.

That’s it for now folks, hopefully this short walk through helped in understanding how we can configure the Authorization Server to issue JWT and how we can consume them in a Resource Server.

If you have any comment, question or enhancement please drop me a comment; I do not mind if you star the GitHub Repo too :)

REF: http://bitoftech.net/2014/10/27/json-web-token-asp-net-web-api-2-jwt-owin-authorization-server/