The Agonizing Workings of Custom and Managed Rules in Azure WAF
OK, this one was a doozy.
I'm working on part of a "zero-trust" initiative that requires updated rules in the WAF we run on our Azure Front Door. We use a mix of custom and managed rulesets, with a default DENY at the end, as all responsible netizens do. What we learned about how Azure's WAF functions has made us rethink our decision to support Azure AT ALL.
(OK, that's hyperbole, but we certainly did weigh the extra spend on CloudFront, Akamai, and even virtual F5 appliances against the added complexity we were facing; and that says a lot, because we don't like spending money.)
I hear you: "Dude, what could be so vile that you're considering extremes like those?" It all comes down to how Azure WAF handles rule flow. First, custom rules always run before managed ones. I can dig this, for the most part; honestly, it's fine in 99 out of 100 use cases.
The real problem is how Azure WAF passes traffic from rule to rule. In every other WAF or stateful firewall in creation that uses DENY:DENY logic (hyperbole again, I know. At least I didn't say "literally"?), when a rule is evaluated and the traffic fails to match, it is passed down the line of priority until it matches a DENY condition. In Azure WAF, if the traffic hits your first custom DENY rule and fails to match, it's allowed through and NO MORE RULES ARE PROCESSED.
Yes, you read that correctly. NO MORE RULES ARE PROCESSED. That bot-control rule in the managed set? Nope. All the SQL injection ones? HA! Log4J? SpringShell? You see the problem. As it turns out, our WAF was doing almost nothing, because the first rule was "are you in our personalized blacklist?" with action DENY. So, as long as you weren't a threat actor (TA) we'd dealt with before, come on in, I guess...
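To make the difference concrete, here's a toy Python model of the two evaluation flows as described above: the conventional fall-through model most firewalls use, and the short-circuit behavior we ran into. This is a sketch for illustration only, not Azure's actual implementation; the `Rule` class, the function names, the sample IPs (from the documentation ranges), and the crude SQLi check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Request = dict  # hypothetical: a request is just a dict of attributes here


@dataclass
class Rule:
    name: str
    priority: int                       # lower number = evaluated earlier
    action: str                         # "ALLOW" or "DENY"
    matches: Callable[[Request], bool]  # the rule's match condition
    custom: bool = True                 # custom rule (True) or managed rule (False)


def order_rules(custom: List[Rule], managed: List[Rule]) -> List[Rule]:
    # Custom rules always run before managed rules; each group sorted by priority.
    return sorted(custom, key=lambda r: r.priority) + sorted(managed, key=lambda r: r.priority)


def evaluate_fall_through(rules: List[Rule], request: Request, default: str = "DENY") -> str:
    # Conventional model: a non-match hands the request to the next rule,
    # and anything that matches nothing hits the default action.
    for rule in rules:
        if rule.matches(request):
            return rule.action
    return default


def evaluate_as_observed(rules: List[Rule], request: Request, default: str = "DENY") -> str:
    # The behavior as described in this post: a non-matching custom DENY rule
    # lets the request through and stops evaluation entirely.
    for rule in rules:
        if rule.matches(request):
            return rule.action
        if rule.custom and rule.action == "DENY":
            return "ALLOW"  # short-circuit: managed rules are never reached
    return default


blocklist = {"203.0.113.7"}  # hypothetical "personalized blacklist" entry
custom = [Rule("personal-blocklist", 1, "DENY", lambda r: r["ip"] in blocklist)]
managed = [Rule("sqli", 10, "DENY", lambda r: "' OR 1=1" in r["query"], custom=False)]
rules = order_rules(custom, managed)

attack = {"ip": "198.51.100.20", "query": "id=1' OR 1=1"}  # unknown IP, SQLi payload
print(evaluate_fall_through(rules, attack))  # DENY: the managed SQLi rule catches it
print(evaluate_as_observed(rules, attack))   # ALLOW: the SQLi rule never runs
```

Same rules, same request, opposite outcomes. The only thing that changed is what happens after the first non-match.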
So, what are the options as we saw them?
- Use an external service/virtual device instead. This will increase cost dramatically and could introduce maintenance headaches (though it may actually reduce overhead if you use identical rules across products, since you'd manage one repo of rules instead of many).
- Ask your engineering team to completely rethink how firewalls work, then re-engineer your rules to suit Microsoft's ALLOW fever dream.
We went with the rewrite. 96 engineering hours later, we have had no success beyond two rolled-back deployments. WooooooooSHAAAaaaaaaaaaaaaa.