Is “REST API” an Oxymoron?

Posted in SOA by AST on Saturday, December 16th, 2006

Even though I had to temporarily drop out of the ongoing discussion on the service-orientated-architecture Yahoo group/mailing list, which prompted my last post, to focus on a few high-priority interrupts for a while, my brain hasn’t fully disengaged from the discussion.

One of the light bulbs that went on in my head during the aforementioned discussion was that it finally clicked what the uniform interface constraint “hypermedia as the engine of application state” (Fielding 2000, §5.1.5) actually means (to me, anyway). Now that I think I understand it, I also understand why it is so hard for people to really get what it means: most of the people trying to figure this out are experienced, traditional programmers. What do traditional programmers do? If they’re any good and they’re dealing with large-scale distributed systems, they spend an awful lot of time on the design of the “perfect” API for their remote components. Perfect in this context means that it is the optimal trade-off between all of the architectural constraints and system requirements to deliver an efficient distributed system.

API Brain Damage

I’m going to go out on a limb here, but I think this “API brain damage” is part of why we haven’t been able to significantly advance the state of the art of software system design over the last 30 years. We all have it, because from the moment you learn what a compiler is and get started with programming, APIs are all around you. As you develop your own skills and experience, you’re exposed to both good and bad API design, and this all gets mixed in together in our little brains so that each of us develops our own perspective on what is a “good” vs. a “bad” API. Most of us can, by this stage, pass such a judgment in 30 minutes or less.

However, I think the effects of this brain damage are to create some pretty fundamental assumptions about how computer systems work. If you need programmatic access (here it is again, reinforced by the terms we use to describe our requirements) to a set of functionality (for kicks, let’s call it a service), then the first thing we as programmers want to know is “where’s the API?”. It’s ingrained in our very being because of years and years of positive reinforcement. We all have it…and I think it’s a tragic mistake.

Even people who have a pretty good understanding of what REST is get pulled back into the primordial slime just as they’re about to sprout legs and walk upright. Joe Gregorio’s really good article on building a RESTful system, Constructing or Traversing URIs?, is a prime example of this type of thinking. I’m not criticizing Joe here, I’m criticizing the way we, as software architects and developers, have been trained to think.

The point of Joe’s article is that hypermedia is about link traversal, but that because the possible URI space is nearly infinite, it’s reasonable to publish recipes for link construction under the argument that this is what HTML Forms using the GET method do anyway (emphasis added). This “optimization” undermines what to me is the whole point of “hypermedia as the engine of application state”: link traversal, and this is recognized by the rest of the article Joe wrote. As soon as you start down this slippery slope, you’ve lost all of the advantages I see in even bothering with hypermedia at all. It no longer becomes part of the application, it’s just data that’s shipped around via some transfer mechanism, in this case HTTP.

Hypermedia Applications

Many things have contributed to my opinion about this topic, including this statement by Eric Newcomer recently on the mailing list:

I should just note that comparing the Web to WS-* is an apples-to-oranges comparison (one being an application and the other being a collection of specifications).

With apologies to Eric, I initially sorta dismissed this statement because I was focused on my conversation with Steve Jones—but I shouldn’t have. Eric is exactly right here. The World Wide Web is a hypermedia application, not just a collection of specifications. Again, our software development training doesn’t help us here very much. Good ol’ functional decomposition makes us (well, me at least) want to see the Web as HTTP (RFC 2616), URIs (RFC 3986), MIME types (Wikipedia entry to set of related RFCs), HTML/XHTML (W3C markup specifications) and HTML forms (W3C recommendation and RFC 2388).

In actual fact, it’s all of those working together to provide, as Eric said, the Web as a distributed hypermedia application (which, if I remember correctly, is a point also made by Roy Fielding himself in the dissertation). The Web works because both user agents (browsers) and the server applications use all of these specifications together to expose a set of functionality to the interactive user. There is no a priori agreement between the browser and the Web server as to how the information service built on these specifications is implemented, but because of agreement on how these specifications are used together, as an application, it doesn’t matter if today CNN.com is built using Microsoft ASP, and tomorrow it’s built on PHP. Apart from the application of the “Cool URIs Don’t Change” principle, if a user starts from http://www.cnn.com, they will always be able to utilize the CNN news service via their browser’s implementation of those specifications and the implicit agreement of CNN to publish its service in accordance with them too.

The moment that CNN or any other service provider publishes a recipe, guideline or specification of how to access specific parts of that service, e.g. the latest headlines may be found at http://www.cnn.com/headlines/ or http://headlines.cnn.com/ or whatever, as Joe points out in the article, they’ve made an implicit commitment to support that API (because that’s what it essentially is) for a period of time. When someone comes along and implements a specialized headline grabber that follows that API, and CNN decides to change it due to idle whim or genuine business need, that headline grabber client is now broken. If it’s just a single user, maybe this isn’t a big deal, but if it is every 3rd-party trading partner of your organization, the impact is a bit more significant.

A hypermedia application such as Atom or RSS, together with content negotiation via MIME types, HTTP Accept headers and embedded <link/> tags, means that this sort of evolution could happen without breaking the client, provided there is agreement on the application semantics of how those things should be both used and interpreted between clients and servers. Anything else means you’re back to brittle, API-based systems that can no longer evolve independently of each other.
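
As a rough sketch of what this looks like from the client side, the Java fragment below finds a feed by following a <link/> element in a page the server handed it, rather than building the feed URI from a published recipe. The class name, the regular expressions and the example.com URIs are assumptions made purely for illustration; a real client would use a proper HTML parser rather than pattern matching.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeedDiscovery {
    // Matches each <link ...> or <link .../> tag in the page.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches name="value" attribute pairs inside a tag.
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    /**
     * Returns the first rel="alternate" link of the requested media type,
     * resolved against the URI of the page it was found in.
     */
    public static Optional<URI> findFeed(URI page, String html, String mediaType) {
        Matcher tags = LINK_TAG.matcher(html);
        while (tags.find()) {
            String rel = null;
            String type = null;
            String href = null;
            Matcher attrs = ATTR.matcher(tags.group());
            while (attrs.find()) {
                switch (attrs.group(1).toLowerCase()) {
                    case "rel":  rel = attrs.group(2);  break;
                    case "type": type = attrs.group(2); break;
                    case "href": href = attrs.group(2); break;
                    default: break;
                }
            }
            if ("alternate".equalsIgnoreCase(rel)
                    && mediaType.equalsIgnoreCase(type)
                    && href != null) {
                // The server chose this location; the client only follows it.
                return Optional.of(page.resolve(href));
            }
        }
        return Optional.empty();
    }
}
```

The only URI this client has to know in advance is the page it starts from. If the server moves the feed, it changes the href it publishes and the client keeps working.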

Closing Thoughts

I don’t have all of the answers here, but I think that the notion of a “REST API” is an oxymoron because REST is about dynamic evolvability of clients and servers based on codified understandings of previously agreed application semantics. This means that HTML browsers and Web servers agree to provide the Web application in a way that both can understand and use, but it also means that RSS/Atom feed readers and server feeds agree on the way they interact to both access and provide the syndication application.

What I’m saying is that the nature and specification of the hypermedia application is as key to REST as how you use the HTTP verbs. However, since the HTTP verbs are essentially an API that programmers can get their heads around, that’s where everyone’s focus is at the moment. I think this is a diversion from where people should be thinking about REST. As long as you agree the semantics of the hypermedia application (HTML+Forms+MIME+HTTP or Atom+XHTML+MIME+HTTP), the way that application is implemented on the server should be an implementation detail and not something exposed to clients in terms of an API.

If the hypermedia constructs used to describe the interaction between clients and servers are not rich enough to abstract these things, so that the client needs to know it’s supposed to POST data to URI x rather than simply traversing hypermedia provided by the server (meaning the operation, data and location are provided to the client in a way it can understand rather than being hard-coded as client implementation logic), then the hypermedia application being used (and not the information service) has not been sufficiently defined. Using the existing and emerging specifications for describing content and interaction, it should be possible to specify the application. If it isn’t, then we need to be spending our efforts on a way to do that rather than arguing about the RESTfulness of so-and-so’s latest HTTP-based API.
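
To make the distinction concrete, here is a minimal sketch in Java of a client that takes the operation, location and expected data from a form description supplied by the server instead of hard-coding them. The HypermediaForm and Request classes and the example.com URIs are hypothetical names invented for this illustration; they are not part of any specification mentioned here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class HypermediaForm {
    // Everything below comes from the server's representation, not from client code.
    private final String method;           // the operation
    private final String action;           // the location, possibly relative
    private final List<String> fieldNames; // the data the server expects

    public HypermediaForm(String method, String action, List<String> fieldNames) {
        this.method = method;
        this.action = action;
        this.fieldNames = fieldNames;
    }

    /** The request the client ends up sending. */
    public static final class Request {
        public final String method;
        public final URI target;
        public final String body;

        Request(String method, URI target, String body) {
            this.method = method;
            this.target = target;
            this.body = body;
        }
    }

    /** Fills in the form with the client's values, relative to the document it came from. */
    public Request fill(URI documentUri, Map<String, String> values) {
        StringJoiner body = new StringJoiner("&");
        for (String name : fieldNames) {
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("missing value for field " + name);
            }
            body.add(encode(name) + "=" + encode(value));
        }
        return new Request(method.toUpperCase(Locale.ROOT),
                documentUri.resolve(action), body.toString());
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Nothing in the client says “POST to URI x”. If the server starts advertising a different action or method in the form, the same client code produces the new request without being changed.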

To me, this is the real problem to be solved in implementing RESTful systems. I think there are people who are starting to realize this need implicitly, but I think it’s time we made that need an explicit requirement of systems implemented in the REST style. If you can’t describe the interaction via hypermedia and link traversal semantics only, then I don’t think the system truly meets the requirements of REST as I understand them today. The uniform interface is hypermedia, not HTTP. Focusing on HTTP is not seeing the forest for the trees.

Comments, flames and discussion are more than welcome.

Socio-political and Commercial Motivations for WS-*

Posted in SOA by AST on Saturday, December 9th, 2006

I can appreciate Gervas’ position as a “neutral, non-technical observer” to the whole ROA/SOA thread, but I think the root of the problems in bright people having difficulty clarifying basic issues about REST is entirely one of “what they know” and “where they are coming from”.

I have tremendous respect for Steve and everyone else on the list [the service-orientated-architecture list] that I’ve interacted with, so this isn’t personal in any way. I think it is important to understand a bit of industry history in light of lots of smart people and vendors trying to figure out how to field an SOA that works.

A lot of us on this list have been doing distributed computing for a long time. Most of us have done a lot of one or more of CORBA and DCOM before RMI/EJB came on the scene and certainly before XML-RPC and SOAP came on the scene (some people have been doing it earlier than that).

The thing about a programming paradigm is that to get any good at actually doing something with it, it takes a lot of time and effort to learn how to think and design in a way that takes advantage of it. CORBA, DCOM and EJB and the like are about extending the local programming model to remote systems in a more-or-less coherent way.

All of them are object-oriented in that you create a service with a defined set of capabilities and a given interface. This interface is normally designed in similar ways to local interfaces in that it exposes a fairly rich and domain-specific API for interaction between clients and servers. Most of the early mistakes people make in developing CORBA, DCOM and EJB projects are in the granularity of those interfaces because they forget or don’t consider the effect of the cost and overhead of communicating over the network vs the costs within the same address space, e.g. “normal” objects.

Learning how to optimize the tradeoff between a rich, domain-specific interface and one that is efficient is one of the key things in learning how to design and develop successful distributed object systems.

If you take a look at the history involved in developing these systems, formalization of CORBA started in 1990 at the OMG, COM surfaced around 1993 (with DCOM following in 1996) and RMI and EJB emerged in 1997. Getting all of these technologies implemented took a lot of work because most of them are naturally fairly complex. It isn’t easy trying to make a remote system look like it is a local one. Lots of vendors produced a lot of products, and some companies were founded around some of these technologies.

While each of these technologies is good (to varying degrees) at providing a distributed object computing platform within a local physical environment, they didn’t scale very well over long distances or between enterprises. Most of them required a large number of proprietary ports to be opened in company networks, which has security implications not to mention just the operational issues of making it happen.

On the other hand, HTTP and Web pages nicely sailed through port 80 which, in most cases, was already open. Both vendors and customers said, “Wouldn’t it be great if we could do things like CORBA, but using HTTP?” Enter XML, XML-RPC and SOAP in 1999-2000.

Now, if you were a vendor that had spent millions in R&D in getting distributed objects computing working in CORBA, DCOM and EJB but had come up against limiting factors such as complexity of deployment (all those ports), lack of interoperability between CORBA, DCOM and EJB and the way the Web was influencing the development of applications, what would you do?

I bet you’d figure out how to take all those things you’d been doing and make them work over ubiquitous Web protocols. I’m not saying this is necessarily bad and doesn’t have its place, but there are two other big reasons why you might think it would be reasonable to do this:

  • it is the way major software vendors had been developing systems since as early as 1990, meaning
  • there was a legion of software developers who understood how to develop distributed systems using those concepts and mechanisms

Vendors are protecting their investment because they need to stay in business and keep their shareholders happy, but they also need to somehow make their distributed computing technologies work together, as more and more people are running heterogeneous environments not only internally, but across trading partners.

The Web is different, however.

In the same way that messaging-oriented middleware (MOM) isn’t the same way of thinking about solving distributed computing problems as using distributed objects, building successful distributed hypermedia applications using REST for either human/computer or computer/computer interaction requires a shift in the way you think about the problem.

If you can’t suspend your assumptions about how things ought to work to understand how they do work in a different environment, e.g. MOM or REST, you’ll forever be frustrated and not understand the advantages and disadvantages of this approach over any other. From my perspective on the recent ROA/SOA thread, this is where we are and why reaching any sort of common understanding is and will continue to be so difficult.

“Curses! Foiled Again!”

Posted in SOA by AST on Thursday, August 17th, 2006

Only kidding–well, kinda only kidding.

I just received this article from FTPOnline about a critter called NetKernel from 1060 Research. This thing looks important, so everyone who’s been napping (me included) had better sit up and start paying attention.

The title of the post relates to the similarities of the NetKernel architecture (at least on the surface, I haven’t done a deep dive yet) to what I was trying to explain in the SOA CoP forum a while back that eventually prompted the post on Information-Oriented Architecture back in April. It is also quite close to the internal research that I’ve been doing about what would be the optimal way to deliver an XML messaging system–if that’s what you were trying to do.

What I had been doing is trying to get back to the fundamentals of what makes the Internet actually work: highly-distributed networks and the applications that provide large-scale, asynchronous messaging on top of them: SMTP. What I didn’t quite do was take the whole-hog REST approach, but I was getting there around April, because I think you can do some interesting things with XML tuple spaces and a system like NetKernel.

As is pointed out in the article, the key is caching and a separation of the logical destination address from the physical location in the network. I’m sure there are some differences between what I’ve been working on for the last couple of years and what 1060 Research has been doing, but the good thing is that now I know that the concepts are out there, under an open-source license since Peter Rodgers (1060’s CEO) was able to liberate his research project from HP once HP changed strategic direction. In and of itself, that is worth my respect. Most organizations wouldn’t be as pragmatic with their investments. Lucky for all of us that they were.

If what the article says about NetKernel is true, it proves some of what I think is wrong with all of the current crop of SOA product implementations based on code generation and direct interaction with SOAP messages. I also think the NetKernel approach (if I understand it correctly from the article and brief overview of the article) blows the pants off things like JBI, SCA and anything that is going to lead you down a particular technology choice for implementing your SOA.

People are so worried about tools and products that they’re losing sight of how elegantly simple the network model, and REST in particular, can be in making some of their implementation headaches go away. What do you think is really going to make some really big initiatives like the DoD and U.S. Government’s Information Sharing (PDF) (HTML via Google) really happen? It won’t be via passing SOAP messages around in an ESB, I can assure you.

I could go on a lot more, but I won’t. I do intend to devote some of my extremely short supply of free time over the next few weeks to looking at NetKernel more closely based on what I’ve been doing. For what’s similar to what I’ve been doing enough to claim prior art (the copyright date on the website starts in 2003–long before I ever knew what the PSB was), I’ll be motivated to share and draw some concrete conclusions as to why I believe this is important.

Ok, I lied…there’s more…

Sometimes, as software people, we get so caught up in the abstractions we’ve built that we forget that sometimes there’s a simpler way to solve the problem. No, it won’t solve all problems, and there’s some use cases for all that other junk. What we need is some people who can objectively assess, from a technology and business perspective, what the consequences of implementing SOA using various technologies are. Maybe we’re just so far lost that having someone give you a fresh towel to wipe the sweat from your brow is good enough that we don’t realize that the other guys are in Hawaii with a Waborita in their hands, and we’re still shovelling coal.

I think there really are better ways to solve some of the real IT problems today, but I’m not convinced of the “wisdom of the crowds” in this case. I think we need a few more people like 1060 Research to go out there, as it seems Chris Gunderson from the Naval Postgraduate School is doing by mis-quoting Doc Emmett Brown from Back to the Future:

I’ve lost my taste for technology roadmaps. “Where we’re going we don’t need maps!”

The quote was in response to a request for the creation of a roadmap of current SOA offerings to an ideal state, stated in evolutionary stages, posted to the SOA CoP mailing list. I might state it a little differently, at the risk of being labeled as a zealot: where we’re going, we don’t need SOAP!

Revolutionary, not evolutionary, steps are what we need, and that is why I think the potential I see in the NetKernel architecture is so important.

And now, back to your regularly scheduled silence…

Are XML Gateways Really the Answer?

Posted in SOA, Security by AST on Thursday, June 1st, 2006

Updated 2006-06-06: I’ve added some editorial comments based on some feedback from a friend of mine who also does security. No sweeping changes, but just a couple of things for clarity. — ast

Based on some discussion about Web services security on the service-orientated-architecture Yahoo group, I decided to dig a little deeper into what XML gateways would actually be good for in a large-scale environment. We don’t use them in Reach because we have implemented some of the same functionality as part of our message delivery framework. In theory, we could use something like an XML gateway instead, but I’m not really convinced. For a lot of readers, some of the introductory material may be a bit remedial; however, this detail is relevant to the rest of the discussion and I want to make sure everyone is starting from the same place.

XML gateways do have some desirable qualities, however, if you apply them in the right way. This article attempts to look at some of the considerations for using an XML gateway in an SOA based on a very brief top to bottom view of how they might be deployed in real life. And, no, this article isn’t talking about any particular project, living or dead, that I have worked on. Any resemblance to an actual project is purely coincidental.

The Theory

The theory of the XML gateway is that since firewalls allow arbitrary XML documents through without question on port X (probably 80), wouldn’t it be really great if there was something that could sit in the way and do the same sort of thing for XML documents that a firewall does for TCP/IP packets? I’m not going to go into the gory details of XML firewall vs. XML gateway as I’m sure you can use Google at least as well as I can. I’m going to focus on XML gateways for the rest of this discussion.

According to several of the XML gateway vendors, the typical deployment diagram looks something like Figure 1.

Figure 1: Typical XML Gateway Deployment View

A Dash of Reality

The view in Figure 1 is great for marketing slides and to explain the general concept, but it hides a lot of important aspects of how it would be deployed in a more complete environment. A more typical, non-trivial enterprise deployment of an XML gateway might look a lot like Figure 2.

Figure 2: Augmented XML Gateway Deployment View

While the diagram in Figure 2 is still far from complete, it serves to highlight some of the issues to consider when attempting to apply an XML gateway in your environment. I’m going to discuss some of the key aspects in Figure 2.

  1. Secure the Transport Layer

    Any Web service serious about security will require some sort of secure communications channel. Normally, this is accomplished using HTTPS. Depending on the strength of your security model, the requester may also need to present a client certificate to authenticate themselves to the server in addition to the client’s validation of the server’s certificate. While this can authenticate the requester, at this level, it is used to validate the identities of both ends of the communications channel.

    Figure 2 illustrates this secure channel from the requester through the external firewall and into the Content Services Switch (CSS). Many environments have a security policy which prohibits encrypted traffic originating outside the front-end firewalls from being transmitted within the infrastructure. This is normally required so that things like network intrusion detection devices (NIDS) can scan all of the packets transmitted within the rest of the network. Obviously, if the channel is encrypted, it will be impossible for the NIDS to check for anything useful.

    Once the CSS has terminated the SSL/TLS channel, it performs load balancing and forwards the request to a selected front-end web server. The CSS is also responsible for detecting fail-over requirements in the event that one of the web servers is unavailable.

  2. Authenticate the Requester

    In the likely event that the site is hosting both Web services for humans and for machines, a common deployment pattern is to have some sort of plug-in agent on the front-end web servers which intercepts all of the incoming URLs to see if the requester is authorized to access them. If the requester doesn’t have an authentication context, these plug-ins typically redirect to a credential service so that the entity on the other end of the channel can authenticate themselves. If there is a human on the other end of the channel, they may be required to present some sort of personal credentials like a username and password for authentication to a web application or portal even if the channel they’re using has been established using 2-way SSL. If there is a machine on the other end of the channel, it may be required to authenticate using its client certificate, or it may simply use HTTP basic authentication, depending on the service’s security requirements.

    Once the credential service has authenticated the requester, they are redirected back to the original target URL and the authorization check is performed again. If the requester is authorized, they can access the resource; if not, they are redirected to either the credential service or a “no access” page.

  3. Web Server Redirect

    Since the web servers have the request, they may be responsible for determining, based on the URL, if the request is for a human-facing or a machine-facing service. One option is to route all of the machine-facing service requests through the XML gateways (using some sort of load balancing scheme) and route the human-facing service requests directly to the portal servers (most likely relying on application-server-specific plug-ins “stacked” in the web server to provide fail-over at the application server layer).

  4. XML Gateway Processing

    It is assumed that the XML gateway will perform the following tasks on the requests it gets at the very least:

    • Virus checking of all Base64 encoded string literals within the message
    • Checking the document for well-formedness
    • Load balancing of some sort between the Web service servers

    The rest of the capabilities of the XML gateway may or may not fit into the overall architecture of the solution. Reasons this may be true will be discussed in subsequent sections.

  5. Establish Appserver Security Context

    Regardless of whether the requester is accessing the portal or a machine is accessing a Web service, the container providing the service needs to establish some sort of security context for them. This may be desirable for simple things such as portal personalization, but, more critically for security, it may be necessary for implementing role-based access controls (RBAC) to control the services available to the requester and to limit what data they may access. This discussion is going to avoid considering mandatory access control (MAC) models used in high security environments because the majority of non-governmental, non-military systems will be using an RBAC security model.
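
To illustrate the RBAC idea from item 5 above, here is a minimal sketch in Java of a security context and a role-to-service check. The role names, service names and the RbacSketch class are all assumptions invented for this example; a real container would populate the context from its own identity and policy stores.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class RbacSketch {
    // Hypothetical roles and services, for illustration only.
    public enum Role { PORTAL_USER, TRADING_PARTNER }
    public enum Service { PORTAL, ORDER_TRACKING, SUPPLY_CHAIN }

    /** The security context established once the requester has been authenticated. */
    public static final class SecurityContext {
        final String principal;
        final Set<Role> roles;

        public SecurityContext(String principal, Role... roles) {
            this.principal = principal;
            this.roles = EnumSet.noneOf(Role.class);
            this.roles.addAll(Arrays.asList(roles));
        }
    }

    // Which services each role may use.
    private final Map<Role, Set<Service>> grants = new EnumMap<>(Role.class);

    public void grant(Role role, Service... services) {
        grants.computeIfAbsent(role, r -> EnumSet.noneOf(Service.class))
              .addAll(Arrays.asList(services));
    }

    /** A requester is permitted if any one of its roles has been granted the service. */
    public boolean isPermitted(SecurityContext context, Service service) {
        for (Role role : context.roles) {
            Set<Service> allowed = grants.get(role);
            if (allowed != null && allowed.contains(service)) {
                return true;
            }
        }
        return false;
    }
}
```

The access decision depends only on the roles in the context, not on how the requester authenticated, which is why the basis for establishing that context matters so much.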

Things that Make You go Hmmmm…

There are a few things about the above scenario that seem pretty logical at first glance, but, if you think about them for a moment, you’ll notice that some important things are missing. Namely:

  1. How does the client certificate get used to authenticate automated clients?
  2. What kind of identity management system is being used to perform the authentication?
  3. What kind of access control/policy mechanisms are in place and where are they located?
  4. Are all of the Web service requests using SOAP?
  5. What is the basis of establishing the security context for the application servers?
  6. Should all traffic be going through the XML gateway?
  7. Where does the CSS get the client certificate information?

What Are We Trying to Do, Really?

Like most things, in order to figure out the best application of security policies and procedures, and in particular, the XML gateway device which is what prompted this article, you first need to understand your environment and what you’re trying to do. This means identification of the security requirements. For this discussion, I think the following are fairly realistic security requirements:

  • No encrypted traffic is allowed in the internal network if it originates from “the unwashed horde”
  • All traffic on the internal network must be subject to in-flight intrusion detection and traffic analysis
  • Sensitive customer data (like credit card numbers) must not be stored unencrypted (even in audits or application logs)
  • Any changes to a message must be securely audited
  • Each message received must be securely audited
  • Each message sent must be securely audited
  • Any malformed messages received must be reported
  • Any meaningful service access requires authentication and appropriate authorization
  • Automated trading partners must establish a secure communications channel using bi-directional, certificate-based authentication
  • Messages may be relayed by external intermediaries, therefore cannot assume the “last hop” is the message originator
  • Identity management must be centralized
  • Security policy management must be centralized
  • User communities between automated and human users must be segregated
  • Internal server credentials must not be shared between service types
  • User profile information must be stored in LDAP (including any security certificates and keys)
  • No direct access to internal LDAP or database servers will be allowed to originate outside the secure zone

Just to make the problem a little more interesting (and life-like), the overall environment must be reasonably resilient to change in the services which are available. The business has a policy (somewhat like Amazon) that potential value-added services will be piloted, but there is no guarantee that they will eventually enter full production mode. This means that both the interfaces and the services available to both the human and automated users are subject to potentially frequent changes. However, there will be some business-critical services available to both human and automated users that are essential to the functioning of the enterprise (examples might be supply-chain management services, customer order tracking and other “boring” stuff like that).

Decisions, Decisions, Decisions

Regardless of your architectural philosophies for Web service design, there are a few things that whoever is putting together the final solution needs to take into consideration based on everything we’ve been discussing up to this point. I know that my own personal bias is different to what many people think is the “right way” to build Web services infrastructures, so I’ll try and keep my opinions out of this as much as possible and focus on the decision points.

First, a re-cap on the general capabilities offered by XML gateways. Essentially, the functionality is divided between content inspection and access control, but most XML gateway vendors also offer some sort of management functionality as well.

Some of the things that fall under the content inspection category are (courtesy of Vordel):

  • HTTP header inspection
  • XML denial of service detection
  • XML external entity attack prevention
  • SQL injection prevention
  • Buffer overflow prevention
  • Service scanning prevention
  • Message size analysis
  • SOAP attachment analysis
  • XML well-formedness validation
  • XML schema validation
  • XPath processing
  • Auto-generation of XML Schema (XSD) from WSDL
  • Auto-generation of XML Schema (XSD) from sample XML messages

To me, the ones I’ve marked in bold are core functions related to XML handling while the others are either “helpful” (auto-generation of XSD) or they overlap with areas covered by other products. There are a couple of these items worth discussing briefly, however.
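
For a sense of how little code the core XML checks need, here is a minimal Java sketch covering two items from the list: well-formedness validation and external entity attack prevention. The XmlInspection class is a made-up name, and the disallow-doctype-decl feature URI is specific to Xerces-based parsers such as the one bundled with the JDK, so treat that line as an assumption about your parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlInspection {
    /**
     * Returns true if the message is well-formed XML and carries no DOCTYPE.
     * Refusing DOCTYPE declarations outright also rules out external entities.
     */
    public static boolean isAcceptable(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Assumption: a Xerces-based parser that understands this feature.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or contained a DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be set up", e);
        }
    }
}
```

This is not a claim about how any vendor's gateway works. It only shows that a message delivery framework can perform these particular checks itself.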

SQL Injection Prevention

SQL injection is certainly a valid concern–if you’re doing direct SQL access with badly designed or non-existent data access layers. Let’s face it, if you’re using any kind of persistence framework (including things like Jaxor, Hibernate, TopLink, EJB and others) or you’re using prepared statements, you’re really not going to suffer too much from this type of problem. Normally, this is an issue when you have database code dynamically generating SQL using string concatenation like:

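  // Vulnerable: the input is concatenated straight into the SQL text
  String query = "select * from table1 where col1 = '" + inputParam + "'";
  ...
  Statement stmt = connection.createStatement();
  ResultSet rs = stmt.executeQuery(query);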

But, of course, you wouldn’t do stuff like that, and a lot of SQL client libraries barf if you try and combine multiple statements using the semicolon anyway. If you want more, highly-accessible information about writing secure code, check out my friend Sverre’s book, Innocent Code.
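For contrast with the concatenated query above, here is a minimal sketch of the prepared-statement form using plain JDBC. The class name `Table1Lookup` and its helper methods are made up for illustration; `table1` and `col1` are carried over from the snippet above, and the `Connection` is assumed to come from whatever data source you already use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class Table1Lookup {

    // Parameterized form: the SQL text is fixed and the input is bound as data.
    static final String PARAMETERIZED = "select * from table1 where col1 = ?";

    // What string concatenation produces: the input becomes part of the SQL text.
    static String concatenatedQuery(String inputParam) {
        return "select * from table1 where col1 = '" + inputParam + "'";
    }

    // Hypothetical helper: returns the first column of every matching row.
    static List<String> findByCol1(Connection connection, String inputParam) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(PARAMETERIZED)) {
            ps.setString(1, inputParam); // bound as a value, never parsed as SQL
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString(1));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // With concatenation, an attacker-controlled value rewrites the where clause.
        System.out.println(concatenatedQuery("x' or '1'='1"));
    }
}
```

With the parameterized version the same input is simply compared against `col1` as an odd-looking string, which is why this class of problem mostly disappears once a persistence framework or prepared statements are in use.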

That all being said, if you’re fronting services that you don’t know the security constraints of (and you’re insisting on exposing them to the world anyway), this type of thing may be useful. I’m sure things are better than they were in 2003-2004, but some of the reviews discovered that the pattern matching for injection techniques was pretty straightforward. Most probably wouldn’t take into account escaped Unicode characters or other potentially nasty ways around the obvious select, update, and delete SQL keywords. Also, following the rule of least privilege, you wouldn’t have your front-end code running with any more permissions than were necessary to actually perform the given task, now would you?
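To make the escaped-character concern concrete, here is a small sketch of how a keyword filter applied to the raw message can miss something the back-end parser will happily decode. The filter is deliberately simplistic and is not modelled on any vendor's product; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class NaiveSqlKeywordFilter {

    // Looks for the obvious SQL keywords in whatever text it is given.
    static boolean looksLikeSql(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return lower.contains("select") || lower.contains("update") || lower.contains("delete");
    }

    // What the service sees once its XML parser has resolved character references.
    static String parsedText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement()
                      .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // "&#115;" is the XML character reference for the letter "s".
        String message = "<q>x' union &#115;elect name from users --</q>";
        System.out.println("raw text flagged:    " + looksLikeSql(message));
        System.out.println("parsed text flagged: " + looksLikeSql(parsedText(message)));
    }
}
```

The raw message never contains the word "select", but the parsed element text does. Any inspection that happens before decoding has to account for every encoding the downstream parser accepts.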

Buffer Overflow Prevention

Buffer overflow is another often misunderstood security issue. Again, I’m not saying that it isn’t there and it isn’t a problem, but it helps to understand what a buffer overflow does. Briefly, it relies on the attacker knowing something about the platform and software running on the target environment. If either of those things is different from what the attacker expects, the attack won’t work. Why? Well, the attacker encodes detailed assembly/machine code which is dependent on things like the processor architecture, stack size and the instructions being executed by the software with the vulnerability. If successful, the attacker corrupts the stack frame and causes the malicious code to execute.

Interpreted languages like Java, C#, Ruby, Python, Perl are not normally susceptible to this type of attack because it is very difficult to get down to the hardware (unless there’s a bug in the runtime interpreter). If you are writing your web services in C, C++ or any other language with direct memory allocation, or you’re passing parameters directly from an interpreted language to a library written in one of these languages, you may have a problem.

With this in mind, how does an XML gateway actually protect you from buffer overflow attacks? Not being someone who implemented one of these things, I’m only guessing here, but one way is to do it based on signatures. This puts it in the same camp as virus checking and NIDS, where, as has been pointed out before (remember the Viking story?), there’s an inherent latency between when a new attack is identified and when the signatures are deployed on any existing devices. Maybe it’s more sophisticated than that, but I’m not really sure how else you’d do it, because there’s no way the gateway can know anything about your implementation language and software. This means something very important to anyone developing their own software: if you have buffer overflow problems in code you write, there’s no way for the gateway to know about it.

Of course, if an attacker tries a known sequence of bytes to get a shell prompt, that might be detectable and stopped, but maybe not. Ask your vendors if you’re serious about trying to address this problem, but you should really pay more attention to how your developers are writing code. Another book which goes into this particular problem a great deal is Viega and McGraw’s Building Secure Software.

Service Scanning Prevention

Depending on your background, there are a few different interpretations of this one. Historically, a “service” was any program listening on a port, so if you wanted to see what attack possibilities you had, you could use a port scanner to detect open ports. As depicted in Figure 2, between the front-line firewalls and the CSSs, a port scan isn’t going to show you very much of interest. What must be interpreted from the context of an XML gateway is that it somehow protects you from discovery of the Web services you are offering, with the general idea being to limit your exposure window.

The effectiveness of this approach depends very much on the way you design your services. If you are using REST services (or WOA, to use the shiny new acronym), each service is just a URL, so, if you knew what you were expecting to see as an output, spidering a site may be a valid way to do “service discovery”. If you are deploying SOAP-based services in the MEST style, you are pretty-much in the same boat. Each URL will only support one operation, so your primary exposure is discovery of the URL. Finally, if you’re using “standard” WSDL-based SOAP services with multiple operations, the XML gateway may be able to do something for you. In any event, an external service requester doesn’t know anything about mappings which may be introduced at any of steps (2), (3) or (4) in Figure 2, so they only know that you’re exposing a service with a discoverable operation instead of discovering any interesting information about your network.

Stepping back a bit from the details, there’s another issue worth considering. Part of the whole 3-part puzzle of the original Web services architecture was: requester, provider and discovery. If you’re already publishing your WSDL on a UDDI or ebXML Registry/Repository somewhere that supports anonymous access, trying to use an XML gateway to “hide” your service endpoint information isn’t going to make a lot of sense. Also, if you require authenticated access (especially 2-way, certificate-based authentication), Joe-random-hacker isn’t going to be able to access your endpoint URLs anyway because they won’t get that far into your network.

XML Denial of Service Prevention

In this category are things like external entity resolution based DoS, recursive XML elements and large documents. Again, the susceptibility of your own particular services to these types of attacks is very much dependent on your service design. For example, if you are using a profiled version of XML, much like rig0002, things like external entities aren’t allowed, so anything based on that attack wouldn’t be allowed into the system anyway.
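If the service itself enforces a profile that forbids DTDs, the external-entity class of attack is rejected at parse time regardless of what sits in front of it. A minimal sketch using the JDK's built-in JAXP parser follows; the `disallow-doctype-decl` feature URI is the Xerces one that the JDK parser honours, while the class and method names are only illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NoDoctypeParser {

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Any DOCTYPE declaration becomes a fatal error, so no entities can be declared at all.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the message parses under the restricted profile.
    static boolean accepts(String xml) throws Exception {
        try {
            newBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException rejected) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String plain = "<order><id>42</id></order>";
        String withEntity =
            "<!DOCTYPE order [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]><order>&xxe;</order>";
        System.out.println("plain message accepted:    " + accepts(plain));
        System.out.println("DOCTYPE message accepted:  " + accepts(withEntity));
    }
}
```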

The other two are primarily based on the assumption that you will be using DOM or tree-based processing of your XML. Unfortunately, most of the automagic data-mapping libraries are based on DOM (XMLBeans I know does this for sure, and using something like JDOM is going to have the same type of issue), meaning that if you are relying on these tools, you may be vulnerable to these attacks.

On the other hand, if you are using a stream-based XML processing mechanism like SAX or StAX, you can process XML documents as a stream of events, meaning that the overall size doesn’t matter to the consuming application. However, it may make a huge difference to things between the requester and the ultimate provider. Any intermediary device which is going to try and buffer requests or do store-and-forward messaging may have upper size limits on the messages they can handle. If it is a hardware device, it’s likely to be significantly less than what your JMS, MSMQ or MQSeries provider software can handle, so it is important to know these kinds of details about your end-to-end environment.

The XML gateway can attempt to detect these sorts of things like deeply nested messages and messages exceeding a certain size, but, like any security threat, you need to understand your real vulnerability to it and determine if it is something that’s going to be a cost-effective issue to control.
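The nesting check in particular is cheap to do with a streaming parser, whether it runs on a gateway or in the service. The sketch below uses StAX from the JDK and never builds a tree; the class name and the depth limits are arbitrary choices for illustration, and a size limit would be a separate byte-counting wrapper around the input that is not shown here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class NestingLimit {

    // Streams through the document and reports whether element nesting stays within maxDepth.
    static boolean withinDepth(String xml, int maxDepth) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTD processing and external entities for the same reasons discussed above.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth > maxDepth) {
                        return false; // stop reading as soon as the limit is exceeded
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            return true;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String nested = "<a><b><c/></b></a>";
        System.out.println("limit 3: " + withinDepth(nested, 3));
        System.out.println("limit 2: " + withinDepth(nested, 2));
    }
}
```

Because the check bails out at the first offending element, a hostile document costs only as much work as it takes to reach the limit.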

XML Schema Validation

The topic of XML schema validation also is an important consideration for your overall architecture. Schema validation isn’t generally that expensive in terms of possible operations to be performed over your documents, but, naturally enough, it requires the validator to have access to the XML schema. There are two main approaches to this problem.

The first approach is the most flexible, but also the poorest performer: let the validator load the remote schemas. This will work if your schemas are local and located in your secure zone (and you have the appropriate traffic rules in place), but, in most cases, your validation server will probably not be allowed to make arbitrary connections to the outside world. I’ve seen a few systems fail after deployments into production environments where these sorts of rules were in place–oops.

The main issue with this approach is less about access and more about performance. In the worst-case scenario, you’ll need to download each schema from a remote location every time you need it. The various XML validation tools have a variety of caching mechanisms to minimize this issue, but you’re still depending on the availability of a resource you don’t control, and if it’s a cached external resource and it is updated (versioning is a whole different problem), you may not know about it anyway.

The second approach is to have your XML validator either store the schemas locally, or access them from resources within your own environment. This sounds straightforward enough, but it needs to be considered in the context of the functional requirements listed above. If you have a highly dynamic set of XML schemas, you’ll need to provide a centralized access point for them within your environment, otherwise you’re going to have a potentially complex provisioning problem in getting the schemas to all of the locations that need them. Of course, by centralizing them, you’re potentially introducing a central point of failure, so you’ll need to address this issue as part of your overall architectural design.
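As a rough illustration of the second approach, the sketch below compiles a schema from a local source and forbids the validator from fetching anything external, using the standard `javax.xml.validation` API. The inline schema string stands in for a file held inside your own environment; the `order` schema and the class name are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class LocalSchemaValidation {

    // Stand-in for a schema file kept inside your own environment.
    static final String ORDER_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='quantity' type='xs:positiveInteger'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile once and reuse: a Schema object is immutable and can be shared.
    static Schema loadSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // JAXP 1.5 properties: an empty protocol list forbids remote schema and DTD lookups.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory.newSchema(new StreamSource(new StringReader(ORDER_XSD)));
    }

    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadSchema();
        System.out.println(isValid(schema, "<order><quantity>3</quantity></order>"));
        System.out.println(isValid(schema, "<order><quantity>0</quantity></order>"));
    }
}
```

Blocking external access outright means a schema that quietly imports something from the outside world fails at load time, during testing, rather than after deployment behind restrictive traffic rules.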

If you want to catch non-validating documents as early as possible in the request chain, then the XML gateway may help you here. However, depending on how loosely coupled your architecture is from a processing point of view, the document may need to be validated downstream anyway. Additionally, if you are using Schematron or NVDL to do more complex, business-level validations of your XML messages, I don’t see how offloading validation to the XML gateway really is a net gain.

I’m not going to go into much about the rest of the content inspection topics. Even though it’s sponsored by Forum Systems, one of their whitepapers, Attacking and Defending Web Services (PDF) provides a pretty good description of the types of XML-specific attacks the XML gateway is trying to defend against.

Access Control Operations

The access control aspects of XML gateway functionality are arguably the best reason for considering adding one to the infrastructure. Why? Because these are generally the most process-intensive operations that can easily benefit from hardware acceleration. Of course, you could add dedicated crypto PCI cards to your servers and integrate these functions into your service implementations–the question is do you really want to?

Again, borrowing the list from Vordel, the access control functionality often provided is:

  • XML Encryption
  • XML Signature
  • WS-Security Username Profile authentication
  • WS-Security X.509 Certificate Profile authentication
  • WS-Trust
  • SAML
  • LDAP integration
  • Active Directory integration
  • XKMS
  • SSL (server and bi-directional)
  • HTTP authentication
  • IP address filtering

Besides having just won Buzzword Bingo, there are several of these functions which are important to your security architecture, in addition to influencing design decisions for the overall solution architecture. Again, I’ve placed the XML-centric functions in bold. I’ll group these together based on authentication and authorization functionality, signature and encryption functionality and PKI functionality relating to key management.

PKI Functionality

I’m going to start here because a number of the other aspects of the rest of the functionality rely or can rely on some parts of PKI. One of the biggest challenges with PKI is providing a comprehensive view of trust (see the U.S. Government’s Federal Bridge Certification Authority for some of the challenges in trying to do this sensibly), and the other relates to management of the digital certificates themselves. These two issues arguably are what have impeded the widespread adoption of PKI infrastructures even though the theory is 100% sound. Therefore, any participant in tasks relying on PKI technologies or artifacts is required to deal with these issues to some degree.

If you are considering going the route of the “private CA”, there are a number of things to keep in mind. Yes, you have total control over your certificates, but maintaining a “real” CA takes some serious investment in hardware and security measures. Most organizations simply can’t justify the costs. A simpler alternative that only works for providers with centralized control (an 800-pound gorilla) and small scale is to dedicate a machine that isn’t connected to any network to issue the necessary certificates. This solution also has some problems when it comes to managing revocation lists (CRLs), and it can’t support any type of dynamic certificate validation protocol to ensure revoked certificates cannot be used until the appropriate CRLs are updated.

I was reminded that you may have other alternatives to a “full-on public CA” as described above. If you are only handing out certificates for use by you and your trading partners for authenticating with you, you don’t have to go to the same extreme measures normally taken when providing a CA for use by the public at large. — ast

Any device using certificates needs to have access to the certificates as well as be able to determine the validity of those certificates. This is where choices on your overall architecture may make provisioning of keys more difficult than necessary. An alternative is to have the XML gateway directly access the LDAP server and store them there. If we do that in this example, we are violating one of our security requirements about direct access to directory servers from hosts in the DMZ. An alternative might be to load the certificates on the device itself, but that can be problematic as well. For example, the Cisco 6500 series CSS can only hold 256 certificates, so if you have a number of trading partners (or regularly expire certificates to detect unauthorized users), this could pose a problem. Unfortunately, in the case of the Cisco CSS, there’s no other way to point it to an external certificate repository.

Another couple of things here. First, you’d probably use the Cisco 11500 series CSS; however, you can buy an option for the 6500 Catalyst which also provides CSS capabilities. The limits of 256 certificates and 256 key-pairs apply per SSL module installed (the 11500 has models that can take up to 4).

Secondly, I had in my head that you would not be installing your server’s CA public certificate and would instead be installing individual client certificates for your environment to give you fine-grained control as an alternative to more exotic certificate revocation strategies. Obviously, if you install the CA certificate instead, you can sign as many client certificates as you want and you will only use 1 of the 256 available certificate slots–giving you easier maintenance and less deployment overhead than might have been required in the original scenario. — ast

These types of considerations must also be part of the overall identity management strategy for the architecture. If you are going to be doing multiple things with the certificates (e.g. encryption, signatures and authentication), they may need to be distributed to multiple locations in your environment. This may be one reason that an XML gateway would be attractive because it could do all of these things in one device, but it still may not be a good fit with your overall identity management plans.

XML Signature & Encryption

There is no question that XML signature and encryption are expensive operations. There have been hardware solutions to provide dedicated processing of this sort of activity for many years. However, the question relevant to XML gateway usage is partly about performance and partly about what you are trying to do.

One of the first things you need to do is figure out what you are really doing by signing an XML message. Is it to detect unwanted changes to the message? Is it to provide non-repudiation with trading partners? If so, do they trust you, or do you need to have this done by an independent 3rd-party?

If you are attempting to detect unwanted changes in the message, it is important to identify the scope in which those unwanted changes can occur. It is possible in an SOA environment with extreme loose coupling that you may want to have each processing step sign the message, both for tamper detection as well as verification of what someone actually saw at a given point in a business process workflow. In an environment mostly concerned with tampering between the sender and the receiver organizations, it may be sufficient to sign the message at the XML gateway on the way out of the network. However, if you are only sending the message over a synchronous connection which is secured via 2-way SSL, signing the message may not be necessary. If you are placing the message on a publicly-accessible FTP server for collection asynchronously, it may make more sense.

Obviously, the last scenario assumes you trust both ends of the message transfer channel to not modify the message. It may be desirable to sign the message anyway in this scenario, just to ensure that there is no question as to what was sent is actually what came out of the SSL channel. — ast
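For the tamper-detection case, the underlying mechanism can be sketched with the JDK's `java.security.Signature` class. This is not XML Signature: it signs the raw bytes of the message and skips canonicalization, references and key info entirely, so treat it only as an illustration of what signing buys you. The class name, key size and sample messages are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class TamperCheck {

    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // The sender signs the exact bytes of the outgoing message with its private key.
    static byte[] sign(PrivateKey key, String message) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(message.getBytes(StandardCharsets.UTF_8));
        return signer.sign();
    }

    // The receiver checks the bytes it got against the signature using the public key.
    static boolean verify(PublicKey key, String message, byte[] signature) throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(message.getBytes(StandardCharsets.UTF_8));
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPair sender = newKeyPair();
        String sent = "<order><total>100</total></order>";
        byte[] signature = sign(sender.getPrivate(), sent);

        System.out.println("as sent:  " + verify(sender.getPublic(), sent, signature));
        System.out.println("modified: "
            + verify(sender.getPublic(), "<order><total>1</total></order>", signature));
    }
}
```

Where the signing happens determines the scope of what is protected: signing at the gateway covers only the hop between organizations, while signing at each processing step also records what each step actually saw.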

Encryption of the message contents can either be done for the entire message or for individual elements within the message. As Doug Kaye discusses, different elements of the message may be intended for different recipients, meaning they should be encrypted using different public keys. When considering using encryption for data, you also need to consider the temporal aspects of the message, e.g. how long will you want to access the encrypted data, and if you are going to encrypt it using more than one key.

Anyone who has ever seriously used a personal encryption program such as PGP or GnuPG for their email has probably made the initial mistake of not encrypting the email they send with their own key and then discovered they couldn’t read messages in their Sent mailbox. The same issue applies when using encryption in a business context, though the consequences of lost data may be a lot worse than the annoyance over an unreadable personal correspondence.

Naturally, the more keys you use, the more CPU intensive the encryption operation will be and the longer the process is likely to take. Also, in relation to the temporal loose coupling of your messages, if you intend to regularly expire your “normal” encryption keys, you will eventually make historically encrypted information completely inaccessible–unless you also encrypt the data using some sort of “master” key which does not have an expiration time-out. Use of such a “master” key would want to be pretty tightly controlled, so you need to figure this in to your overall security architecture as well. If you aren’t going to have accessible data after time T, then this also affects your backup and data retention strategies because there’s no point keeping data around that you can’t read. However, even if you can’t read the data, you may want to keep a record of having had that data for statistics, accounting or other purposes.
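One common way to encrypt for several keys is to encrypt the payload once with a random content key and then wrap that content key separately for each recipient, including any long-lived archive or “master” key. The sketch below shows that pattern with the JDK's JCA classes. It is an illustration of the idea under assumed algorithm choices (AES-GCM for the payload, RSA-OAEP for key wrapping), not an XML Encryption implementation, and the class and recipient names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class MultiRecipientEnvelope {
    private static final String WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String BULK = "AES/GCM/NoPadding";

    final byte[] iv = new byte[12];
    final byte[] ciphertext;
    final Map<String, byte[]> wrappedKeys = new LinkedHashMap<>();

    MultiRecipientEnvelope(String plaintext, Map<String, PublicKey> recipients) throws Exception {
        // One random content key encrypts the payload exactly once.
        KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
        keyGenerator.init(128);
        SecretKey contentKey = keyGenerator.generateKey();

        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance(BULK);
        aes.init(Cipher.ENCRYPT_MODE, contentKey, new GCMParameterSpec(128, iv));
        ciphertext = aes.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

        // Each extra recipient costs one more public-key operation to wrap the content key.
        for (Map.Entry<String, PublicKey> recipient : recipients.entrySet()) {
            Cipher rsa = Cipher.getInstance(WRAP);
            rsa.init(Cipher.WRAP_MODE, recipient.getValue());
            wrappedKeys.put(recipient.getKey(), rsa.wrap(contentKey));
        }
    }

    // A recipient unwraps its own copy of the content key, then decrypts the payload.
    String open(String recipient, PrivateKey key) throws Exception {
        Cipher rsa = Cipher.getInstance(WRAP);
        rsa.init(Cipher.UNWRAP_MODE, key);
        SecretKey contentKey =
            (SecretKey) rsa.unwrap(wrappedKeys.get(recipient), "AES", Cipher.SECRET_KEY);

        Cipher aes = Cipher.getInstance(BULK);
        aes.init(Cipher.DECRYPT_MODE, contentKey, new GCMParameterSpec(128, iv));
        return new String(aes.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }
}
```

If the sender's own key, or the archive key, is left out of the recipient map, the sender ends up in exactly the unreadable Sent mailbox situation described above. Retiring a recipient's key pair later makes that recipient's wrapped copy useless without touching the others.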

The main point here is that you need to really decide from a business point of view why you’re encrypting the data. Is it so you can comply with the credit card processing guidelines? Is it so that you are minimizing your risk of exposing your customers to identity theft should your systems suffer a security breach?

If either of the above or something similar is the motivation, and for this example, protecting credit card data is one of our original business requirements, you may not have any choice but to encrypt part of the information when you are inserting it into the message. The bottom line is: if the information is worth encrypting, it doesn’t make sense to risk having copies of it lying around in log files, console messages and audit tables in clear text.

If you don’t care enough to encrypt it internally, maybe it isn’t worth encrypting it externally–again, especially if you’re transmitting it over a secure channel to a service that is going to immediately decrypt it and shove it in a spreadsheet or database under minimal security controls on the other side. It is absolutely critical that you understand the processes that are manipulating the data you exchange with your customers and your business partners.

In the above scenarios, while you could benefit from the performance gains of a dedicated XML gateway appliance, it doesn’t really seem wise to leave encryption and decryption separated from the point at which the data will be used. In the worst case, if you have a machine that is compromised in the DMZ and the XML gateway decrypts the message payload before sending it into the secure zone, an attacker able to put a network interface on the compromised machine into promiscuous mode would be able to capture the clear-text information at will. In most cases, it will be much easier to compromise a machine in the DMZ than it will be to break the encryption on the messages.

Finally, with both signing and encryption, the process performing the task needs secure access to the key store. Without this, a bogus key could potentially be supplied by a compromised system that would enable an external party to later decrypt the messages. This process also is directly related to the issues discussed in the PKI Functionality section, because, depending on the choices you make, you still have the key provisioning problem.

Authentication and Authorization Functionality

Authentication and authorization (AA) share an overlapping set of problems with the PKI functionality in terms of provisioning of the identity and policy information to be used. In this example, one of the business requirements was that centralized policy management must be part of the architecture. This means any device or process performing authentication and authorization must leverage this consistent set of policies. Once more, we have the distributed vs. centralized (and potential single point of failure) issue.

For the sake of argument, the directed arrows marked as (6) and (7) in Figure 2 are some sort of remote call to a policy store which is persisting information in the LDAP servers. This policy store provides complex rules that can be tested at any appropriate policy enforcement point (PEP). Arrow (6) represents the PEP required by the AA functionality of the web server plug-in. Arrow (7) represents a possible PEP required by the XML gateways if “standard” SOAP services with multiple operations are deployed at the same URL.

From an authentication point of view, the only context available to the XML gateways comes from the message itself. It may be possible for the XML gateway to leverage the “split” X.509 certificate information conveyed as HTTP request headers to provide additional context in making the authentication decision, but none of the products I saw discussed such a feature. In the case of multi-operation SOAP services, the XML gateway is probably the best PEP to make the authorization decision to determine if a given requester can invoke a particular service.
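Reduced to its essentials, the authorization decision for a multi-operation service is a lookup of (requester, operation) against the policy store. The sketch below uses an in-memory map as a stand-in for the centralized, LDAP-backed store described above; the class, requester and operation names are all hypothetical, and a real PEP would make a remote call and handle the store being unavailable.

```java
import java.util.Map;
import java.util.Set;

final class OperationPolicy {

    // Stand-in for the centralized policy store: requester identity -> permitted operations.
    private final Map<String, Set<String>> allowedOperations;

    OperationPolicy(Map<String, Set<String>> allowedOperations) {
        this.allowedOperations = allowedOperations;
    }

    // The question a PEP asks: may this authenticated requester invoke this operation?
    // Unknown requesters get an empty set, so the default answer is "no".
    boolean permits(String requester, String operation) {
        return allowedOperations.getOrDefault(requester, Set.of()).contains(operation);
    }

    public static void main(String[] args) {
        OperationPolicy policy = new OperationPolicy(
            Map.of("partner-a", Set.of("getQuote"),
                   "partner-b", Set.of("getQuote", "placeOrder")));

        System.out.println(policy.permits("partner-a", "getQuote"));
        System.out.println(policy.permits("partner-a", "placeOrder"));
        System.out.println(policy.permits("stranger", "getQuote"));
    }
}
```

The operation name is only visible inside the SOAP message, which is why a component that parses the message is the natural place for this check when several operations share one URL.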

However, unless the identity and authentication credentials are present in the message, it is possible that the authentication information present in the request may not be available to the XML gateway. There are a number of ways around this problem which do not involve putting the XML gateway “in front of” the web servers or the CSSs, but once more, you need to be wary of the provisioning complexity potential of the business requirement for a fairly dynamic service environment. It may be possible to minimize this effect so you can use a consistent URL pattern matching expression at either the switches or the web servers, but this decision must be part of the overall architectural design of the system.

Closing Thoughts

Things are rarely as simple as they seem, and this is especially true regarding security. Every decision made at an architectural level will have security consequences just as every security decision will have architectural consequences. This is an unavoidable characteristic of system design. The bigger the system, the more decisions that need to be made, and the harder it is to achieve consistency across them. This reason more than any other should make the necessity of having a well defined, enterprise security policy in place before you have designed, built and implemented your system architecture quite clear. One of the best quotes I’ve seen about designing systems is from Barry Boehm:

“The most important software engineering skills we must learn are the skill involved in dealing with a plurality of goals which may be at odds with each other, and the skill of coordinating the application of a plurality of means, each of which provides a varying degree of help or hindrance in achieving a given goal, depending on the situation.”

Part of what prompted me to write this was there is a widespread school of thought that goes something like, “all I need is product or device X, and my security problems will be solved.” Any security professional will tell you this is patently false. Security is achieved through a combination of people, process and technology being applied in a consistent manner based on a security policy created to support the business goals of the enterprise. There’s no way that one product, or even a set of products can completely solve the security problem.

A lot of people (and I’m not talking about Anne here) think things like XML gateways are “pixie dust” for security in the same way that people thought XML was some sort of magic solution to all business data representation and exchange problems. The answer to the question posed in the title of this article, as illustrated by Boehm’s quote, is: it depends. Hopefully, if you’ve made it this far, you learned a few things, and you’ll be better prepared to participate in the overall security effort and answer the XML gateway question for yourself–regardless of your role within your organization.

References