Monday, September 01, 2003

Web Services: "How We Gonna Get These Here Machines to Talk to Each Other"

Web services are certainly the buzz du jour and have had quite a bit of staying power over the past couple of years. Many seem to think it is all very exciting, whilst newcomers seem very intimidated by it all. This is understandable, given:

  • Web services are about the biggest alphabet soup of standards to date
  • Open up most any XML file – enough said

    Yet in fact the point of web services is that they are very simple. To recognize this, it is important to take a historical perspective on the computer industry, namely: “how we gonna get these here machines to talk to each other?”

    Phase 1 – Dig a Trench

    In the 1970s, the early days of computers, getting two computers to talk to each other was pretty straightforward. You laid a physical wire between the two machines, developed a proprietary protocol, messed around on either end for a while, and presto, the two machines could talk to each other! Needless to say, this was very expensive and cumbersome, and machines were generally only hooked up to each other if there was a very pressing need and they were in close geographic proximity to one another.

    Phase 2 – Private Packet Switching

    Phase 1 lasted for a while, but soon enough people needed a machine to be able to talk to more than a handful of other machines, and the packet network was invented. Third party companies hosted packet switching networks based on protocols like X.25, and if two companies participated in the same network, they could send data to each other. The format of the data was up to those two companies to decide, so there was still a bit of negotiation involved. However, this architecture enabled closed supply chain systems like EDI to evolve.

    Phase 3 – Public Packet Switching

    In the background, while corporations were using private X.25 networks, the Internet evolved amongst government and educational institutions. This was eventually opened up for commercial use and everyone switched to it. At this point, anyone could connect with anyone with a variety of open protocols to choose from, yet there was still no standard for data format.

    While this may seem like a panacea, in fact it was still very complicated for divisions within a company, let alone separate companies, to interact with each other. Consider your average e-commerce site a couple of years ago. In order to accept credit cards over the web, it needed to sign a deal with a company like CyberCash. This entailed choosing which client API to use (Solaris C++, Windows C++, Java 1.1, Java 1.2, etc.) and sending engineers to a training class. Want to show your physical store locations on a map? Same type of API deal with MapQuest. Someone in marketing wants to start a frequent flyer miles promotion? Netcentives provided this service, but required that you lease a computer, add it to your cluster, and send it data via a proprietary protocol. The leased Netcentives computer would then forward the data on to Netcentives.

    Today, all of these contortions sound ridiculous. Yet we forget that this is what the industry was doing only two years ago.

    Phase 4 – Web Services

    Enter the world of Web services, initially popularized by IBM and Microsoft both agreeing to support SOAP. This was enough to kick-start the industry, since it would bind the Java and .NET worlds.

    Web services are inherently pretty simple – self-describing text files are sent around. In the most basic implementations, there are no binary formats, the elements are named and can therefore appear in any order, and the data is hierarchical and therefore relatively well organized. Compared to the data that was exchanged in Phases 1-3 above, this is really simple. Web services are essentially the McDonald’s of data exchange: just as everyone can buy a Big Mac, everyone can generate or parse a text file and communicate over HTTP.
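
    To make this concrete, here is a minimal sketch of a web service call in plain Java – no SOAP toolkit required, because the request and response are just text over HTTP. The endpoint URL, the urn:example-quotes namespace, and the GetQuote operation are all invented for illustration:

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;

        public class SoapCall {
            public static void main(String[] args) throws Exception {
                // The entire request is just a self-describing text file.
                // Endpoint and namespace are made up for illustration.
                String envelope =
                    "<?xml version=\"1.0\"?>\n" +
                    "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n" +
                    "  <soap:Body>\n" +
                    "    <GetQuote xmlns=\"urn:example-quotes\">\n" +
                    "      <Symbol>SUNW</Symbol>\n" +
                    "    </GetQuote>\n" +
                    "  </soap:Body>\n" +
                    "</soap:Envelope>";

                // POST it over plain HTTP; no binary protocol anywhere.
                HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/quoteService").openConnection();
                conn.setRequestMethod("POST");
                conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
                conn.setRequestProperty("SOAPAction", "urn:example-quotes#GetQuote");
                conn.setDoOutput(true);
                OutputStream out = conn.getOutputStream();
                out.write(envelope.getBytes("UTF-8"));
                out.close();

                // The response is just another self-describing text file.
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
                String line;
                while ((line = in.readLine()) != null)
                    System.out.println(line);
                in.close();
            }
        }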

    Web services are definitely not what you would design if you set out to create an efficient, technically elegant protocol for machines to communicate with each other (CORBA, DCOM, etc.). But in the real world, especially a world of high-powered processors sitting around idle most of the time, agreement amongst endpoints is much more important than technical efficiency. So if all everyone can agree on is self-describing text files over HTTP, so be it.

    While it seems like web services are a big jumble of standards, most of those emerging standards are domain-specific – how to describe identity provisioning, say, or a manufacturing part. The core web service standards are pretty mature:

  • SOAP as the protocol for a web service request and response and all of the SOAP extensions to add things like transactions, reliability, routing, and security.
  • WSDL to describe the contents of a web service request and response.
  • UDDI to describe the web service calls available (this one has been teetering a bit and may be replaced with something similar).
  • *ML - all of the emerging standards to describe pretty much everything out there in a text file.

    Summary of the Phases

    Phase                                   Network       Protocol      Data Format   Cost to Connect
    1 - Dig a Trench                        Proprietary   Proprietary   Proprietary   $$$$
    2 - Private Packet Switching (X.25)     Proprietary   Standard      Proprietary   $$$
    3 - Public Packet Switching (TCP/IP)    Standard      Standard      Proprietary   $$
    4 - Web Services                        Standard      Standard      Standard      $
    Application Servers 2004: A Big Muffin in a Donut World

    After my time at JRad, NetDynamics and Sun, I have been thinking about how application servers originated, what they were meant to do, and where they are going.

    Application Servers, 1995-1997

    With the advent of the “Internet” age in 1995, all of a sudden corporate resources had to be available to HTML/HTTP clients. The HTML/HTTP clients were very different from previous clients in that they represented numerous intermittent connections, while in the previous client/server world there were generally a fixed number of clients with constant connections.

    For example, an internal HR system that was built to support 30 HR reps, and perhaps scale to 50 over the next couple of years, was going to crawl if it suddenly had to handle 100,000 employees doing self-service during 401K election time, with each request building up and then tearing down its own connection to the database.

    Application servers were introduced to solve this problem. App servers would maintain a few open connections to the backend and queue the HTML/HTTP client requests into those connections. The back-end resources were all accessed via a variety of wacky protocols, including things like dbClient, SQL*NET, CICS, SAP BAPI, and PeopleSoft MessageAgent.
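
    The arbitration trick at the heart of these servers is easy to sketch. The toy pool below (all class and method names hypothetical) lets hundreds of transient request threads share a handful of expensive, long-lived back-end connections:

        import java.util.LinkedList;

        // Toy sketch of the arbitration idea: many transient request
        // threads share a small, fixed set of back-end connections.
        public class ConnectionPool {
            private final LinkedList available = new LinkedList();

            public ConnectionPool(int size) {
                for (int i = 0; i < size; i++)
                    available.add(openBackendConnection());
            }

            // A request thread blocks here until one of the few
            // long-lived connections frees up.
            public synchronized Object acquire() throws InterruptedException {
                while (available.isEmpty())
                    wait();
                return available.removeFirst();
            }

            // Returning a connection wakes up one waiting request thread.
            public synchronized void release(Object conn) {
                available.add(conn);
                notify();
            }

            private Object openBackendConnection() {
                // Placeholder: in a real server this would be a JDBC,
                // SQL*NET, or CICS connection, opened once and reused.
                return new Object();
            }
        }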

    When enterprises took a deliberate, architectural perspective on implementing an application server, a requirement was to handle all of the wacky client protocols out there, like IIOP and DCOM, in addition to the variety of back-end protocols, so that the server could grow with you. The application server was nirvana for the CIO, since it looked like it would solve the big corporate systems impedance mismatch as well as the pressing need to get these backend systems onto the web.

    A bunch of these types of deployments were done in Java using servers like NetDynamics (acquired by Sun) and KIVA (acquired by Netscape). At this point, WebLogic offered only multi-tier JDBC drivers, had no clustering capabilities, and provided just rudimentary application server features with its BeanT3 product.

    Application Servers, 1997-2003

    After Java on the client started to stall, Sun noticed that customers were deploying Java on the server and began to create new API’s to standardize this space. Initially, these API’s were simple ones like JDBC and Servlets, but they evolved into JSP, EJB, JTA, JMS, and JCA, and eventually fell under the umbrella brand J2EE. In the meantime, all clients became HTML/HTTP clients, and no one was trying to build fat clients that used RMI, IIOP, or DCOM.

    So we entered the age of the standardized yet still very profitable application server, where everyone wanted to buy a server that would do everything, when in actuality 80-90% of deployments were simply Servlet/JSP to JDBC applications.
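
    For the record, here is roughly what that 80-90% case looked like – a hypothetical product-listing servlet (the JDBC URL and table name are made up) that does nothing but pump rows out of a database as HTML:

        import java.io.IOException;
        import java.io.PrintWriter;
        import java.sql.*;
        import javax.servlet.ServletException;
        import javax.servlet.http.*;

        // The archetypal J2EE application of the era: an HTML
        // front-end to a relational database, and nothing more.
        public class ProductListServlet extends HttpServlet {
            protected void doGet(HttpServletRequest req, HttpServletResponse res)
                    throws ServletException, IOException {
                res.setContentType("text/html");
                PrintWriter out = res.getWriter();
                out.println("<html><body><ul>");
                try {
                    // JDBC URL and table name are hypothetical.
                    Connection conn =
                        DriverManager.getConnection("jdbc:odbc:products");
                    Statement stmt = conn.createStatement();
                    ResultSet rs = stmt.executeQuery("SELECT name FROM product");
                    while (rs.next())
                        out.println("<li>" + rs.getString("name") + "</li>");
                    conn.close();
                } catch (SQLException e) {
                    throw new ServletException(e);
                }
                out.println("</ul></body></html>");
            }
        }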

    In 1999, Sun acquired Netscape’s server line, including the original KIVA server. The KIVA server was primarily a C++ server and did not support the new Java standards well, but Sun chose it and killed the NetDynamics server it had bought a year earlier. BEA had acquired WebLogic in 1998, and in early 2000 WebLogic finally had enterprise features like clustering support and went on to dominate the J2EE market.

    Application Servers, 2004: The Muffin Architecture

    Yet there was a problem with the J2EE architecture: it is very hard to learn, use, and deploy. And whilst originally people thought they would have “a” middle tier, they ended up with numerous middle tiers, running a variety of different versions of Java and .NET technologies, and these didn’t interoperate very well.

    And so the industry started to shift towards interoperability, and as we all know from past experience, interoperability means MAKE IT DUMB. So after coalescing on self-describing text files (about as simple as it gets), the brave new world of web services was born.

    Web services are creating an environment where all of the clients are text clients, either SOAP/HTTP or HTML/HTTP, and all of the back-ends serve text via SOAP/HTTP. This includes databases, ERP systems, CICS, MQ, basically everything.

    The application server has become a big text pump, and the business logic has moved off to edge nodes. The databases, ERP systems, etc. themselves process the business logic and serve out self-describing text. The application server aggregates this text into something useful for the client.

    Application Servers, 2004: The Donut Architecture

    Since the technology industry, like any industry, has the tendency to maintain the status quo, it is sometimes useful to step back and ask what type of solution would be created if the problems of today were suddenly foisted upon us.

    So today’s problem is:

  • Everything serves or consumes self-describing text
  • Everything can talk to everything using SOAP/HTTP
  • Business logic is left up to the server or consumer of the self-describing text
  • Most applications simply transform/aggregate self-describing text from one or more sources

    So what is the solution? We must remember that the original point of the application server was to solve the big corporate impedance mismatch and arbitrate connections to overwhelmed back-end resources. Today:

  • There is no impedance mismatch, everything talks SOAP/HTTP
  • Back-end resources can now all support numerous transient connections

    So perhaps the next generation of Application Servers is to have NO Application Server in the middle of everything. This is the “donut”, peer-to-peer architecture, where there is nothing in the middle, versus the “muffin” architecture, where the Application Server sits in the middle of everything. Anything can talk to anything, and each node has its own mini application server built in so that it can talk to the rest of the world. The interoperability standard is web services, and each of these mini application servers can be written in anything – J2EE, .NET, LAMP, anything that can speak SOAP/HTTP.
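
    A “mini application server” sounds grandiose, but it can be a very small thing. Here is an illustrative sketch (the port and the XML payload are invented) of a node that answers HTTP with self-describing text all by itself, with no middle tier anywhere in sight:

        import java.io.*;
        import java.net.ServerSocket;
        import java.net.Socket;

        // Sketch of a "mini application server" embedded in an endpoint:
        // the node answers HTTP itself, so nothing sits in the middle.
        public class MiniEndpoint {
            public static void main(String[] args) throws IOException {
                ServerSocket server = new ServerSocket(8080);
                while (true) {
                    Socket client = server.accept();
                    BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                    // Skip the HTTP request headers; a real endpoint would
                    // read the SOAP body here and dispatch on it.
                    String line;
                    while ((line = in.readLine()) != null && line.length() > 0)
                        ;
                    // The node's own data, served as self-describing text.
                    String body =
                        "<?xml version=\"1.0\"?>\n" +
                        "<inventory><part id=\"42\"><qty>17</qty></part></inventory>";
                    PrintWriter out = new PrintWriter(client.getOutputStream());
                    out.print("HTTP/1.0 200 OK\r\n");
                    out.print("Content-Type: text/xml\r\n");
                    out.print("\r\n");
                    out.print(body);
                    out.flush();
                    client.close();
                }
            }
        }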

    Of course, for the donut architecture to become fully realized, web services need to become transactional and offer guaranteed delivery. And yes, you can nest these servers, i.e., have a portal server that calls a bunch of back-end web services. While theoretically you could call this a middle tier, it is not “the” middle tier, just another big text pump amongst many other text pumps.

    So where is the application server of the future? It is a big text pump that is embedded in the various endpoints of an enterprise. There is nothing in the middle.

    The Next Language

    After almost 9 years of programming in Java, I have been thinking about where Java is going and how it fits into the continuum of programming languages in the enterprise.

    Evolution of Corporate Systems and Languages

    Java falls into the category of the corporate language – a computer language and system used by corporations to run their business. To understand how corporate languages evolve, it is important to match them to the evolution of corporate computing platforms. Corporate platforms evolved from mainframes to minicomputers to client/server to Internet, and now to Grid (parallel Linux white boxes). In each transition, a dominant player emerged, as shown in the following chart:

    [Chart: the dominant player in each corporate platform era]

    As corporate computing platforms shifted, the corporate language of choice tended to shift as well. The heyday was the client/server era, when there were numerous popular languages, including Visual BASIC, Delphi (a Pascal derivative), PowerSoft’s PowerScript, and others. These languages were all essentially somewhat-typed, pseudo-interpreted languages. And they were all replaced by Java, a strongly-typed, pseudo-interpreted language, and .NET, a somewhat-typed, pseudo-interpreted language.

    During the Internet age, corporations ran a variety of server OS’s in the middle tier, including Solaris, AIX, HP-UX, Irix, and Windows NT. In many corporations, it was a strong requirement that applications be portable across two or more of those platforms in order to prevent vendor lock-in. Once a company ran applications that would only run on one of those platforms, the vendor would gouge that company in the next upgrade cycle since the company had lost much of its leverage.

    Although Java was originally designed to run first on set-top clients and then on PC clients, the language and its runtime were portable and clearly met this challenge. Some companies had already started running Java on the server using servers from NetDynamics and KIVA. In addition, Java offered some of the benefits corporate developers had enjoyed in their client/server days, such as garbage collection and higher-level API’s that abstracted the complexity of operating system features.

    Java soon also offered a critical mass of vendors supporting the platform – everything under the sun (pun intended) soon had a Java API, including Oracle, SAP, Tibco, CICS, MQSeries, etc. Over a couple of years, these all became accessible via standardized API’s that grew into J2EE, and J2EE went on to dominate the corporate computing environment.

    What Java didn’t provide was 4GL-type tools, but then again nobody had 4GL-type tools for web applications, so it was no big deal. It was expected that those would come. However, many years have passed, and the vast majority of J2EE applications are still built by hand. A lesson that Microsoft has learned well is that for API’s to be toolable, they need to be developed concurrently with the tool, and both the API and the tool should depend on easily externalizable metadata. Java API’s were always written on the merits of the API’s themselves, and subsequent tools were predominantly code generators shunned by programmers.

    The Java API’s grew into a morass of inconsistent and incomprehensible interfaces; even the simplest things proved to be very complicated. The vast majority of J2EE deployments (over 80% according to Gartner) are simply Servlet/JSP to JDBC applications – basically HTML front-ends to relational databases. It is ironic that much of what makes Java complicated today is its numerous band-aid extensions, such as generics and JSP templates, which were added to make these types of simple applications easier to develop.

    Regardless of these issues, Java and J2EE are completely dominant in the corporate computing realm.

    The Grid: Linux and Web Services

    The industry is currently in a subtle paradigm shift away from larger SMP boxes running proprietary Unixes to large grids of 1-2 processor machines running Linux. These machines already dominate the front tier web server market, and are starting to appear on the back-end with new products like Oracle RAC, the grid-enabled version of Oracle. This transition will also start to affect the middle tier, but is held back by the fact that the popular J2EE implementations were built to run on small clusters of multi-processor machines rather than large clusters of uni-processor machines.

    There is no longer a requirement for portability, since customers no longer feel locked in by a vendor when they are running Linux on an x86 white box. Customers are therefore comfortable running applications that only run on Linux/x86. In many corporations there is still, and will remain, a requirement to be able to develop corporate applications on Windows machines, so the only portability requirement is the ability to develop on Windows and deploy on Linux.

    Today’s corporate applications basically all produce text, whether HTML for a web browser or XML for another application. With the coming onslaught of web services, pretty soon all of the back-end resources will be providing XML rather than binary data. The average corporate application will be a big text pump, taking in XML from a back-end resource, transforming it somewhat, and producing either HTML or XML.
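
    This “text pump” pattern is already expressible in a few lines using the standard JAXP transformation API bundled with Java 1.4; the stylesheet and input file names below are hypothetical:

        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerException;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.stream.StreamResult;
        import javax.xml.transform.stream.StreamSource;

        // A corporate app reduced to its essence: XML in, transform,
        // text out. File names are hypothetical.
        public class TextPump {
            public static void main(String[] args) throws TransformerException {
                Transformer pump = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource("orders-to-html.xsl"));
                pump.transform(new StreamSource("orders.xml"),
                               new StreamResult(System.out));
            }
        }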

    So let’s look at the requirements for today’s corporate applications:

  • Handle XML (dynamic data with fluctuating types) well
  • Quickly process text into objects and out of objects
  • Most apps have limited logic consisting mainly of control flow
  • No need for portability beyond Linux/x86 and Windows/x86
  • Very thin veneer over the operating system for system services
  • Tuned for 1-2 processor x86 machines

    Given these requirements, Java does not fare very well:

  • XML data is inherently unstructured, and it has to be shoehorned into and out of Java, a strongly typed language that does not like new types of objects popping into its applications (see the sketch after this list).
  • Java is horrific at processing text since it can’t manipulate strings directly.
  • While Java is great for complicated applications, it is not ideally suited for specifying control flow.
  • Java is a magically portable platform, but there is no longer a requirement for portability other than Linux and Windows.
  • Since there is no longer a portability requirement, developers want only a very thin veneer over operating system services like sockets, while Java provides a huge virtual machine in between the application and the operating system.
  • Most J2EE implementations are tuned for 4-16 processor SMP boxes.
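
    To see the shoehorning complaint in code, consider how much strongly typed plumbing the DOM API (bundled with Java 1.4) requires just to pull one value out of a trivial, invented XML document:

        import java.io.StringReader;
        import javax.xml.parsers.DocumentBuilder;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.Document;
        import org.w3c.dom.Element;
        import org.xml.sax.InputSource;

        // A dozen lines of strongly typed plumbing to read a single
        // value out of a trivial XML document.
        public class Shoehorn {
            public static void main(String[] args) throws Exception {
                String xml = "<order><total>19.95</total></order>";
                DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc =
                    builder.parse(new InputSource(new StringReader(xml)));
                Element total = (Element) doc.getDocumentElement()
                    .getElementsByTagName("total").item(0);
                double value =
                    Double.parseDouble(total.getFirstChild().getNodeValue());
                System.out.println("total = " + value);
                // In a loosely typed scripting language this whole
                // program is roughly one line.
            }
        }
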
    So if Java does not meet these requirements, what does? Apparently what is needed is a language/environment that is loosely typed in order to encapsulate XML well and that can efficiently process text. It should be very well suited for specifying control flow. And it should be a thin veneer over the operating system.

    Most Linux distributions in fact bundle three such languages: PHP, Python, and Perl. PHP is by far the most popular, Python is considered the most elegant (if a bit odd), and Perl is the tried-and-true workhorse. All three languages are open source and free. As the following graph shows, PHP use has skyrocketed over the past few years:


    [Graph: PHP usage over the past few years. Source: Netcraft]

    PHP is now three times as popular as Java JSP in terms of URL counts:


    [Graph: URL counts, PHP vs. JSP. Source: Google]

    PHP, Python, and Perl are still somewhat immature in terms of their enterprise libraries, and their web services capabilities are still nascent. However, they have the necessary ingredients to meet the requirements of the next corporate computing phase of “text pump” applications. PHP, Python, and Perl are:

  • Well suited for loosely structured data like XML
  • Incredibly well tuned for text processing
  • Very well suited for control flow programming
  • Very well tuned on Linux/x86 and Windows/x86
  • Very close to the metal, given their origins as Unix scripting languages
  • Tuned for 1-2 processor x86 machines

    In addition to being free and open source, these languages are easy to learn and use. The "P"s are primed to follow Linux and Apache and make huge inroads into the corporate market. The latest version of PHP is virtually indistinguishable from Java, to the point of almost identical syntax and keywords.

    A footnote regarding .NET: Microsoft has created Xen (previously named X#), an XML-native language for its common language runtime. Visual BASIC is the most popular scripting language in the world. And as we all know, Windows is very well tuned for 1-2 processor machines. So Microsoft will definitely be around, and there will probably be a troika of .NET, Java, and PHP/Python/Perl.