Monday, September 01, 2003

The Next Language

After almost 9 years of programming in Java, I have been thinking about where Java is going and how it fits into the continuum of programming languages in the enterprise.

Evolution of Corporate Systems and Languages

Java falls into the category of the corporate language: a computer language and system used by corporations to run their business. To understand how corporate languages evolve, it is important to match them to the evolution of corporate computing platforms. Corporate platforms evolved from mainframes to minicomputers to client/server to the Internet and now to the Grid (parallel Linux white boxes). In each transition, a dominant player emerged, as shown in the following chart:

As corporate computing platforms shifted, the corporate language of choice tended to shift as well. The heyday was the client/server era, when there were numerous popular languages, including Visual Basic, Delphi (a Pascal derivative), PowerSoft’s PowerScript, and others. These languages were all essentially somewhat-typed, pseudo-interpreted languages. And they were all replaced by Java, a strongly-typed, pseudo-interpreted language, and .NET, a somewhat-typed, pseudo-interpreted language.

During the Internet age, corporations ran a variety of server OS’s in the middle tier, including Solaris, AIX, HP-UX, Irix, and Windows NT. In many corporations, it was a strong requirement that applications be portable across two or more of those platforms in order to prevent vendor lock-in. Once a company ran applications that would only run on one of those platforms, the vendor would gouge that company in the next upgrade cycle since the company had lost much of its leverage.

Although Java was originally designed to run first on set-top clients and then on PC clients, the language and its runtime were portable and clearly met this challenge. Some companies had already started running Java on the server using servers from NetDynamics and KIVA. In addition, Java offered some of the benefits of the languages that corporate developers enjoyed in their client/server days, such as garbage collection and higher-level API’s to operating system features that abstracted complexity.

Java soon also offered a critical mass of vendors supporting the platform – everything under the sun (pun intended) soon had a Java API, including Oracle, SAP, Tibco, CICS, MQSeries, etc. Over a couple of years, these were all accessible via standardized API’s that grew up to become J2EE, and J2EE grew up to dominate the corporate computing environment.

What Java didn’t provide was 4GL-type tools, but then again nobody had 4GL-type tools for web applications, so it was no big deal. It was expected that those would come. However, many years have passed, and the vast majority of J2EE applications are still built by hand. A lesson that Microsoft has learned well is that for API’s to be toolable, they need to be developed concurrently with the tool, and both the API and the tool should depend on easily externalizable metadata. Java API’s were always written on the merits of the API’s themselves, and subsequent tools were predominantly code generators shunned by programmers.

The Java API’s grew into a morass of inconsistent and incomprehensible API’s, and even the simplest things proved to be very complicated. The vast majority of J2EE deployments (over 80% according to Gartner) are simply Servlet/JSP to JDBC applications, which are basically HTML front-ends to relational databases. It is ironic that much of what makes Java complicated today is its numerous band-aid extensions, such as generics and JSP templates, which were added to make these types of simple applications easier to develop.

Regardless of these issues, Java and J2EE are completely dominant in the corporate computing realm.

The Grid: Linux and Web Services

The industry is currently in a subtle paradigm shift away from larger SMP boxes running proprietary Unixes to large grids of 1-2 processor machines running Linux. These machines already dominate the front tier web server market, and are starting to appear on the back-end with new products like Oracle RAC, the grid-enabled version of Oracle. This transition will also start to affect the middle tier, but is held back by the fact that the popular J2EE implementations were built to run on small clusters of multi-processor machines rather than large clusters of uni-processor machines.

There is no longer a requirement for portability, since customers no longer feel locked in by a vendor when they are running Linux on an x86 white box. Customers are therefore comfortable running applications that only run on Linux/x86. In many corporations there is, and there will remain, a requirement to be able to develop corporate applications on Windows machines, so the only portability requirement is the ability to develop on Windows and deploy on Linux.

Today’s corporate applications basically all produce text, whether HTML for a web browser or XML for another application. With the coming onslaught of web services, pretty soon all of the back-end resources will be providing XML rather than binary data. The average corporate application will be a big text pump, taking in XML from a back-end resource, transforming it somewhat, and producing either HTML or XML.
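As a rough illustration of the "text pump" idea, here is a minimal sketch in Python, one of the scripting languages discussed later in this piece. It assumes a made-up XML payload (the `orders`, `order`, `customer`, and `total` names are hypothetical, standing in for whatever a back-end web service might return) and turns it into an HTML table using only the standard library.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML payload, standing in for a back-end web service response.
ORDERS_XML = """<orders>
  <order id="1001"><customer>Acme &amp; Co</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>99.50</total></order>
</orders>"""


def orders_to_html(xml_text):
    """Parse an <orders> document and render it as a simple HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        cells = [
            order.get("id", ""),
            order.findtext("customer", ""),
            order.findtext("total", ""),
        ]
        # Escape each value so characters like & are safe in the HTML output.
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(orders_to_html(ORDERS_XML))
```

The whole application is parse, reshape, and print, which is the shape of program the rest of this argument has in mind.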

So let’s look at the requirements for today’s corporate applications:

  • Handle XML (dynamic data with fluctuating types) well
  • Quickly process text into objects and out of objects
  • Most apps have limited logic consisting mainly of control flow
  • No need for portability beyond Linux/x86 and Windows/x86
  • Very thin veneer over the operating system for system services
  • Tuned for 1-2 processor x86 machines

Given these requirements, Java does not fare very well:

  • XML data is inherently unstructured and it has to be shoehorned into and out of Java, which is a strongly typed language that does not like new types of objects popping into its applications.
  • Java is horrific at processing text, since its strings are immutable and cannot be manipulated in place.
  • While Java is great for complicated applications, it is not ideally suited for specifying control flow.
  • Java is a magically portable platform, but there is no longer a requirement for portability other than Linux and Windows.
  • Since there is no longer a portability requirement, developers want only a very thin veneer over operating system services like sockets, while Java provides a huge virtual machine in between the application and the operating system.
  • Most J2EE implementations are tuned for 4-16 processor SMP boxes.

So if Java does not meet these requirements, what does? Apparently what is needed is a language/environment that is loosely typed in order to encapsulate XML well and that can efficiently process text. It should be very well suited for specifying control flow. And it should be a thin veneer over the operating system.

Most Linux distributions in fact bundle three such languages: PHP, Python, and Perl. PHP is by far the most popular, Python is considered the most elegant (if a bit odd), and Perl is the tried-and-true workhorse. All three languages are open source and free. As the following graphs show, PHP use has skyrocketed over the past few years:

Source: Netcraft

PHP is now three times as popular as Java JSP in terms of URL counts:

Source: Google

PHP, Python, and Perl are still somewhat immature in terms of their enterprise libraries, and their web services capabilities are still nascent. However, they have the necessary ingredients to meet the requirements of the next corporate computing phase of “text pump” applications. PHP, Python, and Perl are:

  • Well suited for loosely structured data like XML
  • Highly tuned for text processing
  • Very well suited for control flow programming
  • Very well tuned on Linux/x86 and Windows/x86
  • Very close to the metal, given their origins as Unix scripting languages
  • Tuned for 1-2 processor x86 machines
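To make the first point in the list above a little more concrete, here is a small Python sketch of handling XML records whose fields vary from one record to the next. The `items` document and its field names are invented for illustration; the point is only that no class or schema has to be declared before the data arrives.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the two <item> elements carry different fields.
ITEMS_XML = """<items>
  <item><name>widget</name><price>9.99</price></item>
  <item><name>gadget</name><color>red</color><weight>2</weight></item>
</items>"""


def element_to_dict(elem):
    """Map each child tag to its text; nothing is declared up front."""
    return {child.tag: (child.text or "") for child in elem}


def load_items(xml_text):
    """Return one plain dict per <item>, whatever fields it happens to have."""
    return [element_to_dict(item) for item in ET.fromstring(xml_text).findall("item")]


if __name__ == "__main__":
    for record in load_items(ITEMS_XML):
        # .get() tolerates fields that are absent from a given record.
        print(record["name"], record.get("price", "n/a"), record.get("color", "n/a"))
```

A new field showing up in tomorrow's XML simply becomes a new dictionary key, with no recompilation and no new class.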

In addition to being free and open source, these languages are easy to learn and use. The "P"s are primed to follow Linux and Apache and make huge inroads into the corporate market. The latest version of PHP is virtually indistinguishable from Java, to the point of almost identical syntax and keywords.

A footnote regarding .NET: Microsoft has created Xen (previously named X#), an XML-native language for their common language runtime. Visual Basic is the most popular scripting language in the world. And as we all know, Windows is very well tuned for 1-2 processor machines. So Microsoft will definitely be around, and there will probably be a troika of .NET, Java, and PHP/Python/Perl.