Issues in Large Scale Porting
This page contains slides from lectures given on porting at various locations including the 1991 Data Processing Institute.

Introduction

The presentation is composed of:

Introduction
Presentation's Objectives
The Porting Problem
The Universe of Discourse
Specific Problems to be Resolved
Porting Goals
Strategy and Mechanisms
Specific Problem Solving
Dr. Tuna's Porting Nostrums
Conclusions

Presentation's Objectives

Present my approach to the porting problem w.r.t.:

philosophy
strategy
specific solutions
long term software hygiene

The principal objective is to instil one adage:

Think before you port!

Caveats

Applying my solution to your problem may be difficult!

My solution is extensive and geared to large software systems.

Through familiarity, I have bred the expected feelings towards the suppliers, but I will not air these feelings publicly, as the suppliers do seem to strive for a uniform level of quality.

The Porting Problem

Historically:

In the beginning (1975): one system, one machine, one compiler, well defined tools, limited expectations, everyone had source, small community (Duke did distribution using rk05 dump tape).
At first split PWB vs. V7 (1978): virtually same API, minor variations in Cc, still same hardware.
At diaspora: 4.1bsd, System III, 68k systems, xenix (1981): large G.C.D., but frayed at edges (tty controller, header file migration and style). Real change in emphasis (accounting vs. tools).
Current: many unqualified suppliers, dubious Q.A., political pushes, competing standards, commercial pollution, closed systems, huge range of hardware, software, and quality.

Why Port?

One ports:

to meet customer demands and increase market
to facilitate use of appropriate (i.e., cost or performance effective) kit for task
to make use of available kit
to improve the quality (porting to and testing on multiple platforms finds bugs)
...

The Universe of Discourse

Not concerned with net.sources -- We are dealing with large scale (greater than 100k lines), costly (real people paid to create and maintain it) software.
Dealing with all possible reasonable target platforms -- Assume 4.[234]bsd, OSF, Unix5.[01234], and various dichotomous hybrids. Do not unconsciously exclude 16/24 bit words or word address machines.
Do not assume nor demand conformance to any standard -- They are so often sub-standard. One can look to the standards for hints, but don't trust suppliers' claims. The more loudly a supplier announces adherence to a standard, the more likely it is that its conformance is suspect.

Specific Problems to be Resolved

Problems with porting may be partitioned into the following classifications. Variations in:

system architecture -- byte order, word size, pointer size, word alignment, and so on
the tool interfaces and/or semantics
libraries
headers
environment
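
Architectural variations can be probed rather than assumed. The following is a minimal sketch, assuming a hosted C compiler that provides <stddef.h>; the probe_* names are illustrative and not part of any tool described here.

```c
#include <stddef.h>

/* Returns 1 if the least significant byte of an int is stored first. */
int probe_little_endian(void)
{
    unsigned int word = 1;

    return *(unsigned char *)&word == 1;
}

/* Returns 1 if a pointer can be held in a long without losing width. */
int probe_pointer_fits_long(void)
{
    return sizeof(void *) <= sizeof(long);
}

/* Returns the offset, in bytes, at which the compiler places a double
 * that follows a single char; this reveals the alignment it enforces. */
size_t probe_double_alignment(void)
{
    struct probe { char pad; double value; };

    return offsetof(struct probe, value);
}
```

The results of such probes belong in one place that all sources consult, not in scattered assumptions.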

Variations in performance, cost, robustness, reliability, availability, support, and the honesty of the supplier should be considered, but are not relevant to this discussion.

Variations in Tools

may or may not exist (e.g., ranlib, cpp, troff, lint)
may have differing names (rsh vs. /usr/ucb/rsh vs. rcmd vs. on)
may have differing flags (cc -gx vs. -g)
may or may not be required (ranlib vs. tsort/lorder vs. NULL)
may or may not be accurately documented if at all (far too many examples of this)
may have variations in the exit status interpretation
may have differing input syntax and semantics (cc, SysV vs. 7th edition make)
may have widely varying internal formats and supported bug set (ar is the mother of all examples)
may have truly rebarbative behaviours (leading `#' in sh scripts)
may require differing permissions (can one chown?)

Variations in Libraries

variations in semantics (e.g., fopen(...,"a+"))
differing names (index vs. strchr)
differing argument types and semantics (wait)
differing return types (sprintf, signal)
routines not provided (rename, gethostname, wait3, dup2)
all of the above (getwd vs. getcwd)
non-working routines (rename)
name of library (libtermcap vs. libcurses)
use of ranlib
compiled under different universes
...

Variations in Header Files

name of appropriate header (fcntl.h vs. file.h, ioctl.h vs. termio.h, ...)
ordering of header (time.h vs. sys/time.h)
differing structs (st_blocks in stat.h, direct.h vs. dir.h)
missing or declared types (ulong vs. u_long, uid_t, gid_t)
same type defined differently (stdio.h's FILE)
some header files uncompilable
...

Environment

Number of maximum open file descriptors
Number of normally open file descriptors (4 on eighth edition)
Number of groups
Name of logged in user ($USER vs. $LOGNAME vs. getlogin())
Maximum leaf name (not as you thought)
crypt & password file interface access and performance
Length of tty name prefix stripped before inserted into utmp file
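
As one example, the three sources of the login name can be hidden behind a single routine. This is a sketch only, assuming a system that offers getenv and getlogin; the name envir_username and the order of preference are assumptions.

```c
#include <stdlib.h>
#include <unistd.h>

/* Try each known source of the login name in turn; never returns NULL. */
const char *envir_username(void)
{
    const char *name;

    if ((name = getenv("USER")) != NULL && *name != '\0')
        return name;
    if ((name = getenv("LOGNAME")) != NULL && *name != '\0')
        return name;
    if ((name = getlogin()) != NULL && *name != '\0')
        return name;
    return "unknown";
}
```

Callers then never need to know which of the three mechanisms a given platform honours.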

Cc issues

Full ANSI? Partial ANSI? __STDC__ defined? And if so, is it to be trusted? signed and/or const supported? ANSI token pasting? Are prototypes supported? Trustworthy? Working? Required?
Proper enum support?
Type of sizeof? Type of difference between two pointers?
Is char signed or unsigned by default?
void* meaningful? Working?
Any required defines (e.g., -DPORTAR, -DXENIX )?
Any special flags (e.g., -X28 for m88k)?
What are standard include and library search paths?
Any limitations (e.g., maximum number of -I flags)?
Any known bugs?
Usable with provided libraries and header files?

Two Splendid Examples

Seven variations of rename(2):

Existence proof that it can be done
Not provided
Provided but is no-op, but documentation is Posix compliant, but the example contains a bug.
rename(from,to) fails when from file is busy
Times out far too frequently
Parallel renames crash kernel
Parallel renames corrupt file system

So you want to use struct tm and/or struct timeval!

You might have to:

include <sys/time.h> but not <time.h> -- latter is included in former but is not idempotent.
include <time.h> but not <sys/time.h> -- latter is included in former but is not idempotent.
include <sys/time.h> & <time.h> -- both required and neither includes other.
suppress use of struct timeval because it's not supported.
precede either inclusion by include of <sys/types.h>, but be careful, because it's not always idempotent.
add your own typedef of time_t because it may not be provided on some systems but it must be used.
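
One way to contain these cases is a single wrapper header that every source includes instead of the suppliers' headers. The sketch below is a hypothetical <envir/time.h>; the ENVIR_* settings are invented for illustration and are set by hand here, whereas in practice they would be generated per platform.

```c
/* Sketch of a hypothetical <envir/time.h>. */
#ifndef ENVIR_TIME_H
#define ENVIR_TIME_H

#define ENVIR_NEED_SYS_TYPES_H 1  /* <sys/types.h> must come first */
#define ENVIR_USE_SYS_TIME_H   1  /* include <sys/time.h> */
#define ENVIR_USE_TIME_H       1  /* include <time.h> */
#define ENVIR_HAVE_TIMEVAL     1  /* callers test this before using struct timeval */
#define ENVIR_NEED_TIME_T      0  /* supplier fails to provide time_t */

#if ENVIR_NEED_SYS_TYPES_H
#include <sys/types.h>
#endif
#if ENVIR_USE_SYS_TIME_H
#include <sys/time.h>
#endif
#if ENVIR_USE_TIME_H
#include <time.h>
#endif
#if ENVIR_NEED_TIME_T
typedef long time_t;
#endif

#endif /* ENVIR_TIME_H */

/* Example caller: the current time in whole seconds. */
long envir_now_seconds(void)
{
    return (long)time((time_t *)0);
}
```

The include guard makes the wrapper idempotent even when the headers it wraps are not.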

Porting Goals

Repeat after me ...

``There's no such thing as portable code!
There's only code that's been ported!''

I had code that had been successfully ``ported'' to thirty platforms, yet failed on the thirty-first.

The only achievable goal is:

Adaptable Code

that is, code that can be quickly and easily adapted to work on a new target platform.

So ...

How does one make code adaptable?

One:

builds the system on a new target -- which should reveal the discrepancies between the previous and new targets;
evolves UNIVERSAL solutions and strategies for dealing with such discrepancies as and before they arise;
incorporates those solutions into the code; and
iterates!

What one is trying to do -- through experimentation, experience, and folklore -- is to:

position the code such that its adaptation to the next target is just the simple application of previously incorporated mechanisms to adapt to the discrepancies manifested by that target.

Nota Bene

Direct your efforts towards:

all targets:: not just the one you are currently doing!
all sources:: not just that one file that's currently presenting problems!

In other words, once you have discovered a problem:


		Solve it once,
		and only once,
		for all time.

but you must ensure that the solution is conveyed to any developers who might invoke the discrepancy in the future.

Caution

When changing code to adapt to a new target ...

Do not break any previous adaptation!!!

Strategy and Mechanisms

The following sections describe the major strategies and mechanisms I use.

Adopt them as is feasible.

Single Sourcing

All products, for all platforms, for all configurations, for all concerned parties (e.g., developers, Q.A., release engineers) are built from a:


Single
Universal
Shared
Source
File
System

such as [Korn 89], [Tilbrook 90a], [Glew 89], build(1).

Comprehensive Incremental Construction

The software construction system should provide a comprehensive approach to incremental construction that ensures that any modification causes the appropriate constructions to be applied.

This should incorporate full dynamic transitive closure dependency tracking (e.g., mkdepends is a half-hearted attempt at this, but a good start).
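
The transitive closure itself is simple once the direct dependencies are known; discovering them dynamically is the hard part and is not shown. A minimal sketch, assuming files are identified by small integers; the names depends, add_dependency, and mark_stale are illustrative.

```c
#define MAX_FILES 8

/* depends[a][b] is nonzero when file a directly depends on file b
 * (for example, a.c includes b.h). */
static int depends[MAX_FILES][MAX_FILES];

void add_dependency(int file, int on)
{
    depends[file][on] = 1;
}

/* Mark every file that depends, directly or indirectly, on `changed'.
 * The caller must zero stale[] first; the early return also stops
 * the recursion if the dependency graph contains a cycle. */
void mark_stale(int changed, int stale[MAX_FILES])
{
    int f;

    if (stale[changed])
        return;
    stale[changed] = 1;
    for (f = 0; f < MAX_FILES; f++)
        if (depends[f][changed])
            mark_stale(f, stale);
}
```

A change to one header thus reaches every object built from it, however many includes lie in between.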

If you are forced to use make, know ye well its many limitations.

A Compatibility Library

Add your own compatibility library as the penultimate library (i.e., immediately before libc) for every program.

This library should contain any subroutine mappings that are required to compensate for libc deficiencies.

In some instances, it must come between two libcs (one from each universe).
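
As an illustration of such a mapping, a missing rename can be approximated with link(2) and unlink(2). This is a sketch under stated assumptions, not a full replacement: the name compat_rename is hypothetical, the operation is not atomic, and from and to must not name the same file.

```c
#include <unistd.h>

/* Hypothetical stand-in for rename(2), built from link(2) and unlink(2).
 * Returns 0 on success and -1 on failure. */
int compat_rename(const char *from, const char *to)
{
    if (access(from, F_OK) != 0)
        return -1;               /* nothing to rename; leave `to' alone */
    (void)unlink(to);            /* ignore failure: target may not exist */
    if (link(from, to) != 0)
        return -1;
    if (unlink(from) != 0) {
        (void)unlink(to);        /* undo, leaving the original in place */
        return -1;
    }
    return 0;
}
```

On platforms where the supplier's routine is absent or broken, a wrapper header can map rename onto this routine so that no caller changes.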

An Environment Header File

Insert at the beginning of every C file (be it source or generated) an include of one of your own header files.

All our sources contain as the first C statement:


#include <envir/system.h>

This header file contains, in part:

defines used to suppress or select code based on platform or operating system types (e.g., SY_U53, SY_B43, MIPS_ENV);
commonly used types (e.g., Bool_t, Schar_t for signed character);
defines for Prototype declarations;
defines for TRUE and FALSE;
Boolean manifests for type of token pasting and other compiler settings.
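
A fragment of such a header might look like the following sketch. The 0/1 form of the platform selectors, the ANSI_TOKEN_PASTING setting, and the Paste macro are assumptions made for illustration; only the names SY_U53, SY_B43, Bool_t, and Schar_t come from the list above.

```c
/* Sketch of part of a hypothetical <envir/system.h>. */
#ifndef ENVIR_SYSTEM_H
#define ENVIR_SYSTEM_H

#define SY_U53 0               /* platform selectors; values are assumed */
#define SY_B43 1
#define ANSI_TOKEN_PASTING 1   /* assumed compiler setting */

typedef int Bool_t;
typedef signed char Schar_t;   /* another type where `signed' is unsupported */

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

#if ANSI_TOKEN_PASTING
#define Paste(a, b) a##b
#else
#define Paste(a, b) a/**/b     /* the usual pre-ANSI preprocessor idiom */
#endif

#endif /* ENVIR_SYSTEM_H */
```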

Foreign Header File Wrappers

Never include a supplier's header file directly. Always wrap in one of your own idempotent header files, as in:


#ifndef ENVIR_STDIO_H
#	define ENVIR_STDIO_H


#	undef	NULL
#	include <stdio.h>
#	undef NULL
#	define NULL	(0)
/* missing prototypes */
#endif	/* ENVIR_STDIO_H */

This allows one to correct their mistakes and omissions, and deal with discrepancies (e.g., type of sprintf) in a single location.

In some cases, provide a capability-based name for the header to deal with discrepancies. For example, create <envir/open.h> to include the header that contains the open(2) arguments, or to define them if they are not provided.
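
A sketch of such a capability-based header follows. The setting ENVIR_OPEN_IN_FCNTL_H is hypothetical and set by hand here; the fallback values 0, 1, and 2 are the traditional open(2) modes.

```c
/* Sketch of a hypothetical <envir/open.h>. */
#ifndef ENVIR_OPEN_H
#define ENVIR_OPEN_H

#define ENVIR_OPEN_IN_FCNTL_H 1   /* assumed setting; 0 selects <sys/file.h> */

#if ENVIR_OPEN_IN_FCNTL_H
#include <fcntl.h>
#else
#include <sys/file.h>
#endif

/* Supply the traditional values where the supplier provides no names. */
#ifndef O_RDONLY
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR   2
#endif

#endif /* ENVIR_OPEN_H */
```

Sources ask for the capability (the open(2) arguments) and never need to know which supplier header happens to provide it.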

A Single Parameterization File

Build a mechanism to construct and use a single platform parameterization file, that is, a file that provides all platform specific settings or options for the software (e.g., appropriate type for a signed character, the include file that contains the open(2)'s second argument manifests).

F.Y.I.:: My parameterization file contains 112 settings which provide all required sub-routine and header file mappings, all site specific information (e.g., address, telephone), a variety of system specific constants and booleans (e.g., ANSI type token pasting, supports Prototypes).

A procedure (e.g., strfix at our site) is used to insert parameter values in specific configured files (e.g., <envir/system.h> and <envir/stdio.h>) which are then installed if the resulting file differs from the currently installed file.
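
The substitution step can be pictured with the sketch below. It is not strfix: the @name@ marker syntax and the routine expand_param are assumptions made purely for illustration, and a real procedure would handle many parameters and whole files.

```c
#include <stddef.h>
#include <string.h>

/* Copy tmpl to out, replacing each @name@ with value.
 * Returns 0 on success, -1 if out (of size outsz) is too small. */
int expand_param(const char *tmpl, const char *name, const char *value,
                 char *out, size_t outsz)
{
    size_t nlen = strlen(name), vlen = strlen(value), used = 0;

    if (outsz == 0)
        return -1;
    while (*tmpl != '\0') {
        if (tmpl[0] == '@' && strncmp(tmpl + 1, name, nlen) == 0
                && tmpl[1 + nlen] == '@') {
            if (used + vlen >= outsz)
                return -1;
            memcpy(out + used, value, vlen);
            used += vlen;
            tmpl += nlen + 2;    /* skip both markers and the name */
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Comparing the expanded text with the installed file before installing it avoids touching headers that have not changed, and so avoids needless recompilation.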

A Single Configuration File

A single file is used to specify all construction specific information (i.e., destination directory, options, cc flags).

The construction system ensures that these values are applied universally and any change in their settings will result in the reapplication of any tool that uses them (again difficult to do with make).

Project/Software Hygiene

Apply Stenning's principles of Project Hygiene. In particular:

Know what you are trying to accomplish!
Focus on the process as a whole, rather than on the final product.

See [Stenning 90] and [Tilbrook 90b].

Specific Problem Solving

The following are some well known common discrepancies plus a short description of a possible solution.

Readdir: Create your own readdir.h that includes the appropriate header (if there is one) and defines a common struct to be used on all systems. Provide macros to deal with missing name length in Posix definition. Create simulation for opendir, readdir, etc. for all seventh edition file systems.
string.h: Lose theirs. Create your own superset of all the versions you can find, incorporating appropriate macros or mappings to your routines for memcmp, bzero, strchr vs. index, etc.
Termcap: Create your own generalized capability-based interface library that hides differences between terminfo and termcap.

Getcwd: If you have getwd, use it. If you have getcwd() create getwd interface that calls getcwd to do the interesting stuff. Otherwise create getwd() that invokes pwd(1) and reads in its output.
Termio: Create your own header file with capability-based macros to provide basic functionality (e.g., stty, gtty, set or reset mode). Is tricky but can be done. Interaction with signals is a challenge.
Ar files: Create single routine that retrieves generalized structure describing archive members. Unfortunately has to be tailored for nearly every system individually, but you only do it once.
Linting: This is another paper. Too bad. It's an important porting aid.

Prototypes

ANSI C type prototype declarations should exist for every routine that you use or provide.

We have a boolean parameter that specifies whether or not prototypes should be used (__STDC__ not to be trusted).

<envir/system.h> contains (in effect):


#if PROTOTYPES_SUPPORTED
#	define  Prototype(x)  x
#else
#	define  Prototype(x)  ()
#endif

Procedures are then declared using something similar to:


int func Prototype((char *nm,int cnt));

It works and is extremely useful both as documentation and to validate routine usage.

Prototype Warnings

Some warnings regarding use of prototypes:

Some compilers may support prototypes but lay down bad code if you use them. If a program dumps core at its first procedure call, turn off the prototypes and try again.
Some compilers misapply promoted types. Avoid using chars or shorts as arguments. This upsets the high priests of orthodoxy, but less than a customer looking at:
```
Memory fault: core dumped
```
Some compilers support standard prototypes, but dump core if a function pointer is prototyped. We actually use a different boolean within system.h to specify a PtrProto() macro.
Some compilers report discrepancies between a declaration and a use, but not between the declaration and the formal definition.
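
The separate treatment of function pointers might look like this sketch. PtrProto is the macro named above; the setting PTR_PROTOTYPES_SUPPORTED and the routines apply and add_counts are hypothetical.

```c
#define PROTOTYPES_SUPPORTED     1   /* set per platform */
#define PTR_PROTOTYPES_SUPPORTED 1   /* hypothetical second setting */

#if PROTOTYPES_SUPPORTED
#define Prototype(x) x
#else
#define Prototype(x) ()
#endif

#if PTR_PROTOTYPES_SUPPORTED
#define PtrProto(x) x
#else
#define PtrProto(x) ()
#endif

int add_counts Prototype((int a, int b));

/* Apply a binary operation reached through a function pointer whose
 * prototype can be switched off independently of ordinary prototypes. */
int apply(int (*op) PtrProto((int, int)), int a, int b)
{
    return (*op)(a, b);
}

int add_counts(int a, int b)
{
    return a + b;
}
```

Turning off PTR_PROTOTYPES_SUPPORTED alone leaves ordinary declarations fully checked.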

Dr. Tuna's Porting Nostrums

A list of unexplained rules:

Have a testing strategy to check your port, before you start porting
Ensure the system to be ported actually works on its current host as built with the source you plan to port! That is, don't try to port and debug a system simultaneously.
Avoid ifdefs if possible, and if not, limit their use to header files.
Do not suppress code compilation unless necessary to link properly. Conditionally excluded code is frequently uncompilable.
At convenient closures, rebuild system on all machines, not just the object of the current porting exercise.
When possible, test required changes on already supported platform first.
Avoid simultaneous introductions of dramatic changes. When starting major surgery, start with a working system.
Always build the full system (time permitting). You never really know the entire scope of your changes.
At convenient closures, remove all remnants of the system and totally rebuild, particularly when you think you're finished -- you aren't.
Use lint and like it.
-D to be considered dangerous.
Avoid compiler or loader dependent tricks.
Resist the urge to fix the suppliers' bugs. Your target should be as close to your clients' as possible.
Create machine independent varargs interface and convert all appropriate routines to use it.
Encouragement to use a version system should be unnecessary.
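
The varargs rule can be pictured with the sketch below, which assumes an ANSI <stdarg.h>. The Va_* spellings are hypothetical; on a pre-ANSI system they would map to <varargs.h>, which also requires a different function header that is not shown here.

```c
#include <stdarg.h>

/* Hypothetical portable spellings of the variable argument macros. */
#define Va_list            va_list
#define Va_start(ap, last) va_start(ap, last)
#define Va_arg(ap, type)   va_arg(ap, type)
#define Va_end(ap)         va_end(ap)

/* Sum n int arguments; ints are passed because, as the prototype
 * warnings note, chars and shorts are promoted and best avoided. */
int sum_counts(int n, ...)
{
    Va_list ap;
    int total = 0;

    Va_start(ap, n);
    while (n-- > 0)
        total += Va_arg(ap, int);
    Va_end(ap);
    return total;
}
```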

Conclusions

Actually, not so much a conclusion as credentials.

Can you believe me, or am I just another consultant who is regurgitating other people's opinions without truly understanding them?

The strategy described is in use at Sietec O.S.D.

It has been applied to our three major products, which consist of about 390 directories containing 4,500 source files, which themselves contain approximately eight hundred thousand lines of code and about 21 Megabytes. The product directories contain about twenty-five hundred files. By Feldman's metrics, this is a large system.

Yet, there is one source file system and we maintain up to date product trees on nine different targets simultaneously. The only difference between two different configurations will be the platform parameterization file and the configuration control file.

If adaptation to a new platform takes more than a day, (and it takes upwards of eight hours to do the compiles) it's usually due to bugs in the target system's environment.

Bibliography

[Stenning 90]: Vic Stenning, ``Project Hygiene'', EurOpen Proceedings, Nice (Oct. '90).
[Tilbrook 90a]: David Tilbrook, ``Quod Erat Faciendum'', EUUG & SUUG Conference Distributions, Nice & Moscow (Oct. '90).
[Tilbrook 90b]: David Tilbrook, ``Washing Behind Your Ears: The Principles of Software Hygiene'', EurOpen Proceedings, Nice (Oct. '90).
[Glew 89]: Andy Glew, ``Boxes, Links, and Parallel Trees: Elements of a Configuration Management System'', Software Management Workshop Proceedings, New Orleans (Apr. '89).
[Feldman 90]: Stuart Feldman, ``Large Scale Software Development Under Unix'', UKUUG Proceedings, London (June '90).
[Korn 89]: David Korn, ``The 3D File System'', Usenix Proceedings, Baltimore (June '89).

porting.qh - 1.14 - 03/10/24