Issues in Large Scale Porting
This page contains slides from lectures given on porting at
various locations including the 1991 Data Processing Institute.
The presentation is composed of:
- Presentation's Objectives
- The Porting Problem
- The Universe of Discourse
- Specific Problems to be Resolved
- Porting Goals
- Strategy and Mechanisms
- Specific Problem Solving
- Dr. Tuna's Porting Nostrums
Presentation's Objectives
Present my approach to the porting problem w.r.t.:
- specific solutions
- long term software hygiene
The principal objective is to instil one adage:
Think before you port!
The applicability of my solution to your problem may
My solution is extensive
and geared to large software systems.
Through familiarity, I have bred the expected feelings
towards the suppliers,
but I will not air these feelings publicly,
as the suppliers do seem to strive for a uniform level of quality.
The Porting Problem
In the beginning (1975): one system, one machine,
one compiler, well defined tools, limited expectations,
everyone had source, small community
(Duke did distribution using rk05 dump tape).
At the first split, PWB vs. V7 (1978): virtually the same API, minor
variations in cc, still the same hardware.
At the diaspora: 4.1bsd, System III, 68k systems, Xenix (1981):
large G.C.D., but frayed at the edges (tty controller, header file
migration and style).
Real change in emphasis (accounting vs. tools).
Current: many unqualified suppliers, dubious Q.A., political
pushes, competing standards, commercial pollution,
closed systems, huge range of hardware, software, and quality.
One ports to:
to meet customer demands and increase market
to facilitate use of appropriate (i.e., cost or performance effective)
kit for task
to make use of available kit
to improve the quality (porting to and
testing on multiple platforms finds bugs)
The Universe of Discourse
- Not concerned with net.sources --
We are dealing with large scale (greater than 100k lines),
costly (real people paid to create and maintain it) software.
- Dealing with all possible reasonable target platforms --
4.bsd, OSF, Unix5., and various dichotomous hybrids.
Do not unconsciously exclude 16/24 bit words or word-addressed machines.
- Do not assume nor demand conformance to any standard --
They are so often sub-standard.
One can look to the standards for hints, but don't
trust supplier's claims.
The more loudly a supplier announces its adherence to a standard,
the more likely it is that its conformance is suspect.
Specific Problems to be Resolved
Problems with porting may be partitioned into
the following classifications.
- system architecture
-- byte order, word size, pointer size, word alignment, and so on
- the tool interfaces and/or semantics
Variations in performance, cost, robustness, reliability,
availability, support, and the honesty of supplier
should be considered, but are not relevant to this
Variations in Tools
may or may not exist (e.g., ranlib, cpp, troff, lint)
may have differing names (rsh vs. /usr/ucb/rsh vs.
rcmd vs. on)
may have differing flags (cc -gx vs. -g)
may or may not be required
(ranlib vs. tsort/lorder vs. NULL)
may or may not be accurately documented if at all
(far too many examples of this)
may have variations in the exit status interpretation
may have differing input syntax and semantics
(cc, SysV vs. 7th edition make)
may have widely varying internal formats and supported bug set
(ar is the mother of all examples)
may have truly rebarbative behaviours (leading `#' in sh scripts)
may have differing permissions (can one chown?)
Variations in Libraries
variations in semantics (e.g., fopen(...,"a+"))
differing names (index vs. strchr)
differing argument types and semantics (wait)
differing return types (sprintf, signal)
routines not provided (rename, gethostname,
all of the above (getwd vs. getcwd)
non-working routines (rename)
name of library (libtermcap vs. libcurses)
use of ranlib
compiled under different universes
Variations in Header Files
name of appropriate header
(fcntl.h vs. file.h, ioctl.h vs.
ordering of header (time.h vs. sys/time.h)
differing structs (st_blocks in stat.h,
direct.h vs. dir.h)
missing or declared types (ulong vs. u_long,
same type defined differently (stdio.h's FILE)
some header files uncompilable
Number of maximum open file descriptors
Number of normally open file descriptors (4 on eighth edition)
Number of groups
Name of logged in user ($USER vs. $LOGNAME vs.
Maximum leaf name length (not what you thought)
crypt & password file interface access and performance
Length of the tty name prefix stripped before insertion into the utmp file
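On a system that provides the Posix sysconf(3) and pathconf(3) interfaces, several of these values can be asked for at run time instead of being compiled in. The sketch below assumes such a system. The function names are hypothetical, and the fallback values (the seventh edition's 20 open files and 14-character directory entries) are assumptions used only when the system cannot answer.

```c
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallbacks (seventh edition NOFILE and DIRSIZ); used only when
 * the system cannot or will not answer. */
#define FALLBACK_OPEN_MAX 20L
#define FALLBACK_NAME_MAX 14L

/* Maximum number of open file descriptors per process. */
long max_open_files(void)
{
    long n = sysconf(_SC_OPEN_MAX);
    return n > 0 ? n : FALLBACK_OPEN_MAX;
}

/* Maximum leaf name length.  It is a property of the file system, so ask
 * about the directory you will actually create files in. */
long max_leaf_name(const char *dir)
{
    long n = pathconf(dir, _PC_NAME_MAX);
    return n > 0 ? n : FALLBACK_NAME_MAX;
}

/* Name of the logged-in user: try $LOGNAME, then $USER.
 * Returns NULL if neither is set. */
const char *login_name(void)
{
    const char *s = getenv("LOGNAME");
    if (s == NULL || *s == '\0')
        s = getenv("USER");
    return s;
}
```

Note that pathconf() can give different answers for different directories on the same machine, which is one reason the leaf name limit is so often not what you thought.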
Full ANSI? Partial ANSI? __STDC__ defined?
and if so, is it to be trusted?
const supported? ANSI token pasting?
Are prototypes supported? Trustworthy? Working? Required?
Proper enum support?
Type of sizeof?
Type of difference between two pointers?
Is char signed or unsigned by default?
void* meaningful? Working?
Any required defines (e.g., -DPORTAR, -DXENIX )?
Any special flags (e.g., -X28 for m88k)?
What are standard include and library search paths?
Any limitations (e.g., maximum number of -I flags)?
Any known bugs?
Usable with provided libraries and header files?
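Several of these questions can be answered by small probe routines compiled and run on each new target, with the answers recorded in the platform parameterization file. A minimal sketch follows; the function names are hypothetical, and the answers shown in the comments are what an ANSI compiler should give. A pre-ANSI compiler may well answer differently, which is the point of probing.

```c
#include <stddef.h>

/* Is plain char signed?  (char)-1 is negative only if it is. */
int char_is_signed(void)
{
    return (char)-1 < 0;
}

/* Is the type of sizeof unsigned?  Where sizeof yields int (common on
 * pre-ANSI compilers) this expression is -1 and the test fails. */
int sizeof_is_unsigned(void)
{
    return sizeof(int) - sizeof(int) - 1 > 0;
}

/* Size in bytes of the type of the difference between two pointers.
 * sizeof does not evaluate its operand, so a[] need not be initialized. */
size_t ptrdiff_size(void)
{
    char a[2];
    return sizeof(&a[1] - &a[0]);
}

/* Does the compiler claim ANSI conformance?  A claim, not a proof. */
int stdc_claimed(void)
{
#ifdef __STDC__
    return 1;
#else
    return 0;
#endif
}
```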
Two Splendid Examples
Seven variations of rename(2):
Existence proof that it can be done
Provided, but a no-op; the documentation
is Posix compliant, but its example contains a bug.
rename(from,to) fails when from file is busy
Times out far too frequently
Parallel renames crash kernel
Parallel renames corrupt file system
So you want to use struct tm and/or struct timeval!
You might have to:
include <sys/time.h> but not <time.h> --
latter is included in former but is not idempotent.
include <time.h> but not <sys/time.h> --
latter is included in former but is not idempotent.
include <sys/time.h> & <time.h> --
both required and neither includes other.
suppress use of struct timeval
because it's not supported.
precede either inclusion by include
of <sys/types.h>, but be careful,
because it's not always idempotent.
add your own typedef of time_t because it may not be provided
on some systems but it must be used.
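One way to confine this mess is a single idempotent wrapper that every source file includes instead of the supplier's headers, with the choices driven by platform parameters. A sketch follows, assuming a wrapper named <envir/time.h>. The TM_* parameter names are hypothetical and the defaults suit a typical modern system. Because only the wrapper includes the supplier's headers, and the wrapper is itself guarded, a non-idempotent supplier header is never included twice.

```c
/* envir/time.h: hypothetical wrapper.  The TM_* values would normally
 * come from the platform parameterization file. */
#ifndef ENVIR_TIME_H
#define ENVIR_TIME_H

#ifndef TM_NEED_SYS_TYPES_H
#define TM_NEED_SYS_TYPES_H 1     /* <sys/types.h> must come first */
#endif
#ifndef TM_INCLUDE_TIME_H
#define TM_INCLUDE_TIME_H 1       /* <time.h> is needed */
#endif
#ifndef TM_INCLUDE_SYS_TIME_H
#define TM_INCLUDE_SYS_TIME_H 1   /* <sys/time.h> is needed */
#endif
#ifndef TM_DEFINE_TIME_T
#define TM_DEFINE_TIME_T 0        /* supplier does not provide time_t */
#endif
#ifndef TM_HAS_TIMEVAL
#define TM_HAS_TIMEVAL TM_INCLUDE_SYS_TIME_H  /* struct timeval usable */
#endif

#if TM_NEED_SYS_TYPES_H
# include <sys/types.h>
#endif
#if TM_DEFINE_TIME_T
typedef long time_t;
#endif
#if TM_INCLUDE_TIME_H
# include <time.h>
#endif
#if TM_INCLUDE_SYS_TIME_H
# include <sys/time.h>
#endif

#endif /* ENVIR_TIME_H */
```

Code that needs struct timeval then tests TM_HAS_TIMEVAL rather than guessing from the operating system name.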
Porting Goals
Repeat after me ...
``There's no such thing
as portable code!
There's only code that's
I had code that had been successfully ``ported'' to
thirty platforms, yet failed on the thirty-first.
The only achievable goal is:
code that can be quickly and easily adapted to work
on a new target platform.
How does one make code adaptable?
builds the system on a new target
-- which should reveal the discrepancies
between the previous and new targets;
evolves UNIVERSAL solutions and strategies for dealing with
such discrepancies as and before they arise;
incorporates those solutions into the code; and
What one is trying to do
-- through experimentation, experience, and folklore,
-- is to:
position the code such that its adaptation
to the next target is just the simple application of previously
incorporated mechanisms to adapt to the discrepancies manifested
by that target.
Direct your efforts towards:
- all targets:
not just the one you are currently doing!
- all sources:
not just that one file that's currently presenting problems!
In other words, once you have discovered a problem:
Solve it once,
and only once,
for all time.
but you must ensure that the solution is
conveyed to any developers who might invoke
the discrepancy in the future.
When changing code to adapt to a new target ...
Do not break any previous
Strategy and Mechanisms
The following sections describe
the major strategies and mechanisms I use.
Adopt them as is feasible.
All products, for all platforms, for all configurations,
for all concerned parties (e.g., developers, Q.A., release
engineers) are built from a:
such as [Korn 89], [Tilbrook 90a], [Glew 89], build(1).
Comprehensive Incremental Construction
The software construction system should provide a
comprehensive approach to incremental construction
that ensures that any modification causes
the appropriate constructions to be applied.
This should incorporate full dynamic transitive closure
dependency tracking (e.g., mkdepends is a half-hearted attempt
at this, but a good start).
If you are forced to use make, know ye well
its many limitations.
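The core of transitive closure dependency tracking can be sketched with Warshall's algorithm over a dependency matrix: once the closure is computed, a change to any file identifies every construction that must be redone. This is a minimal, static sketch with hypothetical names and a small fixed limit; it does not attempt the dynamic part (discovering dependencies during each build), which a real construction system must also do.

```c
#define MAXF 16   /* illustrative limit on the number of files */

/* dep[i][j] != 0 means file i depends directly on file j (e.g., an
 * object on its source, or a source on a header it includes).
 * Warshall's algorithm computes the transitive closure in place:
 * afterwards dep[i][j] != 0 iff i depends on j directly or indirectly. */
void dep_closure(int n, unsigned char dep[MAXF][MAXF])
{
    int i, j, k;

    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++)
            if (dep[i][k])
                for (j = 0; j < n; j++)
                    if (dep[k][j])
                        dep[i][j] = 1;
}

/* After dep_closure(), set out[i] for every file that must be
 * reconstructed when file `changed` is modified; return how many. */
int rebuild_set(int n, unsigned char dep[MAXF][MAXF], int changed,
                unsigned char out[MAXF])
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        out[i] = dep[i][changed] != 0;
        count += out[i];
    }
    return count;
}
```

For example, if prog.o depends on prog.c, prog.c includes <envir/stdio.h>, and that wrapper includes <envir/system.h>, then a change to <envir/system.h> marks all three for reconstruction, even though none of them names it directly.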
A Compatibility Library
Add your own compatibility library as the penultimate library
(i.e., immediately before libc) for every program.
This library should contain any subroutine mappings that
are required to compensate for libc deficiencies.
In some instances, it must come between two libcs
(one from each universe).
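As an example of what such a library might hold, here is a sketch of a fallback for a target whose rename(2) is missing or broken, built from link(2) and unlink(2). The name cl_rename is hypothetical; in the compatibility library itself the routine would be called rename() and linked in only on the targets that need it. Unlike a real rename it is not atomic and does not handle directories.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* cl_rename: hypothetical link/unlink emulation of rename(2).
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int cl_rename(const char *from, const char *to)
{
    if (strcmp(from, to) == 0)
        return access(from, F_OK);      /* same name: succeed if it exists */
    if (link(from, to) < 0) {
        if (errno != EEXIST)
            return -1;
        /* rename replaces an existing target: remove it and retry */
        if (unlink(to) < 0 || link(from, to) < 0)
            return -1;
    }
    return unlink(from);
}
```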
An Environment Header File
Insert at the beginning of every C file
(be it source or generated) an include of one of your own
All our sources contain as the first C statement:
This header file contains, in part:
defines used to suppress or select code based on
platform or operating system types (e.g., SY_U53,
commonly used types (e.g., Bool_t,
Schar_t for signed character);
defines for Prototype declarations;
defines for TRUE and FALSE,
Boolean manifests for type of token pasting and other
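A fragment of such a header might look like the following. We call it <envir/system.h>, the name this page uses for the header that holds the Prototype macro, but the contents below are only an illustration: SY_ANSI_PASTE and SY_HAS_SIGNED are hypothetical parameters that the platform parameterization file would supply, and the defaults assume an ANSI compiler.

```c
/* envir/system.h (fragment): a sketch with hypothetical parameter names. */
#ifndef ENVIR_SYSTEM_H
#define ENVIR_SYSTEM_H

#ifndef SY_ANSI_PASTE
#define SY_ANSI_PASTE 1     /* compiler supports ## token pasting */
#endif
#ifndef SY_HAS_SIGNED
#define SY_HAS_SIGNED 1     /* compiler accepts the signed keyword */
#endif

typedef int Bool_t;         /* project-wide boolean type */
#define TRUE  1
#define FALSE 0

#if SY_HAS_SIGNED
typedef signed char Schar_t;
#else
typedef char Schar_t;       /* only correct where plain char is signed */
#endif

/* Token pasting: ANSI ## versus the old Reiser cpp comment trick. */
#if SY_ANSI_PASTE
#define Paste(a, b) a##b
#else
#define Paste(a, b) a/**/b
#endif

#endif /* ENVIR_SYSTEM_H */
```

Source files then write Paste(foo, bar) and never need to know which cpp they are running under.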
Foreign Header File Wrappers
Never include a supplier's header file directly.
Always wrap it in one of your own idempotent header files, e.g.:
#ifndef ENVIR_STDIO_H
# define ENVIR_STDIO_H
# undef NULL
# include <stdio.h>
# undef NULL
# define NULL (0)
/* missing prototypes */
#endif /* ENVIR_STDIO_H */
This allows one to correct the suppliers' mistakes and omissions,
and to deal with discrepancies (e.g., the type of sprintf)
in a single location.
In some cases, provide a capability-based name for a header
to deal with discrepancies.
For example, create <envir/open.h> to include the
header that contains open(2)'s arguments, or to
define them if they are not provided.
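A sketch of such an <envir/open.h> follows. SY_OPEN_IN_SYS_FILE_H is a hypothetical parameter saying that the supplier puts the open(2) flags in <sys/file.h> rather than <fcntl.h>; the fallback values are the seventh edition's read, write, and read/write modes, used only when the supplier defines nothing.

```c
/* envir/open.h: hypothetical capability-based header for open(2) flags. */
#ifndef ENVIR_OPEN_H
#define ENVIR_OPEN_H

#ifndef SY_OPEN_IN_SYS_FILE_H
#define SY_OPEN_IN_SYS_FILE_H 0
#endif

#if SY_OPEN_IN_SYS_FILE_H
# include <sys/file.h>
#else
# include <fcntl.h>
#endif

/* Seventh edition values, used only if the supplier gave us nothing. */
#ifndef O_RDONLY
# define O_RDONLY 0
#endif
#ifndef O_WRONLY
# define O_WRONLY 1
#endif
#ifndef O_RDWR
# define O_RDWR 2
#endif

#endif /* ENVIR_OPEN_H */
```

Source files include <envir/open.h> and ask for the capability ("the open flags"), not for a particular supplier's file name.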
A Single Parameterization File
Build a mechanism to construct and use
a single platform parameterization file,
that is a file that provides all platform
specific settings or options for the software
(e.g., appropriate type for a signed character,
the include file that contains open(2)'s second argument).
My parameterization file contains 112 settings
which provide all required sub-routine and header file mappings,
all site specific information (e.g., address, telephone),
a variety of system specific constants and booleans
(e.g., ANSI type token pasting, supports Prototypes).
A procedure (e.g., strfix at our site)
is used to insert parameter values
in specific configured files
(e.g., <envir/system.h> and <envir/stdio.h>)
which are then installed if the resulting file differs
from the currently installed file.
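Installing only when the contents differ matters because an untouched file keeps its modification time, so nothing that depends on it is needlessly reconstructed. The sketch below shows just that install step; it is not a description of strfix, whose interface is not given here, and the function names are hypothetical.

```c
#include <stdio.h>

/* Return 1 if the files differ or either cannot be read, 0 if identical. */
int files_differ(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int ca, cb, differ = 1;

    if (fa != NULL && fb != NULL) {
        do {
            ca = getc(fa);
            cb = getc(fb);
        } while (ca == cb && ca != EOF);
        differ = (ca != cb);
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return differ;
}

/* Copy src over dst only if their contents differ, leaving dst (and its
 * modification time) alone otherwise.
 * Returns 1 if installed, 0 if already up to date, -1 on error. */
int install_if_changed(const char *src, const char *dst)
{
    FILE *in, *out;
    int c;

    if (!files_differ(src, dst))
        return 0;
    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(dst, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    while ((c = getc(in)) != EOF)
        putc(c, out);
    fclose(in);
    return fclose(out) == EOF ? -1 : 1;
}
```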
A Single Configuration File
A single file is used to specify all construction
specific information (i.e., destination directory, options,
The construction system ensures that these values are
applied universally and any change in their settings
will result in the reapplication of any tool that
uses them (again difficult to do with make).
Apply Stenning's principles of Project Hygiene.
Know what you are trying to accomplish!
Focus on the process as a whole,
rather than on the final product.
See [Stenning 90] and [Tilbrook 90b].
Specific Problem Solving
The following are some well known common
discrepancies plus a short description of
a possible solution.
Create your own readdir.h that includes the
appropriate header (if there is one) and defines a common
struct to be used on all systems.
Provide macros to deal with the missing name length in the
Posix definition. Create a simulation of opendir,
readdir, etc. for all seventh edition file systems.
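The macro side of such a readdir.h might be sketched as follows, assuming a target that provides the Posix <dirent.h>. SY_HAS_D_NAMLEN is a hypothetical parameter (BSD's struct dirent has d_namlen; the Posix definition does not), and longest_name is only a usage example. The simulation for seventh edition file systems, which reads raw directory entries, is not shown.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical parameter: does struct dirent carry d_namlen? */
#ifndef SY_HAS_D_NAMLEN
#define SY_HAS_D_NAMLEN 0
#endif

#if SY_HAS_D_NAMLEN
# define NAMLEN(dp) ((size_t)(dp)->d_namlen)
#else
# define NAMLEN(dp) strlen((dp)->d_name)
#endif

/* Length of the longest entry name in directory `path`, or -1 if the
 * directory cannot be opened.  Uses only opendir/readdir/closedir and
 * NAMLEN, so it compiles the same way on every target. */
long longest_name(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *dp;
    long longest = 0;

    if (dir == NULL)
        return -1;
    while ((dp = readdir(dir)) != NULL)
        if ((long)NAMLEN(dp) > longest)
            longest = (long)NAMLEN(dp);
    closedir(dir);
    return longest;
}
```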
Lose theirs. Create your own superset of all the versions
you can find, incorporating appropriate macros
or mappings to your routines for memcmp, bzero,
strchr vs. index, etc.
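A fragment of such a superset header might look like this. SY_HAS_STRCHR is a hypothetical parameter, and the BSD-to-ANSI mappings assume the target has memset, memmove, and memcmp; a target lacking those would need the mappings reversed or compatibility library routines. Note that bcopy takes its source first, the opposite of memmove, which is exactly the kind of discrepancy worth hiding in one place.

```c
#include <string.h>

/* Hypothetical parameter: does the supplier provide strchr/strrchr
 * (System V style) rather than only index/rindex (seventh edition)? */
#ifndef SY_HAS_STRCHR
#define SY_HAS_STRCHR 1
#endif

#if SY_HAS_STRCHR
# undef index
# undef rindex
# define index(s, c)  strchr((s), (c))
# define rindex(s, c) strrchr((s), (c))
#else
# define strchr(s, c)  index((s), (c))
# define strrchr(s, c) rindex((s), (c))
#endif

/* BSD block routines mapped onto their ANSI equivalents. */
#undef bzero
#undef bcopy
#undef bcmp
#define bzero(p, n)        memset((p), 0, (n))
#define bcopy(src, dst, n) memmove((dst), (src), (n))   /* note order */
#define bcmp(a, b, n)      memcmp((a), (b), (n))
```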
Create your own generalized capability-based interface library
that hides differences between terminfo and termcap.
If you have getwd, use it.
If you have getcwd(), create a getwd() interface
that calls getcwd to do the interesting stuff.
Otherwise create getwd() that invokes pwd(1)
and reads in its output.
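The last two cases might be sketched as follows, following the getwd convention that the buffer holds at least MAXPATHLEN bytes and receives a message on failure. The names cl_getwd and SY_HAS_GETCWD are hypothetical, as is the 1024 fallback for MAXPATHLEN. The pwd fallback runs through popen(3), and so through the shell; it asks for pwd -P so that it reports the physical path, as getcwd does.

```c
#define _POSIX_C_SOURCE 200112L   /* for getcwd, popen, pclose */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN 1024           /* assumed; usually from <sys/param.h> */
#endif

/* Hypothetical parameter chosen by the parameterization file. */
#ifndef SY_HAS_GETCWD
#define SY_HAS_GETCWD 1
#endif

/* getwd on top of getcwd. */
char *getwd_via_getcwd(char *buf)
{
    if (getcwd(buf, MAXPATHLEN) == NULL) {
        strcpy(buf, "getcwd failed");
        return NULL;
    }
    return buf;
}

/* getwd by running pwd and reading its output. */
char *getwd_via_pwd(char *buf)
{
    FILE *fp = popen("pwd -P", "r");
    size_t len;

    if (fp == NULL) {
        strcpy(buf, "cannot run pwd");
        return NULL;
    }
    if (fgets(buf, MAXPATHLEN, fp) == NULL) {
        pclose(fp);
        strcpy(buf, "no output from pwd");
        return NULL;
    }
    pclose(fp);
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[len - 1] = '\0';      /* strip pwd's trailing newline */
    return buf;
}

/* The single interface the rest of the code calls. */
char *cl_getwd(char *buf)
{
#if SY_HAS_GETCWD
    return getwd_via_getcwd(buf);
#else
    return getwd_via_pwd(buf);
#endif
}
```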
Create your own header file with capability-based
macros to provide basic functionality (e.g., stty,
gtty, set or reset mode).
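A sketch of such an interface is below. Only the Posix termios branch is shown; a seventh edition target would implement the same three calls with stty/gtty and a System V target with termio. The names Ttymode_t, tty_get, tty_set, and tty_cbreak are hypothetical.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical capability-based tty interface (termios branch only). */
typedef struct termios Ttymode_t;

/* Save the current mode of terminal fd; 0 on success, -1 otherwise. */
int tty_get(int fd, Ttymode_t *mode)
{
    return tcgetattr(fd, mode);
}

/* Restore a previously saved mode once pending output is drained. */
int tty_set(int fd, const Ttymode_t *mode)
{
    return tcsetattr(fd, TCSADRAIN, mode);
}

/* Switch to "cbreak": no line editing, no echo, one character at a time. */
int tty_cbreak(int fd)
{
    Ttymode_t m;

    if (tty_get(fd, &m) < 0)
        return -1;
    m.c_lflag &= ~(ICANON | ECHO);
    m.c_cc[VMIN] = 1;
    m.c_cc[VTIME] = 0;
    return tty_set(fd, &m);
}
```

A caller saves the mode with tty_get, calls tty_cbreak, and restores with tty_set; on a descriptor that is not a terminal each call simply returns -1.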
Is tricky but can be done.
Interaction with signals is a challenge.
- Ar files
Create a single routine that retrieves a generalized structure
describing archive members.
Unfortunately, it has to be tailored for nearly every system
individually, but you only do it once.
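Here is a sketch of one such tailoring: a reader for the common "!<arch>\n" format with 60-byte text member headers. Everything else (the older binary formats, long-name tables, symbol directories with special meanings) would need its own variant behind the same interface. The names struct ar_member, ar_open, and ar_next are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60   /* name 16, date 12, uid 6, gid 6, mode 8, size 10, "`\n" */

/* Hypothetical generalized description of one archive member. */
struct ar_member {
    char name[17];   /* trailing blanks and '/' removed */
    long size;       /* bytes of member data */
    long offset;     /* file offset of the member data */
};

/* Open an archive and check its magic string; NULL if not an archive. */
FILE *ar_open(const char *path)
{
    char magic[AR_MAGIC_LEN];
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (fread(magic, 1, AR_MAGIC_LEN, fp) != AR_MAGIC_LEN
        || memcmp(magic, AR_MAGIC, AR_MAGIC_LEN) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Describe the next member and skip its data.
 * Returns 1 for a member, 0 at end of archive, -1 on a bad header. */
int ar_next(FILE *fp, struct ar_member *m)
{
    char hdr[AR_HDR_LEN], size[11];
    size_t got = fread(hdr, 1, AR_HDR_LEN, fp);
    int i;

    if (got == 0)
        return 0;
    if (got != AR_HDR_LEN || hdr[58] != '`' || hdr[59] != '\n')
        return -1;
    memcpy(m->name, hdr, 16);
    m->name[16] = '\0';
    for (i = 15; i >= 0 && (m->name[i] == ' ' || m->name[i] == '/'); i--)
        m->name[i] = '\0';
    memcpy(size, hdr + 48, 10);
    size[10] = '\0';
    m->size = strtol(size, NULL, 10);
    m->offset = ftell(fp);
    /* member data is padded to an even length */
    if (fseek(fp, m->size + (m->size & 1), SEEK_CUR) != 0)
        return -1;
    return 1;
}
```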
This is another paper.
Too bad. It's an important porting aid.
ANSI C type prototype declarations should exist
for every routine that you use or provide.
We have a boolean parameter that specifies whether
or not prototypes should be used
(__STDC__ not to be trusted).
<envir/system.h> contains (in effect):
#if PROTOTYPES_OK /* hypothetical name for the boolean parameter */
# define Prototype(x) x
#else
# define Prototype(x) ()
#endif
Procedures are then declared using something similar to:
int func Prototype((char *nm,int cnt));
It works and is extremely useful both as documentation
and to validate routine usage.
Some warnings regarding use of prototypes:
Dr. Tuna's Porting Nostrums
A list of unexplained rules:
Have a testing strategy to check your port,
before you start porting
Ensure system to be ported actually works on
its current host as built with the source
you plan to port!
That is, don't try to port and debug a system simultaneously.
Avoid ifdefs if possible, and if not,
limit their use to header files.
Do not suppress code compilation unless necessary to link
Conditionally excluded code is frequently uncompilable.
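One way to honour both rules is to make platform choices 0/1 parameters tested with an ordinary if statement: both branches are then compiled on every target, so neither can rot, and the compiler discards the dead one. An #if is still needed when a branch names something that does not exist on the target. SY_SHORT_NAMES and leaf_limit are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Hypothetical boolean parameter: does the target truncate leaf names
 * to 14 characters?  Supplied by the parameterization file. */
#ifndef SY_SHORT_NAMES
#define SY_SHORT_NAMES 0
#endif
#define SHORT_NAME_MAX 14

/* Length to which a generated leaf name should be cut.  Both branches
 * are compiled on every target; the optimizer drops the dead one. */
size_t leaf_limit(size_t wanted)
{
    if (SY_SHORT_NAMES && wanted > SHORT_NAME_MAX)
        return SHORT_NAME_MAX;
    return wanted;
}
```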
At convenient closures, rebuild system on all machines,
not just the object of the current porting exercise.
When possible, test required changes on already supported
Avoid simultaneous introductions of dramatic changes.
When starting major surgery, start with a working system.
Always build the full system (time permitting).
You never really know the entire scope of your changes.
At convenient closures, remove all remnants of the system
and totally rebuild, particularly when you think you're
finished -- you aren't.
Use lint and like it.
-D to be considered dangerous.
Avoid compiler or loader dependent tricks.
Resist the urge to fix the suppliers' bugs.
Your target should be as close to your clients' as possible.
Create machine independent varargs interface
and convert all appropriate routines to use it.
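A minimal sketch of the idea, assuming an ANSI <stdarg.h> throughout: exactly one routine walks a variable argument list, and every other variadic routine only gathers its arguments and passes them on. On a target with only <varargs.h>, that one routine and the argument-gathering macros are the only code needing a second version. The names vmessage and message are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* The one routine that actually consumes a va_list. */
int vmessage(char *buf, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(buf, size, fmt, ap);
}

/* A typical variadic routine: collect the arguments, pass them on. */
int message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vmessage(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

For example, message(buf, sizeof buf, "%d-%s", 7, "x") leaves "7-x" in buf and returns 3.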
Encouragement to use a version system should be unnecessary.
Actually, not so much a conclusion as credentials.
Can you believe me, or am I just another consultant
who is regurgitating other people's opinions
without truly understanding them?
The strategy described is in use at Sietec O.S.D.
It has been applied to our three major products,
which consist of about 390 directories containing 4,500 source files,
which themselves contain approximately
eight hundred thousand lines of code
and about 21 Megabytes.
The product directories contain about twenty-five hundred files.
By Feldman's metrics, this is a large system.
Yet, there is one source file system and we maintain
up-to-date product trees on nine different targets.
The only difference between two different configurations
will be the platform parameterization file
and the configuration control file.
If adaptation to a new platform takes more than a day,
(and it takes upwards of eight hours to do the compiles)
it's usually due to bugs in the target system's environment.
- [Stenning 90]
Vic Stenning, ``Project Hygiene'', EurOpen Proceedings,
Nice (Oct. '90).
- [Tilbrook 90a]
David Tilbrook, ``Quod Erat Faciendum'',
EUUG & SUUG Conference Distributions, Nice & Moscow (Oct. '90).
- [Tilbrook 90b]
David Tilbrook, ``Washing Behind Your Ears: The Principles
of Software Hygiene'', EurOpen Proceedings,
Nice (Oct. '90).
- [Glew 89]
Andy Glew, ``Boxes, Links, and Parallel Trees:
Elements of a Configuration Management System'',
Software Management Workshop Proceedings,
New Orleans (Apr. '89).
- [Feldman 90]
Stuart Feldman, ``Large Scale Software Development Under Unix'',
UKUUG Proceedings, London (June '90).
- [Korn 89]
David Korn, ``The 3D File System'',
Usenix Proceedings, Baltimore (June '89).
|porting.qh - 1.14 - 03/10/24