Why do some scripts start with #! ... ?

Chip Rosenthal has answered a closely related question in
comp.unix.xenix in the past.

I think what confuses people is that there exist two different
mechanisms, both spelled with the letter `#'. They both solve the
same problem over a very restricted set of cases -- but they are
nonetheless different.

Some background. When the UNIX kernel goes to run a program (via
one of the exec() family of system calls), it takes a peek at the
first 16 bits of the file. Those 16 bits are called a `magic
number'. First, the magic number prevents the kernel from doing
something silly like trying to execute your customer database
file. If the kernel does not recognize the magic number then it
complains with an ENOEXEC error. It will execute the program only
if the magic number is recognizable.
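As a rough illustration of that peek, here is a minimal C sketch
that reads the first two bytes of a file. The helper name
read_magic() is made up for this example, and a real kernel does
this work inside exec() rather than through stdio.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy the first two bytes (16 bits) of the
 * file at `path' into magic[0] and magic[1].
 * Returns 0 on success, or -1 if the file cannot be opened or is
 * shorter than two bytes.
 */
int read_magic(const char *path, unsigned char magic[2])
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(magic, 1, 2, fp);     /* the 16-bit "magic number" */
    fclose(fp);
    return n == 2 ? 0 : -1;
}
```

A caller could compare the two bytes against whatever values its
system recognizes; which values those are differs from one system
to the next.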

Second, as time went on and different executable file formats were
introduced, the magic number not only told the kernel *if* it
could execute the file, but also *how* to execute the file. For
example, if you compile a program on an SCO XENIX/386 system and
carry the binary over to a SysV/386 UNIX system, the kernel will
recognize the magic number and say `Aha! This is an x.out
binary!' and configure itself to run with XENIX compatible system
calls.

Note that the kernel can only run binary executable images. So
how, you might ask, do scripts get run? After all, I can type
`my.script' at a shell prompt and I don't get an ENOEXEC error.
Script execution is done not by the kernel, but by the shell. The
code in the shell might look something like:

    /* try to run the program */
    execl(program, basename(program), (char *)0);

    /* the exec failed -- maybe it is a shell script? */
    if (errno == ENOEXEC)
        execl("/bin/sh", "sh", "-c", program, (char *)0);

    /* oh no mr bill!! */
    perror(program);
    return -1;

(This example is highly simplified. There is a lot
more involved, but this illustrates the point I'm
trying to make.)

If execl() succeeds in starting the program, then the code beyond
the execl() is never executed. The calling process is replaced by
the new program, so the system is off running the binary `program'.

If, however, the first execl() failed then this hypothetical shell
looks at why it failed. If the execl() failed because `program'
was not recognized as a binary executable, then the shell tries to
run it as a shell script.

The Berkeley folks had a neat idea to extend how the kernel starts
up programs. They hacked the kernel to recognize the magic number
`#!'. (Magic numbers are 16 bits, and two 8-bit characters make
16 bits, right?) When the `#!' magic number was recognized, the
kernel would read in the rest of the line and treat it as a
command to run upon the contents of the file. With this hack you
could now do things like:

    #! /bin/sh
    #! /bin/csh
    #! /bin/awk -F:

For a long time this hack existed solely in the Berkeley world; it
has since migrated to USG kernels as part of System V Release 4.
Prior to V.4, unless the vendor did some special value-added work,
the kernel could not do anything other than load and start a
binary executable image.
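To make the `#!' mechanism concrete, here is a minimal sketch of
how the first line of a script might be split into an interpreter
and an optional argument. The function name parse_shebang() is
hypothetical, and treating everything after the interpreter as one
single argument is an assumption; kernels differ on that detail.

```c
#include <string.h>

/*
 * Hypothetical parser for a line such as "#! /bin/awk -F:".
 * The buffer `line' is modified in place. On success, *interp
 * points at the interpreter path and *arg at the rest of the line
 * (or NULL if there is none), and 0 is returned. Returns -1 if the
 * line does not start with "#!" or names no interpreter.
 */
int parse_shebang(char *line, char **interp, char **arg)
{
    char *p;

    *interp = NULL;
    *arg = NULL;

    if (line[0] != '#' || line[1] != '!')
        return -1;                  /* not the `#!' magic number */

    p = line + 2;
    p[strcspn(p, "\n")] = '\0';     /* drop the trailing newline */
    p += strspn(p, " \t");          /* skip blanks after `#!' */
    if (*p == '\0')
        return -1;                  /* nothing to run */

    *interp = p;
    p += strcspn(p, " \t");         /* find end of interpreter path */
    if (*p != '\0') {
        *p++ = '\0';
        p += strspn(p, " \t");
        if (*p != '\0')
            *arg = p;               /* assumed: one single argument */
    }
    return 0;
}
```

Given `#! /bin/awk -F:' this yields the interpreter /bin/awk and
the argument -F:, so the kernel would end up running something
like `/bin/awk -F: my.script'.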

Now, let's rewind a few years, to the time when more and more folks
running USG based unices were saying `/bin/sh sucks as an
interactive user interface! I want csh!'. Several vendors did
some value added magic and put csh in their distribution, even
though csh was not a part of the USG UNIX distribution.

This, however, presented a problem. Let's say you switch your
login shell to /bin/csh. Let's further suppose that you are a
cretin and insist upon programming csh scripts. You'd certainly
want to be able to type `my.script' and get it run, even though it
is a csh script. Instead of pumping it through /bin/sh, you want
the script to be started by running:

    execl("/bin/csh", "csh", "-c", "my.script", (char *)0);

But what about all those existing scripts -- some of which are
part of the system distribution? If they started getting run by
csh then things would break. So you needed a way to run some
scripts through csh, and others through sh.

The solution introduced was to hack csh to take a look at the
first character of the script you are trying to run. If it was a
`#' then csh would try to run the script through /bin/csh,
otherwise it would run the script through /bin/sh. The example
code from above might now look something like:

    /* try to run the program */
    execl(program, basename(program), (char *)0);

    /* the exec failed -- maybe it is a shell script? */
    if (errno == ENOEXEC && (fp = fopen(program, "r")) != NULL) {
        i = getc(fp);
        (void) fclose(fp);
        if (i == '#')
            execl("/bin/csh", "csh", "-c", program, (char *)0);
        else
            execl("/bin/sh", "sh", "-c", program, (char *)0);
    }

    /* oh no mr bill!! */
    perror(program);
    return -1;

Two important points. First, this is a `csh' hack. Nothing has
been changed in the kernel and nothing has been changed in the
other shells. If you try to execl() a script, whether or not it
begins with `#', you will still get an ENOEXEC failure. If you
try to run a script beginning with `#' from something other than
csh (e.g. /bin/sh), then it will be run by sh and not csh.

Second, the magic is that either the script begins with `#' or it
doesn't begin with `#'. What makes stuff like `:' and `: /bin/sh'
at the front of a script magic is the simple fact that they are
not `#'. Therefore, all of the following are identical at the
start of a script:

    :

    : /bin/sh
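
The csh test can be reduced to a one-character check. The sketch
below uses a hypothetical helper, shell_for(), that takes the
first line of a script and names the shell csh would hand it to.
It mirrors the example code above rather than any real csh source.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the first line of a script, return
 * the shell that the csh hack described above would choose.
 * Only the first character matters.
 */
const char *shell_for(const char *first_line)
{
    if (first_line != NULL && first_line[0] == '#')
        return "/bin/csh";
    return "/bin/sh";       /* `:', `: /bin/sh', or anything else */
}
```

Both shell_for(":") and shell_for(": /bin/sh") return "/bin/sh",
since all that matters is that the first character is not `#'.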
