July, 1999

Serving 3000 Web Pages with Apache/iX

Get yourself configured with the new Web server —
and let your 3000 do as much as other systems on intranets

By Andreas Schmidt

The intent of this article is to share some experiences and ideas from my intranet projects, which are served from an HP 3000 using Apache/iX. HP will support this Web server starting with MPE/iX 6.0 Express 2 this fall, but it is available from HP’s Jazz Web server today.

On these pages I will:

Explain a little bit about the three main configuration files of Apache/iX,

Introduce Server Side Includes (SSI),

Show some special effects you can accomplish using SSI directives and variables,

Get Apache/iX server statistics and configuration,

Show how to secure areas within your Web projects,

Explain how to access files outside of the APACHE account, and

Give some hints for Web publishing to an Apache/iX server.

In my examples, Apache/iX is installed as delivered, which basically means Server Side Includes are not enabled, and no document directory is secured.

You may already have heard about SSI in Netscape FastTrack Server, which is bundled in HP-UX nowadays. There it is called “server-parsed HTML.” It took me a while before I found this in the Web-based configuration interface there.

The configuration files

The following three files in the /APACHE/PUB/conf/ directory are important to configure the Apache/iX server: httpd.conf, srm.conf and access.conf. For each, I will explain a little bit and show some special options and effects for your Web project, especially using SSI.

For further information about these files, refer to www.apache.org or literature about the NCSA-based server. I used a German translation of the book “Managing Internet Information Service,” written by Russ Jones.

httpd.conf is the main configuration file of the server. It controls the service but not the details of the single files and areas of your Web projects. It is mainly used to define the user and group of the server, the e-mail ID of the Webmaster, the location of the server binaries, the log files, and more. You can use the default entries — ServerType: standalone; Port: 80; User: MGR.APACHE; Group: APACHE

ServerAdmin is the mail ID where the Webmaster is reachable. This information will be inserted into all server-generated pages in a problem case. (The mail program of your browser pops up. You can send e-mail only if your browser is configured to do so. You may also enable your Apache/iX server for Sendmail/iX, and use the same server as the e-mail server of your browser.)

ServerRoot: /APACHE/PUB: Informs the server where to start to load the program httpd and the configuration in the directory conf/, relative to the ServerRoot directory.

The log files are configured with ErrorLog logs/error_log and TransferLog logs/transfer_log; both paths are relative to the ServerRoot directory. From time to time you should check these files in /APACHE/PUB/logs/ for size and problems.
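
The periodic check can be scripted. The following is only a minimal sketch for the Posix shell: the function name check_logs is made up for illustration, and the log directory is passed as an argument (on an installation as delivered it would be /APACHE/PUB/logs).

```shell
#!/bin/sh
# Hypothetical helper: report the size of each Apache log and the number
# of lines in the error log. The directory argument is an assumption; on a
# default Apache/iX installation it would be /APACHE/PUB/logs.
check_logs() {
    logdir=$1
    for log in "$logdir/error_log" "$logdir/transfer_log"
    do
        if [ -f "$log" ]; then
            # wc -c prints the size in bytes; tr strips padding some wc versions add
            size=$(wc -c < "$log" | tr -d ' ')
            echo "$log: $size bytes"
        else
            echo "$log: not found"
        fi
    done
    if [ -f "$logdir/error_log" ]; then
        lines=$(wc -l < "$logdir/error_log" | tr -d ' ')
        echo "error_log entries: $lines"
    fi
}
```

Calling check_logs /APACHE/PUB/logs prints one line per log file plus the number of entries in the error log.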

ServerName is your_apache_server. In most of the cases this is the same as the CPU name, but you can specify another name.

Note: I didn’t test starting Apache/iX out of inetd. (If you make this choice, the ServerType must be changed to inetd, and Port, User, and Group are ignored. This is configured in the inetd configuration files.)

srm.conf configures the resources of the server. Here you have to define where the server will find the documents and scripts. The main keywords are:

DocumentRoot /APACHE/PUB/htdocs/: This is the absolute path name of the place where the documents will be stored. Other directories may be referenced via Alias or by links in this directory.

DirectoryIndex index.html: Default name of the document, which is shown if only the name of a directory is browsed to.

Alias /icons/ /APACHE/PUB/icons/: Place to look if only /icons/ is used in a link.

ScriptAlias /cgi-bin/ /APACHE/PUB/cgi-bin/: This is where all files with a prefix /cgi-bin/ have been stored.

AddHandler server-parsed .shtml: This is important to activate SSI. With this default entry, SSI will only work for pages having the suffix .shtml. I changed this to AddHandler server-parsed .html to enable the Server Side Includes for all pages. We’ll see in the Server Side Includes section that follows what implications this may have.

AccessFileName .htaccess: This is the default name of the file in which the access information of a document directory is stored.

/APACHE/PUB/conf/access.conf is the global access control file (ACF). It defines how browser clients may access the whole Web server or dedicated directories. The default entry is
<Directory name_of_DocumentRoot> (so in my example, it would be:
<Directory /APACHE/PUB/htdocs>).

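I recommend removing Indexes for script directories. AllowOverride can be changed to None wherever a per-directory file should not be able to override the settings in access.conf. Be aware that None makes the server ignore .htaccess files in that directory altogether, so a directory that is secured via .htaccess (as described in the security section) needs at least AllowOverride AuthConfig.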

The Server Side Includes (SSIs)

To enable the Server Side Includes, the <Directory> entry for documents in access.conf must be changed to Options Includes ExecCGI Indexes FollowSymLinks.

Together with the entry in srm.conf, AddHandler server-parsed suffix, these documents are now parsed for one of the following SSI directives:

config: modifies various aspects of SSI
echo: inserts value of CGI or SSI environment variables
exec: executes external programs and inserts output in current document
include: inserts text of document into current file
fsize: inserts the size of a specified file
flastmod: inserts last modification date and time for a specified file.

You can enable the execution of scripts while loading a document using
<!--#exec cmd="script_name"-->, or include another file’s content using
<!--#include file="file_name"-->.

This is a great feature, but it has its disadvantages: It can be quite costly for a server to continually parse documents before sending them to the client. It may create a security risk. But if used cautiously, it can be a very powerful tool.

Besides the CGI environment variables like SERVER_NAME, QUERY_STRING, REMOTE_HOST and some others which can be used from every CGI script, there are additional SSI environment variables:

DOCUMENT_NAME
DOCUMENT_URI
QUERY_STRING_UNESCAPED
DATE_LOCAL
DATE_GMT
LAST_MODIFIED

They can easily be inserted in a document using
<!--#echo var="SSI_variable"-->.

To get an overview of all CGI variables, you may write a little script to execute the env command in the Posix shell, as shown below:

#!/bin/sh
# Send the CGI header and the empty line that ends it, then dump the environment
echo "Content-type: text/plain"
echo ""
env


The output will look like this:

REMOTE_PORT=2969
GATEWAY_INTERFACE=CGI/1.1
DOCUMENT_ROOT=/APACHE/PUB/htdocs
HTTP_ACCEPT_LANGUAGE=en-us
SCRIPT_NAME=/cgi-bin/xenv
SCRIPT_FILENAME=/APACHE/PUB/cgi-bin/xenv
HTTP_ACCEPT_ENCODING=gzip, deflate
REMOTE_ADDR=www.xxx.yyy.zzz.
SERVER_PROTOCOL=HTTP/1.0
REQUEST_METHOD=GET
REMOTE_HOST=alpha.beta.gamma.delta
SERVER_PORT=80
QUERY_STRING=
HTTP_USER_AGENT=Mozilla/4.0 (compatible; MSIE 4.01; Win 95)
HTTP_HOST=xebhh2.bhg.dupont.com
PATH=/bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin
TZ=MEZ-1
HTTP_ACCEPT=application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, image/gif, image/x-xbitmap, image/jpeg,image/pjpeg, */*
SERVER_SOFTWARE=Apache/1.2.5
SERVER_NAME=xebhh2.bhg.dupont.com
REQUEST_URI=/cgi-bin/xenv
HTTP_FORWARDED=by http://alpha.beta.gamma.delta:80 (Netscape Proxy/2.53)
SERVER_ADMIN=mailto:aschmid4@csc.email.dupont.com
_=/bin/env


To get an overview of all SSI variables, you must know their names and use the directive
<!--#echo var="variable_name"-->
once per variable. This method also works for displaying the CGI variables!
Here’s a very simple example which combines both:

* HTML code:
Test for SSI and CGI variables using SSI directive ‘echo’:<BR><BR>
File <!--#echo var="DOCUMENT_NAME"--> on path
<!--#echo var="DOCUMENT_URI"--> relative to
<!--#echo var="DOCUMENT_ROOT"--><BR>
on server <!--#echo var="SERVER_NAME"-->

* results in:

Test for SSI and CGI variables using SSI directive ‘echo’:

File testtest.html on path /testtest.html relative to /APACHE/PUB/htdocs
on server xebhh2.bhg.dupont.com


Quite simple, isn’t it? But to do more, read the following section as well.

How to use SSI

What can be done now, having enabled the SSI? If you have this enabled for all documents (.html and not only for .shtml), you may use it to show the same header and footer, current time, file name, file update time, and more on all pages. You can use SSI pre-defined words or you can execute CGI scripts for this. Here are some examples with HTML code and CGI scripts.

To show the file name and file update time your basic HTML code will look like:

FileName: <!--#echo var="DOCUMENT_NAME"--><BR>
<!--#config timefmt="%A, %d-%B-%Y"-->
Revision Date: <!--#echo var="LAST_MODIFIED"-->

DOCUMENT_NAME is an SSI variable holding the name of the current Web source file. The directive
<!--#config timefmt="..."--> declares the format in which dates and times are displayed, here in a European style. You can use the following format masks:

Mask      Meaning                        Example
%a        Day of week, short             Sun, Mon, ...
%A        Day of week, long              Sunday, Monday, ...
%b        Month, short                   Jan, Feb, ...
%B        Month, long                    January, February, ...
%d        Day of month, two digits       01, 02, ... (!)
%D        Date as "%m/%d/%y"             03/30/99
%e        Day of month, no leading zero  1, 2, ... (!)
%H        Hour, 24-hour clock            16
%I        Hour, 12-hour clock            04
%j        Decimal day of the year        350
%m        Month number                   07, 08, 09, ...
%M        Minutes                        07, 08, 09, ...
%p        AM or PM                       PM
%r        Time as "%I:%M:%S %p"          05:13:45 PM
%S        Seconds                        01, 02, ...
%T        24-hour time as "%H:%M:%S"     17:13:45
%U or %W  Week of the year               35
%w        Day of the week as a number    5
%y        Year of the century            99
%Y        Year                           1999
%Z        Time zone                      GMT
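
These masks are the conversion codes of the C library function strftime, which the date command of a Posix shell normally accepts as well. Assuming that holds for the shell on your system, a mask can be previewed before it goes into a config directive. The helper name preview_timefmt is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print the current date using a timefmt mask, so a
# mask can be checked before it is used in an SSI config directive.
preview_timefmt() {
    date "+$1"
}

# The European-style mask used in this article
preview_timefmt "%A, %d-%B-%Y"
```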

LAST_MODIFIED is a pre-defined variable holding the file’s last modification date. The same can also be achieved without enabling SSI, using a little JavaScript alert which the reader has to activate with a click:

<A HREF="javascript:alert('File: ' + document.URL + '\nLast updated: ' + document.lastModified)"
   onmouseover="window.status='source info'; return true">
<IMG SRC="ball.red.gif" alt="[ball]" border=0></A>

To display the current date at the top of a page:

HTML code (European format):
<FONT color=BLUE><B>
<!--#config timefmt="%A, %d-%B-%Y"-->
<!--#echo var="DATE_GMT"-->
</B></FONT>

To establish a page counter:

A little script is needed for this, and one file per page which contains the current number of hits. I decided to name the counter file PageName.ct, and the owner of this file is SERVER.APACHE with mode 640.

The script looks like:

#!/bin/sh
# Page counter, called via a Server Side Include as
# <!--#exec cmd="/APACHE/PUB/cgi-bin/counter PageName"-->
PAGE=$1
# Read the current count and print it without a trailing newline
ACCESS=`cat /APACHE/PUB/htdocs/$PAGE.ct`
echo "$ACCESS\c"
# Increment the count and store it for the next visitor
let "ACCESS = $ACCESS + 1"
echo $ACCESS > /APACHE/PUB/htdocs/$PAGE.ct
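
The counter script above expects the counter file to exist and to contain a number. A more defensive variant might look like the following sketch. The function name bump_counter is made up for illustration, it takes the full path of the counter file, and unlike the script above it prints the count after incrementing. Two hits arriving at exactly the same moment can still overwrite each other; the temporary file only keeps a reader from seeing a half-written counter.

```shell
#!/bin/sh
# Sketch of a more defensive page counter. bump_counter is a hypothetical
# name; its argument is the full path of the counter file.
bump_counter() {
    ctfile=$1
    count=0
    # Start from 0 when the counter file is missing or empty
    if [ -s "$ctfile" ]; then
        count=$(cat "$ctfile")
    fi
    # Fall back to 0 if the file holds anything other than digits
    case $count in
        ''|*[!0-9]*) count=0 ;;
    esac
    count=$((count + 1))
    # Write to a temporary file first, then rename it over the old counter
    echo "$count" > "$ctfile.tmp" && mv "$ctfile.tmp" "$ctfile"
    # Print the new count without a trailing newline
    printf '%s' "$count"
}
```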

It is executed via SSI #exec, here for the main page:

<!--#exec cmd="/APACHE/PUB/cgi-bin/counter index"-->

The HTML code to display this on each page looks like:
<HR size=1>
<CENTER>
<FONT SIZE="-1">
<I>This page has been accessed </I>
<B><FONT face="arial" size="+1">
<!--#exec cmd="/APACHE/PUB/cgi-bin/counter index"--> </FONT></B>
<I> times since 07JAN97.</I></FONT>
</CENTER>
<HR SIZE=1>

Here is a small script presenting the page access statistics:

#!/bin/sh
# CGI script: lists every page counter file, sorted by number of hits
echo "Content-type: text/html"
echo ""
CPU=`echo $HTTP_HOST | cut -c 1-6`
echo '<HTML><HEAD><TITLE>Web HP3000: Page Accesses</TITLE></HEAD>'
echo '<BODY background="../matchy.gif" bgcolor="#ffffff">'
echo '<CENTER>'
echo "<H2>CTE: HP3000 - Accesses to Pages (Hit List) <I>$CPU</I></H2>"
echo '<HR SIZE=1>'
echo '<TABLE BORDER>'
echo '<TR><TH># of accesses</TH><TH>page</TH></TR>'
# Build one table row per counter file
rm -f /APACHE/TMP/temp_hitlist
for file in /APACHE/PUB/htdocs/*.ct
do
   echo "<TR><TD ALIGN=CENTER> `cat $file` </TD><TD>$file</TD></TR>" >> /APACHE/TMP/temp_hitlist
done
# Sort on the count (third field) and send the rows straight to the browser;
# redirecting the sort output back into its own input file would empty it
sort -k 3nr /APACHE/TMP/temp_hitlist
echo '</TABLE></CENTER>'
echo '<HR size=1>'
echo '</BODY></HTML>'

You can display the same header and footer for all pages. I combined the three sections of code above to do this. These are nice effects which can enrich your Web pages. For more information about SSI you may refer to special literature. I based my coding on Shishir Gundavaram’s CGI Programming on the World Wide Web, especially pages 87 onward.
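
One way to get the same footer onto every page is a small script called through the exec directive. The following is only a sketch: the function name print_footer and the script location /APACHE/PUB/cgi-bin/footer are made up for illustration, and it assumes that the server hands the SSI variables DOCUMENT_NAME and LAST_MODIFIED to the command in its environment.

```shell
#!/bin/sh
# Hypothetical footer script, meant to be called from each page with an
# SSI exec directive. It assumes DOCUMENT_NAME and LAST_MODIFIED arrive
# in the environment and prints "unknown" where they do not.
print_footer() {
    echo '<HR size=1>'
    echo '<FONT SIZE="-1">'
    echo "FileName: ${DOCUMENT_NAME:-unknown}<BR>"
    echo "Revision Date: ${LAST_MODIFIED:-unknown}"
    echo '</FONT>'
}

print_footer
```

Each page would then end with <!--#exec cmd="/APACHE/PUB/cgi-bin/footer"-->, so that a change to the footer has to be made in one place only.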

How to get server status and server information

This is an easy one, and does not depend on having SSI enabled. Insert the following into access.conf:

<Location /server-status>
SetHandler server-status
order deny,allow
deny from all
allow from .your_domain.com

</Location>
<Location /server-info>
SetHandler server-info
allow from all
</Location>


You may set appropriate security, using order, deny, and allow parameters. In the example given here, the server-status is only allowed for .your_domain.com, but the server-info is open for all. To see the server status of your Apache/iX server, browse to http://name_of_your_apache_server/server-status. You will see the current activities, but also some totals of server utilization since it was started.

The handler server-info will show you the whole configuration “on a click.” It is especially worthwhile for the Webmaster to check the server status information.

Secure a Web document directory

If you have big Web projects, you may need to hide some areas from public viewing. Three components comprise Apache/iX document security:

• The program (or, in Unix terms, the binary) /APACHE/PUB/apache_1.2.5_mpe/support/htpasswd
• The password file, in most cases named /APACHE/PUB/security/.htpasswd
• The access definition file in a document directory named .htaccess.

Here is how to implement the access security for a document directory. It is not possible to secure single pages — if you want to achieve this you must keep such a page in its own directory.

Let’s assume the following: Billing information should be made available for some persons via the intranet. First, you create a separate directory under /APACHE/PUB/htdocs named billing. In this directory you will create a file named .htaccess or another name you defined in srm.conf under “AccessFileName.” This file will look like:

AuthUserFile /APACHE/PUB/security/.htpasswd
AuthGroupFile /dev/null
AuthName Access to HP3000 Billing Data
AuthType Basic
<Limit GET>
require user andreas
require user robert
require user deniro
</Limit>

The first line points to the file containing the passwords the user will be asked for when accessing a document in this directory. On MPE/iX we do not have an authentication group file, so /dev/null is a good alternative. The name of this secured area is given with AuthName and will be displayed in the password box of the browser.

For AuthType, only Basic is currently implemented. The Limit directive may name GET, POST, or both. It describes what is required to access the documents stored in the current directory (in this example, /APACHE/PUB/htdocs/billing). Here, dedicated user names are required.

The passwords of these users are stored in /APACHE/PUB/security/.htpasswd.

Having created the .htaccess file, you must define the users and the passwords using the program htpasswd. Syntax for the program is
htpasswd [-c] passwordfile username.
The -c flag creates a new file.

In our configuration, the passwordfile is /APACHE/PUB/security/.htpasswd, and the usernames are andreas, robert, deniro. For each, you must invoke the program like this:

:/APACHE/PUB/apache_1.2.5_mpe/support/htpasswd &
:"/APACHE/PUB/security/.htpasswd robert"

The dialog is

Adding user robert
New password:donots
Re-type new password:donots

Bingo! The user and password are defined, and .htaccess ensures that only the required users will have access to the documents in billing.
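
With several users to define, the calls can be wrapped in a small loop in the Posix shell. This is a sketch under assumptions: the function name add_users and the HTPASSWD variable are made up for illustration, and it assumes the htpasswd program can be started from the shell by its path. The program still prompts for each password. The -c flag is used only when the password file does not exist yet, because it creates a new file.

```shell
#!/bin/sh
# Sketch: define several users in one pass. HTPASSWD and add_users are
# assumptions for illustration; the default path is the one used in this
# article. htpasswd still asks for each password interactively.
HTPASSWD=${HTPASSWD:-/APACHE/PUB/apache_1.2.5_mpe/support/htpasswd}

add_users() {
    pwfile=$1
    shift
    for user in "$@"
    do
        if [ -f "$pwfile" ]; then
            "$HTPASSWD" "$pwfile" "$user"
        else
            # -c creates the password file, so it is used for the first user only
            "$HTPASSWD" -c "$pwfile" "$user"
        fi
    done
}

# Usage: add_users /APACHE/PUB/security/.htpasswd andreas robert deniro
```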

Typing the URL http://your_apache_web_server/billing will result in the display of the user/password box in the browser. The user robert has to type in “robert” and the password “donots” before he will see something.
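
If a user named in .htaccess has no entry in the password file, that user will never get past the password box. A quick consistency check could look like the following sketch. The function name check_required_users is made up for illustration, and it assumes the password file holds one name:password entry per line.

```shell
#!/bin/sh
# Sketch: report every user named in a "require user" line of an access
# file that has no entry in the password file. check_required_users is a
# hypothetical helper name.
check_required_users() {
    htaccess=$1
    pwfile=$2
    missing=0
    # Fields 3 onward of each "require user" line hold the user names
    for user in $(awk '$1 == "require" && $2 == "user" { for (i = 3; i <= NF; i++) print $i }' "$htaccess")
    do
        # Password file entries have the form name:encrypted_password
        if ! grep -q "^$user:" "$pwfile"; then
            echo "no password defined for $user"
            missing=1
        fi
    done
    return $missing
}

# Usage: check_required_users /APACHE/PUB/htdocs/billing/.htaccess /APACHE/PUB/security/.htpasswd
```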

It’s as easy as it sounds. The only thing you need to plan for is the use of forms and CGI scripts from within a secured document directory. In that case, any other user can still call the CGI directly if they know the parameters or, in CGI Web terms, the QUERY_STRING which is handed over by the client to the server. This is not prevented by this method. But it’s not probable that a user will be able to guess a URL like this:
http://your_apache_web_server/cgi-bin/bill_data.sh?box=alpha&period=9903&detail=yes&type=CPU

Linking to documents and scripts outside the APACHE account

If you want to allow other people and groups you can trust to publish their documents outside of the APACHE account, you must pay attention to security, of course. The HP3000 Web security never bypasses the standard MPE security. So everything outside the APACHE account must be visible and probably executable by ANY ... or is secured via ACDs to allow the access/execution for user SERVER.APACHE explicitly.

This is very important, especially if you allow CGI scripts to be stored outside APACHE, and so outside the control of your Webmaster, who acts as MGR and SERVER.APACHE. But if you have confidence in those people and groups, this approach can protect the APACHE account from unwanted changes, because you no longer have to grant many people direct access to APACHE.

The easy and non-risky way is to allow .html documents to be stored outside of APACHE. This can easily be established using a link to the group these documents are saved in.

For example, your OpenDesk Administrator wants to share some information on the intranet. They create their own group, WEB.HPOFFICE, having at least R:ANY so that the APACHE server can read it (or appropriate ACDs on each file). The Webmaster now only has to create a link in /APACHE/PUB/htdocs/ to link to this group:

:NEWLINK /APACHE/PUB/htdocs/edi-docs;to=/HPOFFICE/WEB

“edi-docs” is only a name — you may prefer edi-html or od or ... it’s just a name! All references pointing to edi-docs, e.g. http://your_apache_server/edi-docs/introduction.html will show the files in the group WEB.HPOFFICE, in this example the file /HPOFFICE/WEB/introduction.html. No change in any config file is needed!
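
From the Posix shell, the same kind of link can presumably be created with ln -s. The following sketch wraps that in a couple of sanity checks. The function name link_docs is made up for illustration, and the equivalence of ln -s and NEWLINK is an assumption. Note that the readability test applies to the user running the script, not to the server user.

```shell
#!/bin/sh
# Sketch: publish a directory under the document root via a symbolic link.
# link_docs is a hypothetical helper; it assumes ln -s in the Posix shell
# creates the same kind of link as the NEWLINK command.
link_docs() {
    target=$1      # for example /HPOFFICE/WEB
    linkname=$2    # for example /APACHE/PUB/htdocs/edi-docs
    # Refuse to link to something that is not a readable directory
    if [ ! -d "$target" ] || [ ! -r "$target" ]; then
        echo "cannot read $target" >&2
        return 1
    fi
    # Never replace an existing file, directory or link
    if [ -e "$linkname" ]; then
        echo "$linkname already exists" >&2
        return 1
    fi
    ln -s "$target" "$linkname"
}

# Usage: link_docs /HPOFFICE/WEB /APACHE/PUB/htdocs/edi-docs
```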

CGI scripts that are not stored in the default directory, /APACHE/PUB/cgi-bin/, can only be executed if their location is explicitly configured and allowed in the file /APACHE/PUB/conf/srm.conf. In our example, if the OpenDesk Administrator wants to have all his Web stuff in the same group, the following entry is needed:

ScriptAlias /edi-bin/ /HPOFFICE/WEB/

All references to scripts with the prefix /edi-bin/ will be looked up in WEB.HPOFFICE. For example, http://your_apache_server/edi-bin/statistics.sh will execute a Posix script named /HPOFFICE/WEB/statistics.sh. But again, you must trust the creator of the scripts stored here, and the scripts must have either R,X:ANY (not recommended) or appropriate ACDs that grant READ and EXECUTE access to SERVER.APACHE. The ACD approach is much better security-wise, but requires special attention when replacing those files after having edited them.
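
To see which scripts the server would be refused, the directory can be scanned while logged on as the server user. This is a sketch only: the function name list_unusable_scripts is made up for illustration, and it assumes that the shell’s file tests reflect the access the logged-on user really has, including access granted through ACDs.

```shell
#!/bin/sh
# Sketch: list the files in a script directory that the current user
# cannot both read and execute. Run it as the server user to see what the
# server itself would be denied. list_unusable_scripts is a hypothetical
# helper name.
list_unusable_scripts() {
    dir=$1
    for f in "$dir"/*
    do
        # Skip subdirectories and anything else that is not a plain file
        [ -f "$f" ] || continue
        if [ ! -r "$f" ] || [ ! -x "$f" ]; then
            echo "$f"
        fi
    done
}

# Usage: list_unusable_scripts /HPOFFICE/WEB
```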

Editing and publishing your Web project

The first thought I had upon hearing about Samba/iX was: Wow, now I can edit my HTML code on the PC using a well-known editor (not MS Word) and directly publish on the server into the right place. And indeed, it is a nice option to connect to the Web server via Samba/iX, at least to edit your files under /APACHE/PUB/htdocs or /APACHE/PUB/cgi-bin. This will enable the PC-based editor of your choice to access, edit, and save your Web files. So you may work with PC Qedit, or MS Word, or NetFusion, or any other specialized Web Page Editor. I prefer the freeware program PFE.

If you do not want to “Samba” your Web project, you may use VI.HPBIN.SYS or Robelle’s Qedit. Both are able to keep files in HFS file format.

TDP.PUB.SYS will NOT work: it cannot preserve HFS file names.

The third option is to work on your PC and to transfer the Web files via FTP to the right place on the HP 3000. But I think that is a really old-fashioned method; there are better options.

Summary

This article described some of my experiences with the Apache/iX server running on MPE/iX 5.5. One warning: There are still some unstable areas in the interface between MPE/iX and Posix called STREAMS. But a lot of patches are available to avoid unwanted effects like System Aborts because of this (we hit one using simple piping of sh commands).

Apache/iX is stable and reliable. I want to encourage all of you to make use of it to provide Internet capability from our beloved HP 3000s, and especially not to hide behind the Unix/Internet gurus in your companies! The HP 3000 can do as much as any platform, and Apache/iX is the right step to keep the HP 3000s in the market, together with all the other ported Unix tools like Java/iX, Perl/iX, Samba/iX, Sendmail/iX, Bind/iX and more. Maybe it will help if Hewlett-Packard educates its salespeople that there is another reality besides NT, Unix and Linux, called MPE/iX!

Andreas Schmidt is a Computer Technology Specialist for Computer Sciences Corp. in Bad Homburg, Germany.


Copyright The 3000 NewsWire. All rights reserved.