htm1-pp: htm1 perl processor

(C) Guido Draheim
guidod@gmx.de

Introduction

The htm1-pp processor expands macros found in .htm1-files and produces a corresponding .html-file (it accepts .HTM1 as well). The possible macros are not restricted to the predefined ones - you can even define your own macros in the actual .htm1-sourcefile.

The syntax is modelled so that plain rfc822 internet-mail messages can be fed directly to htm1-pp. The predefined functionality in htm1-pp tries to produce a nice result without requiring many changes to that e-mail.

The perl-based htm1-pp processor has served me very well over the past months (so I hope to have got rid of the major bugs). At the time of writing, I always use htm1-pp to create html-documents - even this html-document has an accompanying .htm1-file (index.htm1) that it was built from.

Invocation

Use `htm1-pp index.htm1' to produce a corresponding `index.html' containing html-code. Messages to the screen will tell you about the predefined macro-files and perl-files loaded in advance to process your `index.htm1'.

If you often use a Makefile, you may want to include the following:

.SUFFIXES: .html .htm1 .c .h

.htm1.html:
	htm1-pp $<

all: *.html

so you can just call `make' to convert your .htm1-files into their corresponding .html-files.

Syntax

An htm1-tag always ends with a single colon, optionally followed by subtag arguments in round parentheses.
The start of a tag is marked by either

a linebreak. The text to be enclosed with html-tags in your output-file then stretches up to the end of the line (at least in the body of your htm1-text - more on this later).
    source: b: this text is in bold face
            and this is not
    result: "this text is in bold face" set in bold, followed by "and this is not" in normal weight

an opening (curly) brace. The text to be enclosed stretches to the next matching closing brace.
    source: {b: all this text will be set in bold face } but not this
    result: "all this text will be set in bold face" set in bold, followed by "but not this" in normal weight

You are free to embed arbitrary html-tags as long as the first opening angle starts at the beginning of a line. The actual source code for the examples above looks like this:

<table border=1 width=80%>
===:(top)
{td-code: {b\: all this text will be set in bold face } but not this }
---:
{b: all this text will be set in bold face } but not this
</table>

Note how the `b:'-tag in the first column has been masked with a backslash.

Predefined macros

Here should be a list full of pointers to pages telling you about the predefined macros. As you can see, most of these aren't working. Still, you are free to look at the source code that is contained in:

bin/htm1-pp
etc/htm1-pp.ht1

htm1

etc/htm1-pp.pl1

`content-table:'

Anyway, these files above (along with documentation) form the distribution files. There's nothing more to it, just small and powerful.

Check out these documents...

Defining macros

The macro names you can choose must match the regular expressions
'[\w\+\-][\w\+\-\#]+' for about any tag or
'[\w\+\-\=\*][\w\+\-\#\=]+' for line-only tags.

In plain words: you can choose any alphanumeric name plus '+' and '-'. Line-only tags may also contain '=' and may also start with one '*'. Except as the first character, the names may also contain '#'.
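The two patterns can be tried out in isolation. The following is a small Python sketch, not part of htm1-pp (which is written in perl). The names `ANY_TAG`, `LINE_TAG` and `tag_kinds` are made up for illustration, and the patterns are copied as printed above; whether htm1-pp anchors them in exactly this way is an assumption.

```python
import re

# Patterns copied from the text above; the constant names are hypothetical.
ANY_TAG = re.compile(r'[\w\+\-][\w\+\-\#]+')
LINE_TAG = re.compile(r'[\w\+\-\=\*][\w\+\-\#\=]+')


def tag_kinds(name):
    """Return which kinds of tag the name could be, according to the patterns."""
    kinds = []
    if ANY_TAG.fullmatch(name):
        kinds.append("any")
    if LINE_TAG.fullmatch(name):
        kinds.append("line-only")
    return kinds


if __name__ == "__main__":
    for name in ["html-def", "content-table-#", "===", "*item", "#top"]:
        print(name, tag_kinds(name))
```

A name such as `#top' is rejected by both patterns because '#' is not allowed as the first character.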

html-tag1 & html-tag2

Usually html-tags must be written with an opening html-tag, the enclosed text, and a closing html-tag (written like "</end>"). You can declare those with `html-tag2:' which is followed by the tag name and its arguments.

Some html-tags are not used with a closing tag, e.g. the horizontal ruler <hr>, or the <img>-tag. That's what the `html-tag1:' is for.

Most of the simple html-tags should be predefined, but sometimes you get warnings about `unknown tag {blockquote:...}'. Then simply do `html-def2: blockquote "blockquote"' and recompile. If you try to define a tag that has already been declared in a predefinition-file, you will be warned about `redefining blockquote'. You can get rid of this warning in turn by prefixing the defining word with `x-'. That is, `x-html-def2: blockquote "blockquote"' will not warn you about a redefinition, but you can still use the tag.

Note that `html-tag2:' has some intelligence, so that a blue-tag like `html-tag2: blue "font color=blue"' will be expanded to `<font color=blue>...</font>' (which you would expect).
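The "intelligence" described here amounts to using only the first word of the declared text for the closing tag. A minimal Python sketch of that idea follows; the function names `expand_tag1` and `expand_tag2` are invented for this illustration and do not appear in htm1-pp.

```python
def expand_tag2(spec, text):
    """Wrap text in an opening tag built from spec and a matching closing tag.

    Only the first word of spec (the element name) goes into the closing tag,
    so attributes such as 'color=blue' are not repeated there.
    """
    name = spec.split()[0]
    return "<%s>%s</%s>" % (spec, text, name)


def expand_tag1(spec):
    """Emit a tag that has no closing counterpart, e.g. <hr> or <img ...>."""
    return "<%s>" % spec


if __name__ == "__main__":
    print(expand_tag2("font color=blue", "sky"))
    print(expand_tag1("hr"))
```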

html-def

The `html-def' can do a lot more for you. Contrary to `html-tag2:', you must not leave out the angle brackets used for the html-tags. When such an html-def is used, the htm1-tag is replaced directly with the definition text.

Yet, there's some magic in html-def too. When expanding the definition, you can access the values of variables. They should look like `$(name)'. Two special variables are `$(*)' for the enclosed text, and `$(.)' for the current sub-tag arguments on htm1-tag use. (`$()' is a compatibility synonym for `$(*)').
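The substitution rules for `$(name)' can be sketched in a few lines of Python. This is an illustration under assumptions, not the perl code of htm1-pp: the function name `expand_def` is hypothetical, sub-tag arguments are passed as one plain string, and unknown variables expand to an empty string.

```python
import re


def expand_def(definition, text="", args="", variables=None):
    """Replace $(name) placeholders in an html-def style definition.

    $(*) and $() stand for the enclosed text, $(.) for the sub-tag
    arguments; any other name is looked up in the variables mapping.
    """
    variables = variables or {}

    def lookup(match):
        name = match.group(1)
        if name in ("*", ""):
            return text
        if name == ".":
            return args
        # Assumption: an unknown variable silently expands to nothing.
        return variables.get(name, "")

    return re.sub(r"\$\(([^()]*)\)", lookup, definition)


if __name__ == "__main__":
    print(expand_def("<font color=$(fgcolor)>$(*)</font>",
                     text="hello", variables={"fgcolor": "red"}))
```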

variables

What are variables? Well, actually the name is a little misleading. It is simply a store of the values of the last expansions. An explicit variable usage is (currently) called `put:'. Just try `{put:h2}' to get the latest section-header as a string. A real benefit comes from the `x-*' tags - which would normally just suppress the warning about an unknown tag being used. In the header, unknown tags would normally expand to html <meta>-tags. But at the same time, their value is stored for later use.

So in fact, I often use `{put:title}' and `{put:From}' in the bottom line. And I use `{x-bgcolor: #FFFFE0}' in the header to get the specified bgcolor. Inside `html-def:' definitions you can access these variables using a `$(From)' syntax that will be expanded later to the current value of the named variable.
Using `{put:From}' in a `html-def:'-definition would be expanded before the definition is stored, so you would get the same value on each expansion of the `html-def''s right side. Mostly, this is not what you want.
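The difference between the two forms is one of expansion time. The Python sketch below mimics it with a plain dictionary; the names `variables`, `put` and `use`, and the example addresses, are all invented for the illustration.

```python
# Hypothetical variable store holding the value of the last expansion.
variables = {"From": "alice@example.org"}


def put(name):
    """Return the current value, like `{put:name}' expanded right now."""
    return variables.get(name, "")


# Eager: the value is frozen into the definition when it is stored.
eager_def = "written by " + put("From")
# Deferred: the placeholder is stored and resolved on each use.
deferred_def = "written by $(From)"


def use(definition):
    """Expand a stored definition against the current variable values."""
    return definition.replace("$(From)", variables.get("From", ""))


# A later header or message changes the variable ...
variables["From"] = "bob@example.org"

if __name__ == "__main__":
    print(use(eager_def))     # still shows the value seen at definition time
    print(use(deferred_def))  # shows the current value
```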

functions

Functions are not defined in a ht1-file; instead you include them in a perl-inclusion file a.k.a. pl1-file. The perl argument is simply the currently enclosed text (`$(*)'). The current sub-tags are handily available in the %Args hash, and the last expansions of other tags are accessible in %Vars. You can add or delete other sub-tag arguments and get the whole slew of them with `&getArgs()', such as in `return "<table".getArgs().">".$_[0]."</table>";'.

Just look into the predefined functions to learn some of the tricks.

load order

On compiling your .htm1 source files into .html files you will notice a bunch of files being pre-loaded for you. Here's the order:

get `/usr/etc/htm1-pp.ht#' and `/usr/etc/htm1-pp.pl#'
get `/usr/html/etc/htm1-pp.ht#' and `/usr/html/etc/htm1-pp.pl#'
get `~/etc/htm1-pp.ht#' and `~/etc/htm1-pp.pl#'
get `~/.htm1-pp.ht#' and `~/.htm1-pp.pl#'
get `./etc/htm1-pp.ht#' and `./etc/htm1-pp.pl#'
get `./.htm1-pp.ht#' and `./.htm1-pp.pl#'

and replace the '#'-number sign with "",1,2,3,4,5,6,7,8,9 in this order. So you may copy some files to your project directory (or the project's subdirectory `etc') and have them included in order. No need to combine them into one. If you install htm1-pp in your home-`bin' then you may want to put your personal favourites under `~/etc/htm1-pp.pl2' or `~/.htm1-pp.pl2' instead of using the "official" `.pl1'. Good luck.
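To make the resulting sequence concrete, here is a Python sketch that enumerates the candidate file names. It assumes that all ten suffixes are tried for one location before moving on to the next, and that the `.ht' file is read before the `.pl' file; the text above leaves both points open. The names `LOCATIONS`, `SUFFIXES` and `candidate_files` are hypothetical, and `~' is left unexpanded.

```python
# Base names in the documented order.
LOCATIONS = [
    "/usr/etc/htm1-pp",
    "/usr/html/etc/htm1-pp",
    "~/etc/htm1-pp",
    "~/.htm1-pp",
    "./etc/htm1-pp",
    "./.htm1-pp",
]

# '#' is replaced by "", 1, 2, ... 9 in this order.
SUFFIXES = [""] + [str(n) for n in range(1, 10)]


def candidate_files():
    """List every file name that would be looked for, in assumed order."""
    names = []
    for base in LOCATIONS:
        for suffix in SUFFIXES:
            names.append(base + ".ht" + suffix)
            names.append(base + ".pl" + suffix)
    return names


if __name__ == "__main__":
    for name in candidate_files()[:6]:
        print(name)
```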

Special Features

A distinction is made between the header of a file and its body - just like in an e-mail message, the header extends down to the first empty line, after which the body part starts.

In the header, line-tags may actually span multiple lines - as long as the continuation lines do not start in the first column but are indented a little.

Moreover, unknown tags in the header do not really produce a warning - they are turned into meta-tags. So `Organization: Lucky Company' will expand to `<meta name="Organization" value="Lucky Company">'. You should use this especially to set the keywords of the document!

The value of unknown x-tags will simply be stored in the Vars-space, and no meta-tag is generated (fine for emails with X-Status tags...).

After the header-part has been fully compiled, the htm1-pp program turns to the body. When it is time to write out the text, the last Var-values for body-color, link-color, etc. are put into the corresponding sub-tags of the <body>-tag.
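The header rules can be summarised in a short Python sketch. It is an illustration only: the function names `parse_header` and `compile_header` are invented, values are stored under the tag name exactly as written, and the test for "x-tags" is a case-insensitive `x-' prefix. All three are assumptions, not taken from the htm1-pp source.

```python
def parse_header(source):
    """Split source at the first empty line into header fields and body."""
    head, _, body = source.partition("\n\n")
    fields = []
    for line in head.split("\n"):
        if fields and line[:1] in (" ", "\t"):
            # Indented line: continuation of the previous header field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
    return fields, body


def compile_header(fields, known_tags=()):
    """Return (meta_tags, variables) for the unknown header fields."""
    variables, meta = {}, []
    for name, value in fields:
        variables[name] = value  # every value is kept for later use
        if name in known_tags or name.lower().startswith("x-"):
            continue  # x-tags are stored only, no meta-tag is generated
        meta.append('<meta name="%s" value="%s">' % (name, value))
    return meta, variables


if __name__ == "__main__":
    example = ("Organization: Lucky Company\n"
               "Subject: a long\n  subject line\n"
               "X-Status: read\n\nbody text\n")
    fields, body = parse_header(example)
    print(compile_header(fields))
```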

Other specials:

an empty line by itself is treated as a paragraph end.
a line introduced with `> ' goes italic, as it is the normal way of mails to show quotes.
lines starting with '<' are treated as verbatim html-text.
the expansion order is:
1. one-line-tags (thereby skipping those with '#')
2. em'brace'd-tags
3. line-tags whose name contains '#' (they are skipped in the first pass)
and this order is repeated a number of times so that macro-tags that were introduced by expansion will get expanded too.
`*:'-linetag is nice for `<li>'
you may want to use `content-table-#' that will list all your `{name: named texts}'
quite a few of the predefined macros have a notion of fgcolor2 and bgcolor2 (as opposed to fgcolor and bgcolor values). That is a good way to limit the colors used in your documents and have them always the same in their appropriate positions. (remember, it's an error to have a 'colorful' web-page).
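The first three line-level rules in this list can be expressed as a simple classifier. The Python sketch below only labels a body line; it does not produce html, since the exact output of htm1-pp is not shown here. The function name `classify_line` and the label strings are hypothetical, and only a completely empty line is treated as a paragraph end.

```python
def classify_line(line):
    """Label a body line according to the special rules listed above."""
    if line == "":
        return "paragraph-end"
    if line.startswith("> "):
        return "quote"          # rendered in italics, like a mail quote
    if line.startswith("<"):
        return "verbatim-html"  # passed through as html-text
    return "text"


if __name__ == "__main__":
    for line in ["", "> he said", "<hr>", "plain words"]:
        print(repr(line), classify_line(line))
```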

Have fun!

Changes of 1.2 over 1.0

allow `/' in tag names. If a tag starts with `/', just take it implicitly to be a `html-tag2'. If a tag ends with a `/', just take it to be a `html-tag1'. Don't warn about it. This renders most html-tag declarations superfluous.
there had been a default to warn on unknown tags, but treat them as `html-tag2'. With the above, this is obsolete. And it had been error-prone too, esp. if used as a line tag. So, the default for unknown line tags (in the body) will be to return the tag plus colon plus text. That is, the result is just the source code. (unknown tags in the header will still expand to html-meta-tags)
allow `=' in all tag names, not just line tags. That has to do with the implicit html-tags above: in those implicit html-tags, htm1-pp will replace underscores with blanks. So maybe you want to try `{/font_color=red:some text}' to produce some text in red.
introduce bracket-defs: a line starting with something like `[name:] <text> ' will be treated as a `html-def: name <text>' line. Since the text-body is scanned for those lines first, you have the option to put these defs after their use. I hold this to be an intended feature.
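The implicit html-tags introduced in 1.2 can be sketched as follows. This is a Python illustration with an invented function name, `expand_implicit`; appending the enclosed text after a closing-less tag is an assumption, as the text above does not say what happens to it.

```python
def expand_implicit(tag, text=""):
    """Expand an implicit html-tag, or return None if the tag is not one.

    Underscores in the tag are replaced with blanks. A leading '/' means a
    tag with a closing counterpart, a trailing '/' a tag without one.
    """
    spec = tag.replace("_", " ")
    if spec.startswith("/") and spec[1:].strip():
        spec = spec[1:]
        name = spec.split()[0]
        return "<%s>%s</%s>" % (spec, text, name)
    if spec.endswith("/") and spec[:-1].strip():
        # Assumption: any enclosed text simply follows the tag.
        return "<%s>%s" % (spec[:-1], text)
    return None


if __name__ == "__main__":
    print(expand_implicit("/font_color=red", "some text"))
    print(expand_implicit("hr/"))
```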

and new in 1.2

the syntax of {{text}[tag:args]} and {{text}[tag]} is now implemented. Esp. the second form limits the available tag-names to those that do not look like C-expressions. In this version (1.2) these are the names starting/ending with `/' or `%' or `=', or being just a single non-alnum character - so in fact `{www.fsf.org}[@]' is allowed!
(note that ` {www.fsf.org}[i] ' would not be accepted and is hence printed as it was in the source, yet ` {www.fsf.org}[/i] ' works nicely)
the next step is attempted - the one-word syntax for the bracket-tag is now also accepted, so you can simply write `text[tag:arg]' or `text[tag]'. But be aware that those don't usually nest, so you should not write `text[/b][/i]', which would not expand the second `[/i]' tag. Moreover, the text must be preceded by whitespace or a newline at the moment.
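The restriction on tag names in the argument-less bracket form can be written down as a small predicate. The Python sketch below follows the wording above; the function name `is_bare_bracket_tag` is hypothetical, and what counts as "alnum" in htm1-pp (for instance the underscore) may differ from Python's `str.isalnum`.

```python
def is_bare_bracket_tag(tag):
    """Decide whether tag may be used as `{text}[tag]' without arguments.

    Accepted are names that start or end with '/', '%' or '=', and names
    consisting of one single non-alphanumeric character.
    """
    if not tag:
        return False
    if len(tag) == 1:
        return not tag.isalnum()
    return tag[0] in "/%=" or tag[-1] in "/%="


if __name__ == "__main__":
    for tag in ["@", "i", "/i", "hr/", "bold"]:
        print(tag, is_bare_bracket_tag(tag))
```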

Installation

The distribution file contains the files in their usual subdirectories `bin', `etc' and `doc'. So installation is safely done by going to the desired prefix-directory (or just staying in your home-directory) and unpacking the tar-file. In an xterm-shell you would do `tar xzvf htm1-pp.tgz'.

You will very probably have to edit the first line of `bin/htm1-pp' to point to the `perl'-interpreter. Again, in a shell you can usually type `type -p perl' or `whereis perl' to get the path.

That should be it. It may get a lot nastier on non-unix systems. I am currently running it under linux, hpux and solaris, installed under my home directory.

Bugs & Caveats & Things to watch out for

bugs & caveats do most often have to do with functions, since those are quite special. I recommend to name all functions with an additional non-alpha-numerical character in the name, so that you'll be warned. (well, quite a few function-tags don't follow this rule yet, I'll try hard to rename them... so expect changes)

if you use `html-def:' to create a new tag in your header, you can embed htm1-macro-tags in there, so they will expand to their definitions. This is quite handy, but it may not do what you expect, esp. if some of the htm1-tags are functions. The reason is that the embedded tags are expanded before the definition is stored, i.e. the functions are called, may return something (or just fail), and the later usage of your new tag will not call the function again. [fixed]
all (curly) braces must be symmetrical. If htm1-pp encounters a brace-pair that is not a tag-call, it will mask this pair, assuming it came from some embedded notation, especially C-code, C++-code, Java-code, etc. Sometimes a brace is not part of a pair, so be sure to mask it explicitly by using `\{' instead of just `{'. (look into the .htm1-source of this file to see how I have written this paragraph...)
written text sometimes uses a colon after a word to introduce a second part of the sentence. This may be seen as a line-tag if it happens to be at the left column in the text. Watch for messages saying "undefined tag ^...^" - they are often caused by this.
(you could start each simple text line with a space, but actually this is a thing you can recognize easily in your source)
[fixed] see the Changes section on the new interpretation of such line-tags
Sometimes you get strange error messages, or just something in (round) parentheses has gotten lost in the resulting text. This is often the case if you wrote something like `{small:(oooohhh)}', where the word in parentheses is held to be a subtag-argument to `small:'. The solution is to put an additional space between the introducing htm1-tag and the first one of the parentheses, ie. `small: (oooh)'. Introducing spaces in the vicinity of everything looking quite like a htm1-tag is always recommended.

License

The package is Free Software in the notion of Lesser GPL, (the `GNU Library General Public License') - you can inform yourself about the meaning of free software, its benefits/drawbacks/liability and other possible licenses at www.OpenSource.org, or the Original - The Free Software Foundation.

- and remember: faithful in the matter, but use at your own risk - the sources tell you all (they are your safety, warranty, compatibility, portability, and much more)

References

SmartHTML html-processor

(see appindex at freshmeat/web )

Perl Scripting Language

htm1-pp

htm1-pp 1.0 and htm1-pp 1.1 are the predecessors of this package.

Comments

comments, bugfixes, patches or extensions - everything's welcome.

Erwin S. Andreas (author of SmartHTML) wrote:
Cool, I'm glad SmartHTML proved useful beyond its original intent :)
I like some of the ideas, e.g. using the META GENERATOR tag, the argument abbreviation (i.e. change an argument of "100%" automatically into width=100% since that's where it's used most), definition of global "variables" (e.g. $(fgcolor)) so you can more easily change your color scheme.

htm1-pp 1.2