A Guide to zongList

Introduction

zongList is a web-based dictionary framework, designed to be used through standard browsers and optimised for mobile use. It works both offline and online. It lets you navigate through a dictionary using themed, searchable lists. Access to those lists is by dedicated keypads, customised to suit the language of the list in question.

Purpose

The purpose of zongList is to help deliver the results of lexicon documentation back to speech communities, particularly to those in remote locations and developing countries. This is the reason for the focus on mobile devices, since phones are typically the first computers to find their way into common ownership in many such communities. It is also the reason for making it function to a useful degree offline since the availability or affordability of phone 'data-plans' cannot be expected under such conditions.

I hope the framework will be useful to those wanting to publish lexicon data online generally. This guide provides information for evaluating and using zongList. Each section starts with basic information for those new to the concepts followed by gorier details for those wishing to try it out.

Contact and Example

Being a framework, rather than a software package, zongList does not offer a seamless solution to putting a dictionary online, and I cannot pretend that adapting it will be without challenges. For help with those challenges you can contact me on margetts{dot}andrew{at}gmail{dot}com.

To see an example, visit the site of the Tima dictionary. This project sponsored and funded the initial development of zongList - please see the acknowledgements below.

The Tima dictionary started life as a Toolbox lexicon using the Multi-Dictionary Formatter (MDF) marker set. This kind of pedigree is not required however - almost any structured dataset could be adapted for use with zongList.


The Idea

zongList is by no means the first attempt to put custom dictionaries on the web or on mobile phones. Many linguists will be familiar with Lexique Pro, which, like Toolbox and the MDF, is by SIL International. That program can, among other things, create web versions of lexicons developed in Toolbox. Dictionaries have also been put onto phones before, using e.g. installable Java programs like Wunderkammer, part of the Project for Free Electronic Dictionaries.

What makes zongList a little different is the focus on mobile browsers and that it is an attempt to build such a framework using only common web technologies and idioms. The idea is to lower the entry barrier for those wishing to create an electronic dictionary, and to make it appropriate for contemporary patterns of use.

Basics

The framework uses only standard, but modern, web components: HTML5, CSS3 and JavaScript for presenting the dictionary, and JSON format for storing the underlying data. These are all open, plain text systems, and although powerful are rather easier to get started in than something like Java. And (like Java) they are also platform independent which simplifies development enormously.

Although designed to be delivered over the web, there are no tasks for the server beyond providing static files, i.e. no need for 'backend' logic or databases. Therefore zongList is easy to deploy and maintain.

Note that there is by design a clear separation of presentation and data. Because of this, aspects of the framework might be of value in two distinct ways: as a presentation shell, and/or as a means to prepare data for external consumption. These two aspects are described next.

Presentation

The framework itself can be used as a starting point for making any digital lexical dataset available online.

Because it is open and text based, zongList can simply be used for making new list-based dictionaries, adapting where necessary to suit particular languages and data hierarchies.

It could also be easily modified to make something quite different (perhaps providing alternative functionality like more advanced searches, or a completely different look).

Using standard items, it supports special fonts and keypads, and embedded audio and video without plugins.

Data

The lexical data can be reused in other ways by third-party programs or services.

Because the information is stored in JSON, which is probably the most popular format for web-based data at the moment, it is straightforward to expose it for other purposes, such as being incorporated in a larger dataset or allowing remote searching from other applications.

So the guidelines below covering conversion considerations and techniques could be of help in converting an existing dataset to a viable JSON version, even if the rest of the framework is not of interest.

Use

Using the framework involves working with these two different aspects: adapting the structure, look and functionality (i.e. HTML, CSS and JavaScript respectively) and preparing the data (i.e. converting it to JSON, since it is fairly unlikely it will be in this format already).

Both topics will be covered separately below. Perhaps logically the preparation of the data should come first, but to give an impression of what the framework actually offers we'll start with that.


The Presentation Framework

Overview

As mentioned, the framework uses only the standard, client-side set of web technologies: HTML, CSS and JavaScript. This means it can work on virtually any modern device and platform that has a web browser. Although it is aimed particularly at mobile (especially phone) use, it is not an 'app' in the sense of something that can be installed on an iPhone, Android or similar machine and which uses the underlying operating system natively. There are several reasons for this choice:

Native vs. Web App

  • 'Native' apps might be somewhat faster and have extra functionality, but in the case of a dictionary not to the extent that an HTML version is noticeably deficient.
  • Native programs are more complex to develop and deploy (requiring compilation to machine code etc).
  • They require a multiplicity of versions to cover different platforms.

By contrast a 'web app' means developing only one, relatively simple program that can at any stage be read and edited by a human with a text-editor.

And because modern HTML documents can be configured to be kept indefinitely on a host (subject to the user granting permission for this), the experience of using a web-app offline can be quite similar to that of an installed native app.

Since the idea of this framework is that it can be both edited easily and deployed widely, the web app solution seemed most appropriate.

It should be pointed out that should a native app version be required there are systems, such as PhoneGap, that can in theory turn a web app into a native one without the need for custom coding.

Modern Standards

In order to function as intended the framework relies on the latest iterations of web technologies, i.e. HTML5, CSS3 and modern JavaScript libraries like jQuery, and up to date versions of browsers. The main drawback of so using these technologies is that despite rapid development and a push towards standards-compliance, there remain gaps and inconsistencies in the capabilities and behaviours of different browsers, particularly in the mobile sector.

At present (2013) the emphasis is on ensuring zongList is fully functional in selected browser and platform combinations. These are Firefox on the desktop generally, and Firefox for Android on mobile.

This is due to the current predominance of Android in low to mid range devices (i.e. those likely to be available in developing countries), and the fact that Firefox is both popular and, as far as possible, platform agnostic. This does not mean that zongList does not work properly with other combinations of browser and platform, only that it is not guaranteed.

An inescapable fact is that, however closely browsers converge with regards to standards compliance, websites will always look and behave slightly differently when viewed on different browsers. It should not be expected that the end result can be nailed down as rigidly as in a printed document, or for that matter a native app.

What zongList Offers

Before examining how zongList works, we should look at what it actually does: how the lexicon is treated and served up. What follows is a description of zongList's core functionality, but note that it is not the idea that a user need read or understand all this.

The intention is that processes are logical and layout predictable to the degree that using the dictionary feels intuitive and that very little recourse to 'help' files is required. Feedback on the success of this aspect is especially appreciated.

Multiple Language Support

Any dictionary of this sort needs to provide for multiple character sets and writing systems to properly render all the data to screen. These concerns can be accommodated in stock HTML and CSS using inbuilt support for variable writing direction and for embedding special web-fonts. zongList uses SIL Charis by default because it looks good, has good coverage of IPA and other 'special' character sets and is free. This guide uses Charis too. It is simple to change, or add to, this base specification.

Lists Everywhere

As implied by the name, this framework treats its underlying dataset as being defined by a series of lists. These lists become the organising principle and entry point for examining the data - for reading the dictionary.

The lists themselves are derived from the various fields of information found within the lexicon. So for example one list could come from a 'semantic domain' field to produce a kind of 'categories' list where one could search for all words marked as being to do with say 'birds'. Each list is sorted according to the normal order for the language it uses.
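As a rough illustration of deriving a list from a field, the sketch below collects the distinct values of one field and sorts them by an alphabet supplied for the language. The entry shape, the field names (`lx`, `sd`), the sample words and the alphabets are all assumptions made for the example; they are not zongList's actual data layout.

```javascript
// Hypothetical entries; field names and words are illustrative only.
const entries = [
  { lx: "kadi", sd: ["birds"] },
  { lx: "oba", sd: ["fish", "birds"] },
  { lx: "ŋaro", sd: ["plants"] },
];

// Build a comparator from an alphabet string: earlier letters sort first,
// and characters missing from the alphabet sort last.
function makeComparator(alphabet) {
  const rank = new Map([...alphabet].map((ch, i) => [ch, i]));
  const rankOf = (ch) => (rank.has(ch) ? rank.get(ch) : alphabet.length);
  return (a, b) => {
    const as = [...a];
    const bs = [...b];
    for (let i = 0; i < Math.min(as.length, bs.length); i++) {
      const diff = rankOf(as[i]) - rankOf(bs[i]);
      if (diff !== 0) return diff;
    }
    return as.length - bs.length;
  };
}

// Collect the distinct values of one field (single value or array) and sort them.
function deriveList(data, field, compare) {
  const values = new Set();
  for (const entry of data) {
    for (const value of [].concat(entry[field] || [])) values.add(value);
  }
  return [...values].sort(compare);
}

const categories = deriveList(entries, "sd", makeComparator("abcdefghijklmnopqrstuvwxyz"));
// categories is ["birds", "fish", "plants"]

// An assumed alphabet in which ŋ comes before o.
const words = deriveList(entries, "lx", makeComparator("abdikŋoru"));
// words is ["kadi", "ŋaro", "oba"]
```

The point of the explicit alphabet is that a plain code-point sort would place "oba" before "ŋaro", which may not be the order a speaker of the language expects.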

The first action that a user is invited to take then is to 'Choose a list...' from the list of lists. Lists everywhere.

Combined Lists

Sometimes the mapping of fields to lists will not be so obvious and we will want to combine fields in lists.

For example in the Tima dictionary (all zongList examples will come from this, the prototype, dictionary), the lexicon is organised at root by head-words, but within each head-word there may exist many sub-entries.

Searching for sub-entries would actually be a more likely activity than looking for head-words. In a printed dictionary it would be quite hard, without a special index, to find particular sub-entries as the dictionary is necessarily sorted by the head-words.

In an electronic version of course we have the option to make a dedicated list of the sub-entries. However bearing in mind that the intended audience comprises lay-people rather than linguists, it seems more appropriate to make a list that combines head-words and sub-entries, all sorted by the normal order and just called 'Words'.

Another example is the combination of the general field for English definitions together with the (occasionally used) one for literal translations. Again, in this case it seems natural to group together fields that would likely be used for searching for the same kind of information, even though they serve different purposes in the dictionary.
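A combined list of this kind could be built along the following lines. This is only a sketch: the record shape (`headword`, `subentries`) and the sample words are invented, and `localeCompare` merely stands in for whatever sort order the language really needs.

```javascript
// Hypothetical record shape: a head-word with optional sub-entries.
const lexicon = [
  { headword: "kala", subentries: ["kalana", "akala"] },
  { headword: "bito", subentries: [] },
];

// Flatten head-words and sub-entries into one 'Words' list. Each item keeps
// the index of its parent record so that a click can lead back to the entry.
function buildWordsList(records) {
  const words = [];
  records.forEach((record, index) => {
    words.push({ text: record.headword, record: index });
    for (const sub of record.subentries) {
      words.push({ text: sub, record: index });
    }
  });
  return words.sort((a, b) => a.text.localeCompare(b.text));
}

const wordsList = buildWordsList(lexicon);
// The texts, in order: "akala", "bito", "kala", "kalana"
```

Because every item remembers its parent record, a sub-entry such as "akala" can be found directly in the list while still opening the entry for "kala".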

Filtering Lists

Simply returning a full list of items is not very useful. The Tima Words list for example has over 7000 items. This is not something one can work with: there needs to be a filtering mechanism. In zongList this is provided by way of pop-up keypads. These are invoked by clicking the 'Filter the list...' box (which is then subsequently used to display the filter).

Each keypad can be designed for the particular list to which it controls access. There are three main options open for configuration: choice of characters/glyphs, choice of writing direction, choice of match position. In addition every keypad offers some control and wild-card keys. I will explain each of these next, but bear in mind that these options are just the default: like everything in zongList, you could change or add to them radically.

Character sets

Each list is encoded in a single language, and each language will require a unique combination of characters (represented by glyphs) and sort-order. Therefore each keypad is assigned a custom layout for the appropriate language. You can change these layouts (i.e. in the underlying code) with simple copy/paste operations.

Writing Direction

Each language can be defined as using either left-to-right or right-to-left writing direction. This characteristic can be easily defined for each keypad, and also for how associated lists are subsequently rendered to the page.

Default Match Position

This option is dependent on the type of list as well as the language it uses. The idea is that a search can be constrained to match either at the left hand end, the right hand end or just anywhere inside the text (including matching at left and right ends).

In Tima, the Words list is set to match from the left end by default because this is how one typically uses the main index of a dictionary, e.g. "I want to search for words beginning with 'yo'". If Tima had right-to-left order like Arabic then this list would be set to right hand end.

Conversely both the English and Arabic Definitions lists (and indeed every other list) are set by default to match anywhere. This is because searching in these fields is typically more free form - one might think "find any items which contain 'fish'".

The thing to note is that these are just the default positions. Every keypad can be reconfigured by the user at any time to match differently by using the arrow-like keys at the bottom of the pop up keypad display. This enables e.g. searching for certain endings using 'right hand end' in a left-to-right language.
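One way to picture the three match positions is as anchors on a regular expression. The sketch below is an assumption about how such a filter might be written, not a copy of zongList's code; the position names and sample words are invented, and "left" here simply means the start of the stored string.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// position is "left", "right" or "anywhere" (illustrative names).
function buildMatcher(term, position) {
  const body = escapeRegExp(term);
  const source =
    position === "left" ? "^" + body :
    position === "right" ? body + "$" :
    body;
  return new RegExp(source);
}

function filterList(list, term, position) {
  const matcher = buildMatcher(term, position);
  return list.filter((item) => matcher.test(item));
}

const sample = ["yoga", "mayo", "coyote"];
const fromLeft = filterList(sample, "yo", "left");      // ["yoga"]
const fromRight = filterList(sample, "yo", "right");    // ["mayo"]
const anywhere = filterList(sample, "yo", "anywhere");  // all three words
```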

Control and Wild-card Keys

Every keypad offers a few self-explanatory control keys for things like adding a space or removing the last character, to assist in editing a search term. In addition there are two wild-card keys: one that matches any single character and one that matches any number of characters. These cover the basic needs for this kind of search mechanism (i.e. one based around whittling down long lists rather than making precise and complex matches). Regardless, both wild-cards use straightforward regular expression syntax (which shows up in the search-term box) and so could be extended or modified easily enough for special needs.
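To make the wild-card idea concrete, here is a minimal sketch in which each key press adds a fragment to the search term. The key names (`anyOne`, `anyMany`, `backspace`, `space`) and the choice of `.` and `.*` as the fragments are assumptions for illustration; the text above only says that the wild-cards use regular expression syntax.

```javascript
// Assumed key definitions: each wild-card key contributes a regex fragment.
const KEYS = {
  anyOne: ".",   // matches any single character
  anyMany: ".*", // matches any number of characters, including none
};

// Apply a sequence of key presses and return the resulting search term.
function pressKeys(presses) {
  const fragments = [];
  for (const key of presses) {
    if (key === "backspace") fragments.pop();
    else if (key === "space") fragments.push(" ");
    else fragments.push(KEYS[key] || key);
  }
  return fragments.join("");
}

const term = pressKeys(["k", "anyOne", "t", "x", "backspace", "anyMany", "a"]);
// term is "k.t.*a"

// Match whole words against the term.
const whole = new RegExp("^" + term + "$");
const matches = ["kita", "katuma", "kta"].filter((w) => whole.test(w));
// matches is ["kita", "katuma"]
```

Holding the term as a list of fragments means that removing the last key press also removes a two-character wild-card in one go.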

Keypad Only

Note that text can only be entered via keypads - pressing physical keys will do nothing. This is for the sake of consistency: many lists can only be conveniently filtered using a custom keypad (multiple special key-press combinations would be very cumbersome), and most mobile devices do not offer physical keys anyway, so in my opinion providing key input is unhelpful. However you should be aware that the underlying keypad mechanism can be configured to allow key entry.

Simplified Searching

A particular challenge is configuring searches for words containing diacritics or other extra marks. One solution could be to add all possible variations of inflected characters to a keypad, but this would not only make the keypad horribly large, but force the user to remember the exact orthography, or else resort continually to using wild-cards. Another strategy might be to 'dumb down' the dictionary itself and remove things like diacritics, but this would not generally be acceptable.

Ideally the search mechanism would not impose any such degradation of data but would still be easy to use - it should 'just work'. However, trying to get uninflected search terms to match inflected items on the fly (with e.g. clever regular expressions) can be fraught with error and performance issues. There are often just too many possibilities to anticipate and accommodate.

The path that zongList takes instead is to offer what one might term mapped lists. These lists are pre-processed to pair the original inflected items with uninflected versions. (Technically the lists become an array of two-element arrays; this occurs as part of the data-preparation stage, described separately below).

Then the user simply ignores diacritics when devising a search term (they have no choice since the inflected characters do not occur in the keypad) but still gets the hits (plus possible variations) that they expect: search terms are matched against the simplified elements and then the paired, original ones are used to link to items in the lexicon.

Note that this mapping is completely customisable, so not all such symbols need be mapped. For instance for Tima it was deemed appropriate to provide a separate key for both t and because of the importance of the distinction. In this case occurrences of in words are not mapped to t.

Other things apart from combining-characters like diacritics could also be mapped. In Tima the 'downstep' arrow marker is a separate character that one would not expect the user to know and so it is simply removed from the simplified versions in the map.

Similarly, in Tima the ð sound is being replaced in modern usage with y though it still occurs in the dictionary. To align with contemporary expectations it was decided that the keypad should only include the y and that this character would also find instances of ð in the dictionary. Again this was achieved by mapping.
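The mapped-list idea can be sketched as follows. Everything specific here is an assumption: the sample words are invented, the downstep marker is taken to be U+A71C, and stripping combining marks after Unicode decomposition is just one plausible way to produce the simplified forms during data preparation.

```javascript
// Assumed custom substitutions, applied before diacritics are stripped:
// ð is searched for as y, and the downstep marker is dropped entirely.
const CUSTOM_MAP = new Map([["ð", "y"], ["\uA71C", ""]]);

function simplify(word) {
  let out = "";
  for (const ch of word) out += CUSTOM_MAP.has(ch) ? CUSTOM_MAP.get(ch) : ch;
  // Decompose, then drop combining marks in the range U+0300 to U+036F.
  return out.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Each item becomes a two-element array: [original, simplified].
function buildMappedList(list) {
  return list.map((word) => [word, simplify(word)]);
}

// Match against the simplified form, return the original form.
function search(mapped, term) {
  return mapped
    .filter(([, simple]) => simple.includes(term))
    .map(([original]) => original);
}

const mapped = buildMappedList(["kíðà", "kiya", "ka\uA71Cla", "bato"]);
const found = search(mapped, "kiy");
// found holds both "kíðà" and "kiya"
```

A character that deserves its own key would simply be left out of the substitutions (and protected from the stripping step), so that it survives in the simplified form.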

The Result

Whether a search term is entered or not, clicking 'Go' (or just clicking outside a keypad) invokes the filter mechanism and returns whatever matches from the list.

Paged Results

The resulting sub-list is divided where necessary into pages of ten entries, both in order to improve usability on small screens (by reducing the need for scrolling) and to improve performance: rendering hundreds or thousands of items (most of which are not even in view) just takes time.

Adopting ten as the denominator makes it easy to get an idea of how many items are matching the query (using the page count given at the top). But this number is not sacred and can be changed if you find it unsuitable.
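The paging arithmetic is simple enough to show in a few lines. The function and constant names below are invented for the example.

```javascript
const PAGE_SIZE = 10; // the default described above; change to taste

function pageCount(items, size = PAGE_SIZE) {
  return Math.ceil(items.length / size);
}

// Pages are numbered from 1.
function getPage(items, page, size = PAGE_SIZE) {
  const start = (page - 1) * size;
  return items.slice(start, start + size);
}

const results = Array.from({ length: 37 }, (_, i) => "item " + (i + 1));
const pages = pageCount(results);    // 4, so somewhere between 31 and 40 matches
const lastPage = getPage(results, 4); // the remaining 7 items
```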

Sticky Search Terms

Since the idea is to enable the user to rapidly reduce the original list to something more manageable, the search term remains in place until it is cleared, or until a different list is chosen. This enables the user to amend the term, without having to start from scratch, until the item or items of interest are visible.

Viewing the Details

Clicking on any item in a list (filtered or not) will take the user to the entry or entries in the Dictionary that correspond. What the user sees depends on the number of entries that match.

Lists of Lists, Sometimes

When using, say, the Words or Definitions lists, clicking on a list item will frequently lead to a single dictionary entry. In this case it will appear already 'expanded'. If the result is instead another list (i.e. more than one entry in the dictionary matched - often the case in something like a search in the Categories list) then it will be presented as such, cut into pages of ten items if necessary, as before. Clicking on any item will expand it; clicking again will close it.

Highlighting and Legend

The search term is highlighted wherever it occurs in the matched entries, in order to make it obvious why each entry was returned. This effect can be turned off (and back on) by the user with the 'Hits' button at the top of the screen.
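Highlighting of this sort might be produced as in the sketch below, which wraps each occurrence of the term in a `<mark>` element and escapes the rest of the text. Whether zongList uses `<mark>` or some other element is not stated here, so treat the markup and the function names as assumptions.

```javascript
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every occurrence of the term in <mark>, escaping everything else.
// Passing enabled = false mimics the highlighting being switched off.
function highlight(text, term, enabled = true) {
  if (!enabled || term === "") return escapeHtml(text);
  const pattern = new RegExp(escapeRegExp(term), "g");
  let out = "";
  let last = 0;
  for (const m of text.matchAll(pattern)) {
    out += escapeHtml(text.slice(last, m.index));
    out += "<mark>" + escapeHtml(m[0]) + "</mark>";
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

const marked = highlight("a fish & fishing", "fish");
// marked is "a <mark>fish</mark> &amp; <mark>fish</mark>ing"
```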

The field legend is also displayed by default, and once again it can be turned off using the 'Legend' button at the top (e.g. in order to improve readability on very small screens).

It is possible to change this arrangement (legend always on; highlighting never on; no change buttons; etc) in the underlying code. And of course the actual legend labels are customisable to suit the nature of the data.

Associated Media Files

In addition to textual information, entries may contain links. More interestingly, when these links refer to still images, and audio and video clips, zongList can display and/or play the associated media without plugins, provided they are in a suitable encoding.

The Tima dictionary makes extravagant use of this capability to show photos and play sound clips. zongList displays the photos in a modal (pop-up) window, and plays the sounds using the HTML <audio> element. Although it has not been done here, it should be possible to combine these techniques, using the <video> element instead, to present film clips as well.

Two caveats apply however: Particularly for playable media, the list of supported file formats/containers is quite small and it varies between browsers. Also, the mobile browsers are generally still behind their desktop counterparts regarding full support for all aspects of <audio> and <video> elements (they all support them to some degree).

What this means in practice is that some care and trouble is needed, and some compromise might be necessary, at least for the time being (browsers are evolving rapidly). For instance, currently Google Chrome desktop browser plays the Tima sound files, but the stock Android 2.3 browser (which being supported by Google, one might expect to work similarly) does not.
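One common way to cope with patchy format support is to offer the same clip in more than one encoding and let the browser pick. The sketch below uses the standard `canPlayType` method of media elements, which returns an empty string, "maybe" or "probably". The file names and the `pickSource` helper are hypothetical, and this is not necessarily how the Tima dictionary handles its sound files.

```javascript
// Pick the first source the browser claims it can play.
// `probe` stands in for a media element; in a browser one could pass
// document.createElement("audio").
function pickSource(probe, sources) {
  for (const source of sources) {
    if (probe.canPlayType(source.type) !== "") return source;
  }
  return null; // nothing playable: fall back to a plain download link, say
}

// Hypothetical alternatives for the same clip.
const sources = [
  { url: "clip.ogg", type: "audio/ogg" },
  { url: "clip.mp3", type: "audio/mpeg" },
];
```

In a browser, `pickSource(document.createElement("audio"), sources)` would return the first playable alternative, or `null` if the browser reports that it can play none of them.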

To reiterate, at the time of writing (2013), Firefox desktop and mobile are the preferred browsers for zongList.

Offline Access

zongList is designed to work offline as far as possible for reasons already given. Two common HTML mechanisms for achieving this are 'Application Cache' and 'Local Storage'. At the moment zongList uses only the former though it may end up using the latter as well. This is an area that still needs more work, so details will be added in due course.
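Since Local Storage is mentioned only as a possibility, the following is a speculative sketch of how list data might be kept there, not a description of what zongList does. The `getCached` helper, the key name and the `load` callback are all invented; `storage` is expected to behave like `window.localStorage`, whose `getItem` returns `null` for a missing key.

```javascript
// Return cached data if present; otherwise obtain it via `load` and cache it.
function getCached(storage, key, load) {
  const cached = storage.getItem(key);
  if (cached !== null) return JSON.parse(cached);
  const fresh = load();
  try {
    storage.setItem(key, JSON.stringify(fresh));
  } catch (err) {
    // Quota exceeded or storage unavailable: carry on without caching.
  }
  return fresh;
}
```

In a browser this might be called as `getCached(window.localStorage, "words", loadWords)`, where `loadWords` is whatever function produces the list when no cached copy exists.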

The main thing to note for now is that regardless of the mechanism there will always be constraints on what information can be delivered when offline. These are mainly a result of the limited memory capacity of mobile devices and the fact that they tend to be more restrictive regarding what can be stored than regular computers.

In practice it is anticipated that zongList dictionaries should always be able to store all textual data, including fonts, locally. This means that the core list operations remain fully functional when offline: i.e. you can always look stuff up.

On the other hand probably only relatively few images and audio files will be able to be stored offline, so the valuable additional context these resources provide may be absent.

This is mainly because of both the space these files require and the workarounds currently necessary to get mobile browsers to store non-text files (such as Base64 encoding which has the unfortunate side effect of further increasing file size).
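The size penalty of Base64 is easy to quantify: every three bytes of input become four characters of output, with the final group padded. A small sketch (the function name is invented):

```javascript
// Length in characters of the padded Base64 encoding of `byteCount` bytes.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

const encoded = base64Length(300000);
// encoded is 400000: a 300 kB sound clip grows by about a third
```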

It is likely that this situation will improve as devices become ever more capable, provided vendors do not endlessly attempt to hamper browser capabilities in order to favour their own native platforms.

In the Pipeline

That covers the principal capabilities of zongList. There are many things one could add or do differently. Here is a short list of enhancements under consideration. These may never see the light of day, but they might in any case serve to stimulate your own ideas.

Note that this sort of development is only feasible because of the separation of data and presentation. Were the two intermingled (as e.g. is the case for the web views generated by Lexique Pro) such arbitrary re-analysis and presentation of the data would be all but impossible.

Advanced Search

The list-based search approach is intended to give most users a comfortable entry point to the dictionary but it does have limitations.

For instance at the moment if you wanted to see and count all Tima dictionary entries (i.e. records) that contain say, the word 'donkey' somewhere, you would have to search the English Definitions and Examples lists separately.

And then you would have to click on each list item separately to see what it refers to. Since there might be some overlap you couldn't easily tell exactly how many records contain at least one mention of donkey.

So one enhancement would be to offer a way to choose one or more fields to search directly (rather than via our lists), and then apply a search term to that group of fields, all at once. The result would be the list of dictionary entries that match, i.e. there would be no intermediary 'filtered list' step as with the usual mechanism.
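Such a direct search could look something like the sketch below. As this enhancement is only under consideration, the code is purely illustrative: the record shape, field names and sample sentences are invented.

```javascript
// Hypothetical records with two searchable fields.
const records = [
  { id: 1, definition: "donkey", example: "The donkey is tired." },
  { id: 2, definition: "to carry", example: "The donkey carries water." },
  { id: 3, definition: "water", example: "The water is cold." },
];

// Return each record at most once if any of the chosen fields contains the term.
function searchFields(data, fields, term) {
  const needle = term.toLowerCase();
  return data.filter((record) =>
    fields.some((field) =>
      String(record[field] || "").toLowerCase().includes(needle)
    )
  );
}

const hits = searchFields(records, ["definition", "example"], "donkey");
// hits.length is 2: record 1 matches in both fields but is counted once
```

Because the filter runs over records rather than over list items, the overlap problem described above disappears: the length of the result is the number of entries that mention the term.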

Sub-Entry Overview

Given the nature of the Tima dictionary where many sub-entries exist under relatively few head-words, it might be useful to offer another view of this arrangement. For instance one could have a special list of head-words in which clicking on an item does not take one directly to the entire entry but just displays the sub-entries it contains, as a hierarchical tree. (This head-word list could still be filterable; and clicking on a sub-entry could still take one to the relevant expanded entry).

Cross-Ref Graphs

Most dictionaries contain cross-references. At present the Tima dictionary does not do anything with these except display them. The first step to improving on this is of course to make them function as real links.

Beyond this one could also use the information to generate a graphic representation (e.g. as a network graph) of how items are linked to others, maybe to more than one degree of separation.
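The data needed for such a graph could be gathered with a breadth-first walk over the cross-references, stopping at a chosen degree of separation. The sketch below is hypothetical throughout: the `crossRefs` shape and the entry ids are invented, and drawing the graph is left aside.

```javascript
// Hypothetical cross-reference data: entry id -> ids it refers to.
const crossRefs = { a: ["b", "c"], b: ["d"], c: [], d: ["a"], e: ["a"] };

// Collect the entries reachable from `start` within `maxDegree` links,
// together with their degree of separation.
function neighbourhood(refs, start, maxDegree) {
  const degree = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    const d = degree.get(current);
    if (d === maxDegree) continue; // do not expand beyond the limit
    for (const next of refs[current] || []) {
      if (!degree.has(next)) {
        degree.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return degree;
}

const oneStep = neighbourhood(crossRefs, "a", 1);  // a, b and c
const twoSteps = neighbourhood(crossRefs, "a", 2); // also d, at degree 2
```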

How zongList Works

This section explains the underlying code that makes zongList work. We will start with a brisk tour of the web triad of HTML, CSS and JavaScript for those unsure about what each component contributes. Then I will go into detail about how each of these fits into the framework. Along the way we will also briefly consider the JSON format since it is central to the query architecture, though most of that discussion will be deferred to the major section 'Preparing the Data'.

Three Web Languages

Broadly speaking these three technologies, or languages perhaps, are responsible for the structure, look and behaviour of web documents respectively. There is hardly an agreed definition of each but one could say HTML is a markup language, CSS a styling one and JavaScript a programming one.

In theory each has a distinct rôle. In practice there is considerable overlap between them. For instance HTML used to offer a <font> element which could be used to style, well, fonts. These days this job is assigned to CSS. More significantly, there is considerable crossover between CSS and JavaScript, with both technologies used to perform common tasks like hiding and displaying content according to user input.

In very general terms I think it can be said that CSS should be used where it can be used, leaving JavaScript to do what remains to be done. There are at least a couple of good reasons for this.

  • CSS is a kind of 'declarative' language: you say what you want the effect to be (e.g. the colour of the text) and the browser finds a way to implement it; you don't have to explicitly describe how this should be achieved. JavaScript on the other hand is rarely very declarative in nature. More often you have to describe the steps that will make something happen. JavaScript 'libraries' like jQuery (of which more later) ease the burden considerably, but still it is often just plain easier to achieve an effect in CSS than in JavaScript.
  • Increasingly, browsers are being designed to optimise CSS, which means that less load is placed on the system and some effects (like things involving transitions) might render better than with a JavaScript solution.

A Simple Example

The design and execution of this page (i.e. the zongList guide) can serve as an example of the typical interplay between these technologies. I will describe separately how HTML, CSS and JavaScript are used in this case.

HTML

This guide was conceived as a simple hierarchical document, similar to what one might produce in a word processor. Therefore only very basic HTML was required: heading elements ( <h1>, <h2> and so on), lots of paragraphs ( <p>) and the odd 'unordered list' ( <ul>).

These elements alone produce the required structure, resulting in a page that can be read easily, even if CSS and JavaScript are not available. But there are a few things lacking that would improve usability. One is just the ability to determine fonts, line spacing and so on - in other words the 'style' (just as in a word processor). Another is a clickable 'Table of Contents' (TOC) to assist with navigating such a long document.

The first of these is clearly the domain of CSS, and no more need be said. The second is more interesting. As it necessarily involved JavaScript before CSS, we will look at that first.

JavaScript

A TOC could be made by hand, but it would be very difficult to maintain - every change to the document would require a change to the TOC too. It is very easy to add a TOC dynamically using JavaScript - even easier if one uses a jQuery plugin.

jQuery is a hugely popular JavaScript library (a library being simply a parcel of related code), that helps one write succinct and readable JavaScript. It is particularly useful for certain common tasks, and its value for this is further increased because of the enormous number of plugins that have been developed for such things.

In this case, a moment with a search engine revealed a suitable plugin for creating a table of contents and then it was simply a matter of linking to both jQuery and the plugin in the HTML and writing a small amount of code to make it work.

Actually that description is not quite accurate. In order to make the TOC work I had to make sure my HTML and CSS conformed to certain prerequisites spelled out in the information for the plugin. This was not onerous but it was necessary.

And this is the aspect of HTML, CSS and JavaScript that can be most baffling: very often there are interdependencies between them. And the more complex the site, the more need there is for clearly understanding what these are.

To clarify what goes on here, the HTML structure is required to be of a certain type: in this case the use of different headings elements determines the automatic hierarchy of the TOC.

The plugin also produces HTML of course: the TOC is created as a series of HTML hyperlinks to the headings, and extra 'back-links' are added to those original headings to enable one to return to the TOC when necessary. Meanwhile CSS is used primarily in the form of 'styling hooks', if you like, by adding CSS 'class' and 'id' attributes to the HTML.
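To show how heading levels can determine a TOC hierarchy, here is a small sketch that nests a flat sequence of headings into a tree of links. It is not the plugin's code (the plugin is not named here), and the heading data and property names are made up for the example.

```javascript
// Headings in document order. In a browser these could be gathered with
// document.querySelectorAll("h1, h2, h3").
const headings = [
  { level: 1, text: "Guide", id: "guide" },
  { level: 2, text: "Purpose", id: "purpose" },
  { level: 2, text: "The Idea", id: "idea" },
  { level: 3, text: "Basics", id: "basics" },
];

// Nest each heading under the nearest preceding heading of a lower level.
function buildToc(items) {
  const root = { level: 0, children: [] };
  const stack = [root];
  for (const item of items) {
    const node = { ...item, href: "#" + item.id, children: [] };
    // Climb back up until the top of the stack can be this heading's parent.
    while (stack[stack.length - 1].level >= item.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const toc = buildToc(headings);
// toc[0] is "Guide", with "Purpose" and "The Idea" as children;
// "Basics" is nested under "The Idea"
```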

This particular division of responsibilities is not cast in stone. For instance with this plugin I could have used generic <div> elements instead of headings, and then used CSS class attributes attached to these tags to determine the TOC structure. This kind of flexibility is powerful, but at times confusing.

I should mention that JavaScript purists might scoff at loading the entire jQuery library and a plugin just to achieve this dynamic TOC. They would argue that it would be trivial to write something from scratch in JavaScript in far fewer lines.

They might be right, but the point about jQuery is that it enables non-expert programmers to also achieve good quality results, quickly and effectively. jQuery is ubiquitous and there is no shame in using it.

CSS

What, if anything, does CSS contribute to the usability of this site beyond basic styling? Quite a lot as it happens. Our TOC has to live somewhere, and CSS helps it find the right place at all times.

If you view a long document in say a PDF reader, the conventional position of a TOC is in a bar at the left. When you make a selection from it you expect the main text to scroll to the correct point and you also expect the TOC to stay put. Lastly, you expect a TOC itself to be scrollable if necessary.

None of these expectations is met automatically - they must be written into the application. For a web page such behaviour is certainly not the default. Replicating it, as far as is possible, is generally the domain of CSS. In this case a particular combination of properties ('position', 'overflow', 'top', 'bottom') and their values (e.g. 'fixed' vs. 'relative') applied to specific HTML elements does the trick.

Something that all electronic documents are faced with is variable screen size. This is especially challenging today since mobile phone screens are not only so small but also have to function as input devices. Screen 'real estate' on a phone and on a conventional PC are entirely different things.

CSS offers various tools to cope with this problem. One of the most recent, most powerful and easiest to use is the 'media query'. Media queries allow one to change any aspect of the styling in response to the detected screen size. You can see the effect of this by resizing the browser window. As the window gets smaller you should see the TOC and main columns adapting to suit.

While this might be nice on a desktop or laptop screen, it is of most value on phones and tablets which are not only small but which also are supposed to flick usefully between portrait and landscape orientations.

In the case of this site, when a certain screen width threshold is reached the TOC shifts from being at the side to being stacked on top. This is because the sidebar arrangement just becomes too cramped. To compensate for the fact it is no longer always visible, back-links now appear next to each heading in the text to enable one to shoot up to the TOC again. Also, a title is added to the TOC to provide orientation. These effects are all handled by CSS inside media queries.

zongList, being designed principally for mobile use, is set up by default for this kind of view and does not currently offer any alternative for larger screens. This could be changed using CSS media queries.

One last comment on CSS in this site: The effect of a fixed or 'sticky' TOC sidebar could also have been achieved with JavaScript - there are many such solutions out there. I opted for CSS for the reasons given above, but had I needed some extra functionality not available in CSS at present (some really flashy animation perhaps) then I might have used JavaScript instead.

Another reason JavaScript is used in these situations is to cater for older browsers that do not support the full gamut of CSS. E.g. JavaScript could be used to replicate the effect of media queries. This argument does not really concern us because zongList relies in so many ways on up-to-date browsers that support for legacy programs and platforms is out of the question.
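As a rough illustration of that JavaScript route, the sketch below picks a layout name from a viewport width. The 600 pixel threshold and the names 'stacked' and 'sidebar' are assumptions made up for this example; they are not taken from this site or from zongList.

```javascript
// Hypothetical sketch: choose a layout from the viewport width.
var STACK_BELOW = 600; // assumed threshold in pixels

function layoutForWidth(width) {
    return width < STACK_BELOW ? "stacked" : "sidebar";
}

// In a browser this could be wired up roughly as follows:
// window.addEventListener("resize", function () {
//     document.body.className = layoutForWidth(window.innerWidth);
// });

console.log(layoutForWidth(320));  // "stacked"
console.log(layoutForWidth(1024)); // "sidebar"
```

A stylesheet would then style the page according to the class on the body. A CSS media query achieves the same result without any script at all, which is why it is the simpler choice where browsers support it.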

The point to remember is that while JavaScript and CSS are very distinct languages, there is overlap in how they are put to use.

zongList's Innards

Now that the general nature of HTML, CSS and JavaScript - at least as I understand it - has been described, we can look at how these technologies come into play in zongList.

The heart of any web application is, as with a regular web page, an HTML file, since that is what the browser deals in. CSS is only of value in so far as it styles HTML (and XML) markup. And even though a program could be written entirely in JavaScript, it would still have to create some HTML at some point to be rendered by a browser.

A web site can have any number of HTML files associated with it, but one of this kind - sometimes called a 'single-page web application', for obvious reasons - really only needs one page. What appears to be a shift from one page to another to another and back again is achieved through the smoke and mirrors of CSS and JavaScript: elements being hidden and revealed, data being filtered, loaded and unloaded behind the scenes.

Since we only need one main page (we can always add supplementary ones later if we need to), it makes sense to call it 'index.html' and place it at the root folder of our web server. This convention (for defining a 'home page') allows it to be discovered without actually needing to specify the file name in the address, e.g. for this guide page 'http://zonglist-guide.mine.nu/' is the shortcut for 'http://zonglist-guide.mine.nu/index.html'.

Our single page acts as a stage for the show, but what it must contain (as hard coded HTML) is determined largely by the JavaScript and CSS. These are normally supplied as external files, referenced by the home page with <script> and <link> elements. These files are the topic of the remainder of this section.

We will start with a particular aspect of JavaScript, the jQuery Mobile code library, since this to some extent underpins the whole framework. Then we will consider two other external JavaScript libraries used by zongList: jQuery Keypad which generates the keypads, and doT.js, a templating system which controls the translation from JSON to HTML. Lastly we will cover the code that is specific to zongList and how, in general, it can be adapted.

Whenever external code libraries are used, there are certain principles that always apply. These are to do with preserving the integrity of that code:

  • The external JavaScript file should not be edited. When you need to use, change or add to its functionality, you should do so in your own code (which should be loaded into the page after the external library). This principle is well understood by library writers and usually they will build in extension points for their code to make it easy for you to do this.
  • Similarly if a JavaScript library has an accompanying CSS file, this should be used untouched. Your additions and overrides should be made in your subsequent CSS.
  • Assuming you have adhered to the previous two points, you should use 'minified' versions of external JavaScript and CSS code (if available) in your final site. These are files with e.g. all extraneous 'whitespace' removed, and are as a result almost impossible for a human to read. However the machine has no such problem and, being smaller, minified files do lighten the download burden, which is particularly important for mobile sites.

Framework (jQuery Mobile)

As the name suggests, jQuery Mobile is related to jQuery itself and indeed is dependent upon it. It is a major, official jQuery resource. Clearly this library has to do with mobile website development, but the name is a bit misleading because jQuery Mobile has more in common with jQuery UI (another large jQuery initiative) than jQuery core. UI stands for 'user interface'.

In other words this is a layer built on jQuery core that is intended to enhance the way a user interacts with a site rather than focussing on helping the developer to add functionality or control aspects of the site. jQuery Mobile is essentially jQuery UI Mobile.

In reality jQuery Mobile is more like a framework than a simple library. By that I mean it comes with accompanying CSS files and copious guidelines about how to write HTML to take advantage of the various enhancements on offer.

I chose this framework because it addresses many of the recurring themes and problems of designing web sites for mobile devices. And it does so in such a way that the end results resemble typical native apps. This is not a matter of fashion - it is useful to build on prevailing conventions, since it makes it easier for the new user to get started.

jQuery Mobile makes extensive use of HTML5 'data-* attributes' to control and/or build HTML elements dynamically. This is rather like how the TOC is created on the fly for this site: some of the HTML does not exist in the original source code (rather, it is created by JavaScript), but the page behaves as if it does.

The idea is that with quite simple HTML and a few naming conventions one can build a site that is 'responsive' (a buzzword meaning basically that it adapts to different screen configurations, although the emphasis is very much on mobile devices), and more useful for small touch interfaces (e.g. regular links become large buttons).

To clarify, use of this library at its simplest just involves writing HTML that will hook into what jQuery Mobile offers. After that one can extend and customise as required. This is what zongList does. For instance all the list-filtering capability is created with custom code since it goes way beyond the simple options for searching that jQuery Mobile provides.

A framework like this is very convenient but does have the slight drawback of guiding, or even prodding one, towards a given solution. For instance there is a bias towards creating hyperlinks (<a> tags), the idea being that even if JavaScript is disabled the links will still work. While this might be appropriate for some sites, zongList simply would not function properly without JavaScript so this is an irrelevance, albeit a minor one.

Fortunately there is an easy escape route, if such is required: Because jQuery Mobile merely changes simple HTML into more complicated HTML with embellishments, it is perfectly feasible to remove the library and work directly with the original HTML with one's own code (and to change that HTML of course).

If you do remove jQuery Mobile make sure you leave jQuery itself in place because it is used not only by the keypad mechanism described next but also by zongList's own code.

Keypads (jQuery Keypad)

Again, this library depends on jQuery core, but unlike jQuery Mobile, it is simply a small plugin designed to add specific functionality (like the TOC plugin used on this page).

jQuery Keypad, developed by Keith Wood, is not the only library offering this service. I selected it because it is easy to use, clearly documented and has sufficient features (things like right to left order, custom keys, callback functions and so on) for our purposes.

Because it adds visual components, this plugin also comes with a base set of CSS styles which can be added to or overridden in the manner already described.

As with all other aspects of zongList, you could replace it with another such plugin or other library if it is found wanting.

Every new dictionary implementation is going to require new and different keypads, even if the general framework is left untouched. A reminder then: the place to make such changes is in the zongList JavaScript file, not the keypad library. See the section below on zongList code.

Templates (doT.js)

This library, written by Laura Doktorova, also does just one thing. It is not dependent on jQuery or any other library, and because there is no visual component to what it does (which is purely to transform data), it requires no special CSS.

There are many templating systems available. What drew me to doT.js was not so much its ease of use (it is not difficult as such, but the documentation is somewhat sparse), but its reputation for speed and its ability to handle nested data structures.

Such a system applies a set of transformation rules (defined via a template) to a dataset in order to produce a useful view - in this case converting from JSON data to HTML for display in the browser.
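The principle can be sketched with a tiny stand-in written from scratch. This is not doT.js and is far less capable; it only imitates the '{{=it.name}}' interpolation style that doT.js templates use. The function name 'compile', the field names and the sample record are invented for illustration.

```javascript
// Minimal stand-in for a template engine (NOT doT.js itself).
// It turns a template string into a function that fills in {{=it.key}} slots.
function compile(template) {
    return function (it) {
        return template.replace(/\{\{=\s*it\.(\w+)\s*\}\}/g, function (match, key) {
            // Missing fields become empty strings rather than "undefined"
            return it[key] === undefined ? "" : String(it[key]);
        });
    };
}

// The template looks like the HTML it will produce, plus placeholders.
var entryTemplate = compile('<li><b>{{=it.lx}}</b> ({{=it.ps}}) {{=it.ge}}</li>');

// An invented record standing in for one lexicon entry.
var html = entryTemplate({ lx: "headword", ps: "n", ge: "gloss" });
console.log(html); // <li><b>headword</b> (n) gloss</li>
```

In doT.js itself the corresponding step is 'doT.template(...)', which likewise returns a function that is then called with the data.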

It is helpful if a template resembles the end result as much as possible, i.e. our templates should look like HTML, but with extra sauce. This makes them easier to read and therefore to modify, which is important because, even more than with keypads, each dictionary will require unique templates to cope with the particular hierarchies of the lexicon dataset. I feel that doT.js templates satisfy this condition.

Unlike most other customisations, templates are most conveniently defined in the main HTML file (i.e. typically in 'index.html') rather than in a JavaScript file. They are embedded in a <script> element. This element is ignored by the browser (if it does not understand its 'type' attribute), so it is a bit of a trick really - placing it thus simply allows for a more readable syntax for the template markup. (Templates are still called from, and interpreted within JavaScript files however).

zongList Code

Finally we come to the JavaScript and CSS files that contain both the custom code that makes zongList what it is (rather than simply a collection of other people's work) and the places where you can make your own alterations. These files are 'zonglist.js' and 'zonglist.css'.

This double-duty might surprise you given the recommendation earlier that external files should not be edited. You might have expected that I suggest leaving these files intact and making your own overrides.

It is the nature of CSS that, by and large, rules that come later but are otherwise equally specific take precedence, so it is unproblematic to suggest extending zonglist.css for your own purposes; you could also leave it intact and simply load another file later if you prefer.

The situation is not so clear cut with zonglist.js however.

The reason I do suggest modifying this file is that zonglist.js is rather different from the other JavaScript files. Those files are like workshops, full of powerful machines for doing things. zonglist.js however, while it does contain some specialised machines (for invoking filters etc), also contains the set of instructions for operating all these tools: both when to run them and what settings to apply. It is the driving force of the process.

In my view the easiest way to adapt both the specialised machines and the operating instructions is simply to edit them directly. (You can always get another copy of zonglist.js if need be.)

The same applies to index.html in fact - not only for creating templates, but also for naming lists and making any other modifications to structure or labelling that are required.

I will give some more practical information on the process of editing and adapting below.

Data Files

You may have noticed that there are two other JavaScript files that come with the Tima dictionary besides those already mentioned. These are 'zonglist.lexicons.js' and 'zonglist.lists.js'. The names suggest they have to do with the dataset, and in fact they are the data. One contains the complete lexicon, the other the lists used for accessing it.

Up to now I have stated that the data is stored in JSON format then transformed to HTML, but that is not quite true - in fact it is stored as plain JavaScript.

JSON stands for 'JavaScript Object Notation'. I will be going into the details in the section on data transformation, but for now the point is that JSON is based on a subset of JavaScript's own syntax for writing objects. That means that it can be interpreted by JavaScript (and many other programming languages) with hardly any ceremony. This is a large part of its appeal - it makes things fast.

The usual way to include JSON data is to pass it through a 'parse JSON' function which, besides converting the text into objects, ensures that no bits of actual programming logic are included in the data.

This is for safety - it could be dangerous to load data which includes hidden functions of unknown intent - and it is therefore essential to parse JSON coming from third-party sources, which is a typical scenario for JSON consumption.

In the case when a site is simply loading its own data however there is no real point to this process. If you as a user trust the actual JavaScript files (which by definition contain all sorts of functions) that come with a web page, why would you not also trust the JSON data?

For this reason and because it is slightly easier and faster to simply include the data as one or more real JavaScript objects, I have provided the data as JavaScript files rather than as JSON.

The syntactic difference is minimal however, and it would be simple to change this approach to the more conventional one.
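The two approaches can be compared side by side in a short sketch. The record shown is invented; only 'JSON.parse' is a standard JavaScript function.

```javascript
// The same (invented) record supplied in two ways.

// 1. As JSON text, which must be parsed before use.
var jsonText = '{"lx": "headword", "ps": "n", "ge": "gloss"}';
var fromJson = JSON.parse(jsonText);

// 2. As a plain JavaScript object literal, usable as soon as the file loads.
var fromScript = { lx: "headword", ps: "n", ge: "gloss" };

console.log(fromJson.lx === fromScript.lx); // true

// JSON.parse rejects anything that is not pure data, e.g. a function.
var rejected = false;
try {
    JSON.parse('{"lx": function () {}}');
} catch (e) {
    rejected = true;
}
console.log(rejected); // true
```

Note that JSON requires double quotes around property names, whereas an object literal does not; that is most of the syntactic difference.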

One last thing to note: These files are big, relatively speaking (i.e. for text files), and therefore they are usually supplied minified to save as much space as possible. Should you wish to read them in a text editor however you should be aware that minified files can easily be reformatted programmatically (it is just another aspect of parsing the data).
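For data that is valid JSON, one way to do this is with the standard 'JSON.stringify' function, as in the sketch below. The sample string is invented. For a real data file you would first have to isolate the data from any surrounding JavaScript (such as a variable assignment), or else use a general code 'beautifier' in your text editor.

```javascript
// An invented, minified record.
var minified = '{"lx":"headword","se":[{"lx":"sub-entry"}]}';

// Parse it, then write it back out with four spaces of indentation.
var readable = JSON.stringify(JSON.parse(minified), null, 4);
console.log(readable);
```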

Other Resources

As with most web sites, there are numerous other resources attached to zongList: fonts, images, sound files. Apart from the fonts, most of these will not be downloaded with the core files if offline installation is used, in order to minimise storage demands as explained already.

Adapting zongList

So far I have made general noises about making changes but without giving much away about how this is achieved. This section covers a number of practical issues that may help you get started if you decide to adapt zongList. We will start with the required development environment (i.e. a web server), then I shall outline a simple procedure that I have found to be effective for making evolutionary changes. Finally we will look at two styles of code you will encounter in zonglist.js and how not to be confused by them. And that will conclude this major section on the presentation framework.

Web Servers

If you want to try adapting zongList, you are going to have to set it up on a test server, preferably one on your own computer. It is not sufficient to simply try to load a local page in a browser with 'File' > 'Open File...'. This may work to an extent for simple sites, but it is not an accurate emulation of how pages are served over a network.

Fortunately, setting up a local server is really not hard to do, and it is free. For Windows machines, I can recommend Uniform Server among others, and there are also good solutions available for MacOS, e.g. MAMP. Linux systems generally include a 'LAMP stack' by default or make it very easy to obtain one.

Actually in addition to a web server you also need a text editor. Anything will do though I strongly advise getting one that supports syntax highlighting for programming languages. Again, you don't have to spend money. A very good, free, cross platform option is jEdit.

Trial and Error

This is my preferred way of working for adapting existing work.

Most code that is made publicly available for re-use includes comments within the source files to give some context, and I have tried to follow this practice in my files. And as mentioned the libraries used with zongList provide additional documentation (see their websites).

But inevitably you will not find straightforward answers to all your questions. In my experience the way to really get a feel for how things work is to look at a feature in the browser, try to identify and examine the related code, make small changes to that code, then reload the file and see what happens.

Let us look at one typical scenario. Since we recently mentioned that templates are defined in the HTML file, and assuming that you are for the time being content with the default framework, then about the only things you would really need to configure in zonglist.js are the keypads and their relationship to the lists.

If we start with a search for the term 'keypads' in zonglist.js we should get three hits.

The first is in the line setting up the keypads-object - it is accompanied by the comment '//object storing keypad settings for each list' (double forward slashes are the way of making a one-line comment in JavaScript). The full name of this object is 'ZONGLIST.keypads'; this just puts the keypads-object under the ZONGLIST 'namespace'.
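A namespace of this kind is commonly set up along the following lines. This is a sketch of the general idiom, not a quotation from zonglist.js, which may do it differently.

```javascript
// Create the namespace once, reusing it if another file already defined it.
var ZONGLIST = ZONGLIST || {};

// Everything else hangs off that single global name,
// so it cannot clash with names used by other libraries.
ZONGLIST.keypads = {}; // object storing keypad settings for each list
ZONGLIST.layouts = {}; // key layouts, one per language

console.log(Object.keys(ZONGLIST));
```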

The second hit finds the meaty bit: the part where individual keypads are defined in this object. Here is an extract:

ZONGLIST.keypads = {
	lx_se_va: {
		layout: "Tima",
		direction: "ltr",
		position: "left"
	},
	dn: {
		layout: "Arabic",
		direction: "rtl",
		position: "mid"
	}
};
  

You can see that the object has a number of properties, whose names also match the lists-object names, e.g. 'lx_se_va' for the list combining head-words (conventionally the 'lx' field in the Toolbox MDF marker set), sub-entries ('se') and variant forms ('va').

The value of each property is another object which contains three properties: 'layout' whose value is the name of the keypad language to use in this case, 'direction', which is a choice of left to right or right to left, and 'position' which sets up the default match position for the search term, all as described earlier.

Before we look at the third hit, let us investigate further what it means to set, for example, the layout property to 'Tima'. This value is matched against the property name in another object, 'ZONGLIST.layouts', which stores the key configuration for each language in an 'array'. Here is an extract of this object showing just Tima:

ZONGLIST.layouts = { 
    Tima: [
        'a|ʌ|b|ɓ|c|d|' + ZONGLIST.kypd.CLOSE,
        'ɛ|e|ɘ|ɨ|f|g|' + ZONGLIST.kypd.CLEAR,
        'h|ɪ|i|j|k|l|' + ZONGLIST.kypd.BACK,
        'm|n|ɲ|ŋ|ɔ|o|' + ZONGLIST.kypd.SPACE_BAR,
        'p|ɽ|r|s|t|t̪|' + ZONGLIST.kypd.WILDONE,
        'ʊ|u|w|y|ʔ||' + ZONGLIST.kypd.WILDSOME 
    ]
};  
  

You can edit this keyboard configuration quite easily, using copy/paste, or whatever key-entry means you are comfortable with, to add any characters you need in any order.

We could go further and investigate how things like 'ZONGLIST.kypd.WILDONE' (the wildcard to match a single character) are defined, but perhaps you get the idea. Careful searching for specific terms can unlock a lot of the secrets.

The third hit for 'keypads' takes us to a very different kind of line:

  keypad_info = ZONGLIST.keypads[val];
  

This is embedded within a function and is an assignment to a variable (keypad_info) of a particular property of the keypads-object, according to the list currently selected by the user. That variable is then used elsewhere in the function.

It sounds a bit involved, but the important point is that this is something that takes place due to user interaction: this is operating the machine, not just setting it up. It is always good to be clear about the difference.
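The distinction can be sketched as follows. The two objects are the 'setting up' part, using the shapes shown in the extracts above but with placeholder key rows; the function is the 'operating' part. The helper name 'settingsForList' and the shape of what it returns are hypothetical, not taken from zonglist.js.

```javascript
// Setting up: stub data in the shape of the extracts above.
var ZONGLIST = {
    keypads: {
        lx_se_va: { layout: "Tima", direction: "ltr", position: "left" },
        dn: { layout: "Arabic", direction: "rtl", position: "mid" }
    },
    layouts: {
        Tima: ["a|b|c"],   // placeholder row, not the real layout
        Arabic: ["x|y|z"]  // placeholder row, not the real layout
    }
};

// Operating: hypothetical helper run each time the user selects a list.
function settingsForList(val) {
    var keypad_info = ZONGLIST.keypads[val];
    if (!keypad_info) {
        return null; // no keypad configured for this list
    }
    return {
        // The layout name is matched against a property of ZONGLIST.layouts
        rows: ZONGLIST.layouts[keypad_info.layout],
        isRTL: keypad_info.direction === "rtl",
        position: keypad_info.position
    };
}

console.log(settingsForList("dn").isRTL);      // true
console.log(settingsForList("lx_se_va").rows); // [ 'a|b|c' ]
```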

As you no doubt realise by now, files are interconnected, so it is not surprising that some of this configuration, namely the identifiers for the lists, needs to be defined in index.html. The relevant items are the 'value' attributes of the <option> elements contained within the <select> element with an 'id' attribute of "chooseList".

These values are again based on the field names used in your lexicon, and the lists you have made from this dataset. Careful examination of the Tima dictionary's underlying files should help make this clear.

The text values of these elements (i.e. how they appear in the list of lists), and how they are grouped by <optgroup> elements will also need to be considered.

A lot of the changes you will make to both the HTML and JavaScript files will seem quite natural, such as pasting in new character sets to modify existing keyboard layouts. A few will require a bit more care, such as finding the right places to switch from left-to-right to right-to-left order.

The main thing to remember is that it is all just text, and that you can always go back to a previous stage. I strongly advise that you make frequent backups of each file you modify.

For testing the outcome in a browser I recommend the Firebug extension for Firefox, though in fact all recent browsers have decent debugging and inspection tools built into them that help you make sense of the code that is being produced beyond simply how it appears on screen.

jQuery vs JavaScript Code

If you took a look at the function referred to above (third hit for 'keypads'), you will have seen first and last lines like this:

  $("#chooseList").change(function () {
  ... 
   }).change();
  
  

This is typical jQuery style. It is a succinct way of saying "find the HTML element with the id 'chooseList' and then when it changes (e.g. due to user interaction) do something...".

Not only is this more convenient to write than the full JavaScript version, but you can be pretty sure that if there are differences in syntax demanded by different browsers, this one jQuery version will cover them all. That is one of the things jQuery is famous for: taking care of browser quirks.
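For comparison, a plain JavaScript counterpart might look something like the sketch below. The function name 'watchChooseList' is invented, and the document object is passed in as a parameter only so that the sketch can be tried outside a browser. Note too that calling the handler directly at the end approximates, rather than exactly reproduces, what the trailing '.change()' does (which is to trigger the event itself).

```javascript
// Hypothetical plain-JavaScript counterpart of the jQuery lines above.
// 'doc' is the page's document object.
function watchChooseList(doc, handler) {
    var select = doc.getElementById("chooseList");
    // React whenever the user picks a different list...
    select.addEventListener("change", function () {
        handler(select.value);
    });
    // ...and run once straight away, much as the trailing .change() does.
    handler(select.value);
}

// In a browser:
// watchChooseList(document, function (val) { /* use val here */ });
```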

For this reason zongList makes quite extensive use of jQuery solutions. Whenever you see a line beginning with a '$' (the most common alias for jQuery), you are looking at jQuery code and so if you need information on say, a function name (e.g. 'change()' in the example above), refer to jQuery, rather than JavaScript documentation.

Be clear about one thing though: jQuery is, at heart, just JavaScript; it cannot do anything that JavaScript itself cannot manage.

Moreover using jQuery is not always the best solution and so it is not used in all cases where it could be.

For instance jQuery has several functions that would help filter a dataset, but they are not considered to be terribly fast. Since the searches in zongList can be quite complex and the datasets to search quite large and also complex, I decided that a basic but optimised JavaScript approach using 'for' loops and embedded 'if' clauses would be more appropriate than a perhaps more succinct jQuery version.
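The flavour of such code can be seen in the sketch below, which filters a flat list of strings with a 'for' loop and nested 'if' clauses. It is not zongList's actual filter. The function name 'filterList' is invented, and the reading of the position values ('left' as starts-with, 'mid' as contains, and a third value 'right' as ends-with) is an assumption made for this example.

```javascript
// Hypothetical filter: 'left' = starts with, 'right' = ends with,
// anything else (e.g. 'mid') = contains.
function filterList(items, term, position) {
    var matches = [];
    for (var i = 0; i < items.length; i++) {
        var index = items[i].indexOf(term);
        if (index === -1) {
            continue; // term not present at all
        }
        if (position === "left") {
            if (index === 0) {
                matches.push(items[i]);
            }
        } else if (position === "right") {
            if (items[i].lastIndexOf(term) === items[i].length - term.length) {
                matches.push(items[i]);
            }
        } else {
            matches.push(items[i]); // term found anywhere in the item
        }
    }
    return matches;
}

var words = ["abc", "cab", "bca"];
console.log(filterList(words, "ab", "left"));  // [ 'abc' ]
console.log(filterList(words, "ab", "mid"));   // [ 'abc', 'cab' ]
console.log(filterList(words, "ab", "right")); // [ 'cab' ]
```

A jQuery version using something like '$.grep()' would be shorter, but the explicit loop makes it easy to bail out early and to see exactly what work is done for each item.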

So sometimes the balance can be in favour of readability (read jQuery), but other times performance factors need to take precedence. I mention this in the hope that the different styles of code you will encounter will make more sense.

JavaScript is much more than prehistoric jQuery. You can always try replacing an opaque JavaScript function with a clearer jQuery one if you wish, but I would encourage you to not limit yourself to jQuery's palette.


Preparing the Data

Coming soon...


Acknowledgements

The development of zongList was made possible through the financial support of the Dokumentation bedrohter Sprachen (Documentation of Endangered Languages or 'DoBeS') program of the Volkswagen Stiftung, and through the faith that the Tima project within that program (and most specifically Gertrud Schneider-Blum) placed in me. I am most grateful.

The zongList framework uses several JavaScript code libraries created and generously made available by many clever people. These libraries are: jQuery, jQuery Mobile, jQuery Keypad (jquery.keypad.js), and doT.js. Details of licensing and authorship may be found in the source code for each library. Again, I am most grateful.

If you adapt zongList please retain these acknowledgments.

© Andrew Margetts, September 2013