CyTerm iteration plan
Author: Chris Turner, www.cycom.co.uk
Plan
Phase 1
A web site www.cyterm.com will be created.
The vision, business case and other management documents (including
this one) will be published to attract potential clients. On-line registration
forms will allow clients to join the project. Cycom is registered under
the Data Protection Act to hold this client data. A simple downloadable
personal termbank application (written in Java 1.2) will be the first offering.
Clients will be able to download the app but will also need JRE 1.2. Their subscription
will be debited from an account. No money needs to change hands, on the
understanding that they must contribute terms for sale and that the terms will
pass into the public domain if their account is not settled.
Clients paying in advance may request a CD-ROM with JRE 1.2 and the app.
Termbanks must be backed up (encrypted) at Cycom with an identity and
password. Cycom will not be able to access the terms without the identity
and password.
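The requirement that Cycom cannot read a backup without the identity and password suggests encrypting on the client before upload. The following is a minimal sketch of that idea, not the planned implementation. The class name BackupCrypto, the choice of PBKDF2 and AES-GCM, the iteration count and the blob layout are all assumptions, and the APIs used come from current Java releases rather than Java 1.2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: encrypts a termbank backup on the client before upload. */
final class BackupCrypto {
    private static final int SALT_BYTES = 16;
    private static final int IV_BYTES = 12;
    private static final int KEY_BITS = 128;      // assumed key size
    private static final int ITERATIONS = 100000; // assumed work factor
    private static final int TAG_BITS = 128;

    /** Derives an AES key from the password; the identity is mixed into the salt. */
    private static SecretKey deriveKey(String identity, char[] password, byte[] salt)
            throws GeneralSecurityException {
        byte[] id = identity.getBytes(StandardCharsets.UTF_8);
        byte[] fullSalt = ByteBuffer.allocate(salt.length + id.length).put(salt).put(id).array();
        PBEKeySpec spec = new PBEKeySpec(password, fullSalt, ITERATIONS, KEY_BITS);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    /** Returns salt + IV + ciphertext; only this blob would be sent to the server. */
    static byte[] encrypt(String identity, char[] password, byte[] plain)
            throws GeneralSecurityException {
        SecureRandom rng = new SecureRandom();
        byte[] salt = new byte[SALT_BYTES];
        byte[] iv = new byte[IV_BYTES];
        rng.nextBytes(salt);
        rng.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(identity, password, salt),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plain);
        return ByteBuffer.allocate(salt.length + iv.length + ct.length)
                .put(salt).put(iv).put(ct).array();
    }

    /** Reverses encrypt(); throws if the identity or password is wrong. */
    static byte[] decrypt(String identity, char[] password, byte[] blob)
            throws GeneralSecurityException {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        byte[] salt = new byte[SALT_BYTES];
        byte[] iv = new byte[IV_BYTES];
        buf.get(salt);
        buf.get(iv);
        byte[] ct = new byte[buf.remaining()];
        buf.get(ct);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(identity, password, salt),
                new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(ct);
    }
}
```

Because the key is derived from the password and never leaves the client, a server that stores only the blob would have no means of reading the terms, which is the property the plan asks for.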
There is no sharing at this stage so management info can be omitted.
Term data comprises author, subject, conceptid, terms, languages, definitions,
context, explanation, sources, project subset, usage, grammar and term type.
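As an illustration only, the term data listed above could be held in a record like the one sketched here. The field names follow the list; the class name TermRecord, the field types, the decision to model terms and languages together as a map, and the rule that conceptid is mandatory are all assumptions. The sketch uses current Java collections rather than Java 1.2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical Phase 1 term record. Field names follow the plan; types are assumed. */
final class TermRecord {
    final String author;
    final String subject;
    final String conceptId;
    // "terms" and "languages" are modelled together: language code -> terms in that language.
    private final Map<String, List<String>> termsByLanguage = new LinkedHashMap<>();
    final List<String> definitions = new ArrayList<>();
    final List<String> sources = new ArrayList<>();
    String context = "";
    String explanation = "";
    String projectSubset = "";
    String usage = "";
    String grammar = "";
    String termType = "";

    TermRecord(String author, String subject, String conceptId) {
        // Assumption: every record must identify the concept it describes.
        if (conceptId == null || conceptId.isEmpty()) {
            throw new IllegalArgumentException("conceptId is required");
        }
        this.author = author;
        this.subject = subject;
        this.conceptId = conceptId;
    }

    /** Adds a term for the given language, creating the language entry if needed. */
    void addTerm(String language, String term) {
        termsByLanguage.computeIfAbsent(language, k -> new ArrayList<>()).add(term);
    }

    /** Languages that currently have at least one term, in insertion order. */
    Set<String> languages() {
        return termsByLanguage.keySet();
    }

    /** Terms recorded for one language; empty if the language is unknown. */
    List<String> terms(String language) {
        List<String> found = termsByLanguage.get(language);
        return found == null ? new ArrayList<String>() : found;
    }
}
```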
The software will have "update software", "backup to cycom" and "restore
from cycom" menu items.
Search is free text. The software language is English. The import/export format
is a subset (maybe TMX) of the ISO draft standard X-MARTIF XML, and HTML. Input
is limited to languages covered by ISO 8859-1.
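A minimal sketch of the two Phase 1 text rules above, assuming input is checked character by character against ISO 8859-1 and that free-text search means a case-insensitive substring match. The class name Phase1Text and both method names are hypothetical, and the plan does not say how search is actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for Phase 1 input validation and free-text search. */
final class Phase1Text {
    /** True if every character of the input can be encoded in ISO 8859-1. */
    static boolean isLatin1(String input) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(input);
    }

    /** Returns the entries that contain the query, ignoring case (assumed matching rule). */
    static List<String> search(String query, List<String> entries) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            if (entry.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

Under this check, accented Western European text such as "café" would be accepted, while Chinese or Arabic input would be rejected until a later phase.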
Phase 2
Relations and all other data categories from the CLS framework will
be supported by the termbank software. Import/export will be full blind
MARTIF XML. Input from many more character sets will be supported, but probably
not the hard ones (e.g. Chinese, Arabic). The software will be localised to several
Western locales.
Termbanks will be stored and advertised on-line and will be downloadable
via an identity and password. The number of terms, subject area, author
name, credentials, languages and bank/term subscription price (in cytokens)
for each bank will be published. All members will be granted an overdraft
facility of 30 cytokens. A cytoken does not have a defined monetary value
at present but should be estimated at 1 Euro. Groups of termbanks may be
identified by an identity and password and may be searched on-line for
a particular term component.
An on-line ordering system will be provided, with clients' virtual cytoken
accounts being credited and debited by Cycom. An open market for cytokens
will be established where members can exchange cytokens for cash if needed.
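The account rules above (credits and debits by Cycom, with a 30 cytoken overdraft for every member) can be sketched as follows. This is an illustration under assumptions: the class name CytokenAccount is hypothetical, balances are held as whole cytokens, and a debit that would exceed the overdraft is simply refused.

```java
/** Hypothetical virtual cytoken account with the 30 cytoken overdraft from the plan. */
final class CytokenAccount {
    static final long OVERDRAFT_LIMIT = 30; // from the plan: overdraft facility of 30 cytokens

    private long balance; // whole cytokens (assumption); may go negative down to -30

    long balance() {
        return balance;
    }

    /** Adds cytokens to the account, e.g. when a member's terms are sold. */
    void credit(long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balance += amount;
    }

    /** Removes cytokens if the overdraft allows it; returns false and changes nothing otherwise. */
    boolean debit(long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (balance - amount < -OVERDRAFT_LIMIT) {
            return false;
        }
        balance -= amount;
        return true;
    }
}
```

A new member could therefore spend up to 30 cytokens on termbank subscriptions before contributing anything.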
Phase 3
The terms in individual authors' termbanks will be exported to a central
termbank, validated and merged with terms from other authors for the same
concept. A single concept entry may now contain terms for many languages
contributed by many authors. The software now supports access control and
versioning at the term level of granularity. Terms can be updated by many
authors without loss of data. Read-only extracts from the central termbank
can be produced for particular subjects and languages and authors but these
are no longer master copies. There is enough audit information to permit
the licence fees paid by a user to be shared between term contributors.
The software supports Chinese, Arabic, Japanese and other difficult input
methods. The software is localised for these languages as well. Export/import
formats include the LISA TBX standard. Web access to terms is now possible
on a pay per term basis and may be open to non-members. Fuzzy searching
is supported.
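The merge and fee-sharing steps described above can be sketched as follows. This is only an illustration: the names CentralTermbank and Contribution are hypothetical, validation is omitted, a duplicate is assumed to mean the same term in the same language for the same concept, and the fee is assumed to be split in proportion to the number of terms each author contributed. The plan itself does not define the sharing rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical central termbank: merges authors' terms per concept and keeps audit data. */
final class CentralTermbank {
    /** One contributed term, recording who supplied it (the audit information). */
    static final class Contribution {
        final String conceptId;
        final String language;
        final String term;
        final String author;

        Contribution(String conceptId, String language, String term, String author) {
            this.conceptId = conceptId;
            this.language = language;
            this.term = term;
            this.author = author;
        }
    }

    private final Map<String, List<Contribution>> byConcept = new LinkedHashMap<>();

    /** Adds a term to its concept entry; returns false if that term already exists there. */
    boolean merge(Contribution c) {
        List<Contribution> entry = byConcept.computeIfAbsent(c.conceptId, k -> new ArrayList<>());
        for (Contribution existing : entry) {
            if (existing.language.equals(c.language) && existing.term.equals(c.term)) {
                return false; // first contributor keeps the credit (assumption)
            }
        }
        entry.add(c);
        return true;
    }

    /** Number of distinct terms held for a concept across all languages and authors. */
    int termCount(String conceptId) {
        List<Contribution> entry = byConcept.get(conceptId);
        return entry == null ? 0 : entry.size();
    }

    /** Splits a licence fee between authors in proportion to their term count (assumed rule). */
    Map<String, Double> shareFee(String conceptId, double fee) {
        Map<String, Double> shares = new LinkedHashMap<>();
        List<Contribution> entry = byConcept.get(conceptId);
        if (entry == null || entry.isEmpty()) {
            return shares;
        }
        double perTerm = fee / entry.size();
        for (Contribution c : entry) {
            shares.merge(c.author, perTerm, Double::sum);
        }
        return shares;
    }
}
```

With two terms from one author and one from another for the same concept, a fee of 3 cytokens would be split 2 to 1 under this assumed rule.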
Phase 4
Translation memory software is released, linking into the termbank. Import/export
is in TMX format. The interface to word processors is by cut and paste.
Phase 5
Document parsing/tagging software can recognise some document markup formats
(e.g. RTF, HTML) and has some knowledge of grammar (e.g. English grammar).
Terms and translation units can be identified automatically. This can be
used to research terms, populate translation memories, and perform rough
machine translation of some documents. The translation memory interface
to word processors will be more streamlined.
Phase 6
Translation memories can be marked up to show grammar and can be linked
to grammar rules. Machine translation gets better and translation memory
matching becomes more intelligent.
Phase 7
Software agents can scan source documents and automatically update target documents
or schedule parts of them for retranslation. Document workflows are automated.
Distributed editing of documents is supported with locking, mirroring and
auditing to ensure no loss of data.
Members' skills are advertised and a multi-party consortium can be quickly
assembled via on-line blackboards to bid for specific projects. A virtual
exchange for document projects will be created.
Summary
The iterations focus on terminology collection as a basic enabling technology
and build this up from the word to the sentence and finally document level
of collection. The iterations start by ensuring shared meanings and understanding
and end by permitting real time collaboration in a document engineering
project. The economic needs of the actors are recognised at each iteration
and it is hoped that members will experience continuing benefit from continuing
membership.
Disclaimer
The deliverables and completion dates are subject to revision in the light
of experience gained during the project and feedback from members.