
erpaAdvisory

Answered questions


There are currently 21 answered questions on ErpaAdvisory:

Questions 6 to 10 shown below.


Submitted by query on 27 February 2004 at 13:37

What is the difference between a 'de facto' standard, a 'proprietary' standard, and an internationally recognised one?

Answered by dutched on 27 February 2004 at 13:47

_scope of question_
Your question refers to the notion of “standard”. Although various models and technical standards are relevant in the field of digital preservation, this answer does not describe specific standards for particular fields and uses.

_answer_
Your question can be split into three smaller questions:
(1) What generally is a standard?
(2) What is a ‘de jure’ standard, and what is an international standard?
(3) What is a ‘de facto’ standard, and what does ‘proprietary standard’ signify?

(1) There is no single accepted definition of the term ‘standard’. Generally, a standard is an agreement between people or parties that defines a common requirement and enables a common basis for understanding and cooperation. Standards help promote reliability, safety, interoperability, efficiency, and cost-effectiveness. They evolve over time or are superseded by others as requirements change and experience accumulates. Standards exist in a range of forms and areas of application, such as technical specifications, procedures, models, and frameworks.

(2) A basic distinction can be drawn between ‘de jure’ and ‘de facto’ standards. ‘De jure’ standards are formal standards issued by standards organisations; members of those organisations need to agree on the standard. Formal standards exist at various scales, from internationally recognised standards, to national and sector-specific standards, to corporate standards.

An international standard is a ‘de jure’ standard that is internationally agreed upon. The best-known international standards body is ISO, the International Organization for Standardization. The “OAIS Reference Model” is an example of such an international standard that spans various communities.

Standards can also be effective on a national scale, where they may even be enforced by law. They may focus on a specific scientific field or business sector; in this case they are issued and maintained by major organisations in that field. For instance, in the pharmaceutical field the International Conference on Harmonisation issued the “Common Technical Document” standard. Furthermore, there are corporate standards and guidelines that are relevant only within the sphere of influence of the company that issues them.

(3) ‘De facto’ standards spring up in response to a practical need. They are developed and evolve without a formal process and usually without community review. They are widely followed on a voluntary basis because they are deemed a sound concept, because they support reuse or interoperability, or simply because they are convenient. Adoption cannot always be explained rationally; sometimes a standard is taken up through promotion and market penetration even though a better concept exists.

‘Proprietary’ standards are ‘de facto’ standards that were developed by and are the intellectual property of private companies. An example is the data format of the software QuarkXPress by Quark, which has become a ‘de facto’ standard in the publishing community. The specification of this data format, however, is not open.


_References_
* Media Union Library: What’s a standard? Online publication (viewed 26 February 2003). http://www.lib.umich.edu/ummu/standards/whatis.html

* re:source: Mapping of Standards for Museums, Libraries and Archives. http://www.resource.gov.uk/action/registration/stdsmap00.asp

* CCSDS: Reference Model for an Open Archival Information System (OAIS). http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf

* ICH: The Common Technical Document (CTD). http://www.ich.org/ichctd.html

* ISO, International Organization for Standardization. http://www.iso.ch


Submitted by query on 27 February 2004 at 13:36

I am looking for some information and discussion on the pros and cons of storing digital images on a server as opposed to writing to CD. We have a collection of approximately 5,500 artworks, and the collection grows by at most 150 works per year.

We will begin a digitisation project in the near future. We intend to allow 80 megabytes per work, as the uncompressed, unadjusted capture is about 40 megabytes. We would save this as the original and then make an adjusted and cropped copy, which would typically be two-thirds the size of the original. A JPG of about 40 kilobytes for the database and web access would be made from this. If a server is the best option, it would also be used to provide up to 150 gigabytes of space for exhibition designers and general gallery use.

Answered by dutched on 27 February 2004 at 13:46

Arguments for choosing between online and offline storage include
* accessibility;
* (costs);
* organising the archives;
* knowledge for administration;
* synergies.
They are briefly discussed below. Further discussion will be necessary in the light of your organisational environment.

Direct accessibility of your artworks is a major advantage of an online system that stores all your data on a live server or a network of servers (cf. SAN - Storage Area Network technology). How much accessibility matters depends on how often you need access to which parts of your archive. If you need the master files only sporadically, you may consider storing them offline while keeping the small JPG surrogates online on a standard server.
Several years ago, offline data storage on optical media or magnetic tapes was considerably less costly than online storage. In the meantime, however, costs have levelled out.
Considerable costs may, however, be incurred for staff and administration. This applies to both online and offline storage: for administering and maintaining an online data repository your organisation may need external assistance; for offline storage, in addition to setting up the system, you have to take into account the organisational effort of managing the physical archive (your collection would comprise perhaps 700 CDs for the primary copies alone).
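
To put the figures from your question into perspective, the following rough calculation sketches the storage volume involved. It is purely illustrative: the 80 MB allowance per work and the maximum growth of 150 works per year come from your question, while the 650 MB CD capacity is an assumed typical value.

    # Rough, illustrative sizing based on the figures in the question.
    works           = 5_500   # current collection size
    growth_per_year = 150     # new works per year (maximum)
    allowance_mb    = 80      # allowance per work (master plus derivatives)
    cd_capacity_mb  = 650     # typical CD-R capacity (assumption)

    total_mb = works * allowance_mb
    print(f"Primary copies today: {total_mb / 1024:.0f} GB "
          f"(~{total_mb / cd_capacity_mb:.0f} CDs)")           # ~430 GB, ~677 CDs

    yearly_mb = growth_per_year * allowance_mb
    print(f"Annual growth: {yearly_mb / 1024:.1f} GB "
          f"(~{yearly_mb / cd_capacity_mb:.0f} CDs per year)")  # ~11.7 GB, ~18 CDs

    # A full backup copy, as discussed below, doubles these figures.
    print(f"With one backup copy: {2 * total_mb / 1024:.0f} GB")  # ~859 GB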

Your question calls for further reflection on issues pertaining to the preservation of your assets. To keep them accessible in the future, storing a single high-quality original is not sufficient. There should be a backup copy of all your assets, and both copies need to be refreshed and migrated over time to counter physical deterioration of the data carriers and technology obsolescence.
With these additional tasks your storage requirements rise considerably, and the organisational burden grows as well. While the two copies should preferably be kept on two different systems (e.g. one online, the other on tape) to minimise risk, this may be too costly for your organisation. In any case, you need to validate the copies on a regular basis; if one copy turns out to be corrupt, you can still recover from the other. (RAID technology could support mirroring your data assets in an online system.)
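
One common way to validate copies on a regular basis is to record a checksum for each file at ingest and re-compute it later. The sketch below shows such a fixity check using SHA-256; the file names, paths, and manifest structure are hypothetical and only illustrate the principle.

    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Compute the SHA-256 checksum of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(manifest: dict, root: Path) -> list:
        """Return the files whose current checksum no longer matches the manifest."""
        return [name for name, expected in manifest.items()
                if sha256_of(root / name) != expected]

    # Hypothetical usage: the manifest would have been written at ingest time.
    # manifest = {"work_0001_master.tif": "ab12...", ...}
    # damaged = verify(manifest, Path("/archive/masters"))
    # -> restore any damaged file from the second copy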

Furthermore, additional arguments may emerge from your organisational environment. Synergies with other organisational needs may, indeed, suggest a specific solution. Your plan to use the storage space for online exhibitions is an example of this.

When weighing all these arguments for a storage solution, it is important to take a long-term perspective. A slightly higher initial investment may pay off in the long run.


__References__
* TASI Advice: Delivering Digital Images - Digital Preservation and Storage. http://www.tasi.ac.uk/advice/delivering/digital.html

* Michael Reichmann: Archiving Images - Approaches to Storage and Retrieval. In: The Luminous Landscape. http://www.luminous-landscape.com/tutorials/hd-back.shtml

* Thread in the IFLA’s Diglib mailing list: Back-ups for Digital Images. http://infoserv.inist.fr/wwsympa.fcgi/arc/diglib/2003-08/

* Wikipedia: RAID - Redundant array of independent disks. http://www.wikipedia.org/wiki/RAID


Submitted by query on 27 February 2004 at 13:36

We are establishing a digital archive of a considerable size and are considering using data compression. What do we have to consider when choosing the appropriate compression?

Answered by dutched on 27 February 2004 at 13:43

Data compression is applied to reduce storage requirements and, in some cases, overall processing time. There are numerous compression algorithms with different properties. This answer lists some of the fundamental properties of compression algorithms and then highlights general considerations concerning compression in digital preservation. On this basis the choice of a compression algorithm in a specific situation can be made.

+ quick - a compression algorithm takes time to execute. For a large archive, compressing all of its content can take considerable time. At the other end, the algorithm also needs time to decompress a file before the user can access it.

+ efficient - different algorithms generate files of different sizes, depending on the concept they are based on; examples of such concepts are run-length encoding, Huffman encoding, and compression based on the discrete cosine transform as applied by JPEG.

+ tailored to a specific data type - algorithms can be tailored to data types such as text, still images, video, or audio. By exploiting the properties of a specific data type, a compression algorithm can be more efficient or quicker. General-purpose algorithms include ‘zip’ and ‘bzip2’; ‘mp3’ is specific to audio, and ‘jpeg’ to still images.

+ lossy/lossless - some compression algorithms entail a loss of information. This results, for example, in distortion artefacts on images compressed with the JPEG algorithm, or in reduced sound quality with MP3 files. Their advantage is that they may execute more quickly (as described above) or achieve higher compression.

+ proprietary - as the word implies, proprietary algorithms are owned by an organisation and their specification is not openly available.


For digital preservation, data compression should preferably be avoided altogether. If it needs to be applied in order to save storage space and for the sake of efficiency, a non-proprietary compression algorithm should be used. The algorithm should preferably be lossless in order to preserve the quality of the object. Similarly, the other properties listed above - quick, efficient, and tailored to a specific data type - have to be weighed according to the requirements of the archive and the expected usage of the objects to be compressed. Note, however, that the archival data format and the format for delivery do not necessarily need to be the same. Some archives, for example, archive images as uncompressed TIFF, both to retain their quality over the long term and because TIFF is considered a relatively stable format, and deliver them as JPEG, which is smaller and therefore reaches the user more quickly.
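
To illustrate the lossless case in practice, the sketch below compresses a file with the general-purpose bzip2 algorithm and verifies that decompression restores the data bit for bit. The file name is hypothetical, and the achieved ratio depends entirely on the content being compressed.

    import bz2
    from pathlib import Path

    original = Path("scan_0001.tif").read_bytes()   # hypothetical master file

    compressed = bz2.compress(original, compresslevel=9)
    restored = bz2.decompress(compressed)

    # A lossless algorithm must reproduce the original bit for bit.
    assert restored == original

    ratio = len(compressed) / len(original)
    print(f"original:   {len(original):,} bytes")
    print(f"compressed: {len(compressed):,} bytes ({ratio:.0%} of original)")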


_References_
* Mark Nelson: http://datacompression.info/ - web hub on data compression. (viewed January 2004)

* Steven W. Smith: The Scientist and Engineer's Guide to Digital Signal Processing. Chapter 27 - Data Compression. 1997; ISBN 0-9660176-3-3.

* Wikipedia Encyclopedia: Data Compression. (viewed January 2004)
http://en.wikipedia.org/wiki/Data_compression


Submitted by query on 27 February 2004 at 13:35

We are about to implement a scanning service for an estimated 3500 users. Scanning will be in-house and up to A3 with the option to microfiche up to A0. We have over 200,000 existing documents and create approximately 20,000 new documents/plans per year.

What are the steps required for the scanning process from the time the document is selected until the image is available to users on-line?

I can find great information on formats and equipment but nothing regarding the process.

Answered by dutched on 27 February 2004 at 13:43

__scope__
You are seeking advice on the process of image digitisation. Any digitisation process will be shaped by the organisation’s requirements for the results of digitisation and will make use of readily available infrastructure. The steps listed below are therefore deliberately general and need to be adapted to the specific situation. Moreover, some of the steps may be reordered in order to attain the most efficient workflow.

In your question you do not refer to tasks and responsibilities necessary to manage the digital images through time. Digital preservation should, however, be considered at the outset of any digitisation project.

In this answer the availability of storage facilities is presumed. Similarly, it is assumed that a digitisation service is employed, or that the necessary equipment and staff are available. Installing systems for making the digital images accessible, such as servers and web publishing tools, is not within the scope of this question.


__answer__
Before embarking on the design of the digitisation process, some fundamentals concerning the purpose and the objectives of your initiative need to be defined.
What are the risks and benefits of digitisation?
Does digitising entail possible damage to the original?
What is the expected use of the digital images?
How long should they be retained?

In the course of answering these questions, you need to consider:
(a) selection criteria for the objects to be scanned,
(b) quality criteria for the digital images, and
(c) the required metadata, including documentation.

These considerations are the basis for the following digitisation steps:
- select the objects to be digitised
- retrieve each object from the current archive
- prepare the object for scanning (possibly disbind material to ensure the quality of the digitised image); if the decision was taken to microfilm the original objects and scan the microfilm, this needs to be done at this point
- ship the objects (or microfiche) to the location where scanning is done (if necessary)
- scan the object
- validate the scanned image (quality control is probably only necessary on a sample, depending on the reliability of the scanning)
- compile metadata and the necessary documentation
- conduct further activities on the images as required by the specified archival policy and access systems, e.g. convert images to another format, or apply optical character recognition (OCR) and clean up the result
- (if necessary) ship the original material back to the archive
- transfer the digital images to the electronic repository (using offline media such as CDs if no direct data connection is available)
- accession the images together with their metadata into the repository, and link the digitised objects with finding aids

This framework for an image digitisation process provides a guide for designing a workflow specific to your circumstances. It is important to invest sufficient time in the design of these activities and in the specification of the necessary systems and facilities: adaptations at a more advanced stage of a digitisation project cost considerably more time and effort, and, conversely, any measure that makes the workflow more efficient helps to avoid costs.
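
As a minimal sketch of how such a workflow might be driven and documented, the outline below walks each selected object through the steps and records simple process metadata along the way. The step names, the 10% quality-control sample, and the record structure are illustrative assumptions, not part of any particular system.

    import random
    from datetime import date

    # Illustrative step list; names and the 10% QC sample are assumptions.
    STEPS = [
        "retrieve from archive",
        "prepare for scanning",
        "scan",
        "validate image (sample)",
        "compile metadata",
        "convert / OCR as required",
        "return original to archive",
        "transfer to repository",
    ]
    QC_SAMPLE_RATE = 0.10   # validate roughly every tenth scanned image

    def digitise(object_ids):
        """Walk each object through the steps and record simple process metadata."""
        records = []
        for obj in object_ids:
            performed = [s for s in STEPS
                         if s != "validate image (sample)"
                         or random.random() < QC_SAMPLE_RATE]
            records.append({"object": obj,
                            "date": date.today().isoformat(),
                            "steps": performed})
        return records

    # Hypothetical usage:
    # log = digitise(["plan-2004-0001", "plan-2004-0002"])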


__References__

* Making of America 4: Assessing The Costs of Conversion. Handbook, July 2001; The University of Michigan, Digital Library Services.

* AHDS: Digitisation. A Project Planning Checklist.
http://www.ahds.ac.uk:8000/checklist.htm

* ERPANET: Digitisation, Conservation, and Preservation. Report of the ERPANET workshop in Toledo, June 23rd-25th 2002.


Submitted by dutched on 20 June 2003 at 14:20

A large part of the holdings in our audio archive is in analogue format. We are planning to digitise them in order to facilitate public access via the internet for all our clients. Even though we will conserve the analogue originals, we intend to engage in the long-term preservation of the digital copies. Which audio format should we choose in order to facilitate long-term preservation of an audio file?

Answered by swissed on 24 June 2003 at 11:34

_Scope_
Your question focuses on digital audio formats suitable for long-term preservation. While you continue to conserve the analogue originals, you also intend to put effort into preserving the digital copies. The answer will not address the digitisation of your holdings or the management of the digital resources over time. However, even if a relatively stable, standard format is chosen for preservation, measures must be taken to migrate the archive to new formats once the current ones are in danger of becoming obsolete.

_Answer_
There is a myriad of digital audio formats available. Generally speaking, proprietary formats like WMA are not apt for long-term preservation. Some audio formats are geared towards a specific task; DSP or GSM, for instance, are designed to store speech. When choosing a data format for the archive, the format must be suitable for the audio holdings. Currently MP3 is the most prevalent format. However, it compresses the audio data significantly and thereby loses information. While the rate of compression is customisable for MP3, a format that uses ‘lossy’ compression is generally not suitable for high-quality long-term preservation.

Many of the available audio formats are therefore not suitable for long-term preservation. The format commonly used for long-term preservation is WAV. It does not compress its audio data, so no information is lost. The downside is that WAV files are large; still, a loss of quality is generally not acceptable for long-term preservation. Since WAV is an open and widely applied format, it can be expected to remain relatively stable and to prevail for a number of years. Take care, however, not to use one of the derivatives of the WAV format that compress their audio data with algorithms entailing a loss of information, such as CCITT-encoded WAV. In practice, archives use the WAV format for capturing and preserving digital audio material and use the comparatively smaller MP3 files for delivery, owing to that format's popularity.

By adjusting the properties of the audio stream it is possible to control the audio quality and the size of the file to some extent. Naturally, if the sample rate or the resolution of the audio data is turned down, some information is lost. In many cases, however, it is not useful to choose particularly high settings. When digitising audio from low-quality media, for example, the source simply does not offer more information, and for retaining human speech the settings can be kept at quite a low level. Even for music, recording with 44,100 samples per second (44.1 kHz) and a 16-bit sample size is sufficient in most cases: a sampling rate of 44.1 kHz captures frequencies up to half that rate (about 22 kHz), which is above the roughly 20 kHz limit of human hearing, and 16 bits per sample already cover more dynamic range than listeners can resolve.
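
To give a sense of what these settings mean for storage, the short calculation below estimates the size of uncompressed (PCM) audio from the sample rate, sample size, and channel count. The one-hour stereo and mono-speech examples are illustrative assumptions.

    def wav_size_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
        """Approximate size of uncompressed PCM audio (ignoring the small WAV header)."""
        return seconds * sample_rate * (bits_per_sample / 8) * channels

    one_hour = wav_size_bytes(60 * 60)
    print(f"CD-quality stereo, 1 hour: {one_hour / 1024**2:.0f} MB")        # ~606 MB

    speech = wav_size_bytes(60 * 60, sample_rate=22_050, channels=1)
    print(f"Mono speech at 22.05 kHz, 1 hour: {speech / 1024**2:.0f} MB")   # ~151 MB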

_References_
* The Digital Audio Working Group of the Colorado Digitization Project. Digital Audio Guidelines. 2002. http://coloradodigital.coalliance.org/digaudio1.pdf

* Dietrich Schüller. Principles and Practices for the Transfer of Analog Audio Documents into the Digital Domain. Journal of the Audio Engineering Society V. 49, 7/8; 2001.

