File Formats

Still Image Formats for Now and Into Your Future

1 - A Long-Lasting Format - Most important is an imaging standard which allows a photographer to "fix" an image with crop, color and tonal corrections and so forth, in a master file which can be reliably used in years to come.

JPEG (as of now) has this qualification, hands down. It has proved itself since the early 1990s, when the standard was issued (the committee behind it began work in 1986). Files made now and files made then can all be opened.

2 - Extended Tonal/Detail via Bit Depth - The next most important item is ever-greater bit depth for images. Bit depth essentially means finer and finer (i.e. more) gradations of tone and detail, allowing ever better adjustments.

No image format has extended bit-depth while also working as a long-lasting format.

No matter how finely the tonal scale is graduated or how much fine detail is captured, unless the image can be reliably opened in the distant future, and look the same as it did when last adjusted, the image will be lost.

Result: You must (for now) make sure that you have "masters" of your chosen photos in high-quality JPEGs. If you want further backup, save in 16-bit TIFF as well (you can probably count on it, but there is no certainty). Saving in a RAW format is also okay if you are working with the files immediately and batch processing all of them to JPG, then saving adjusted picks in JPEG and TIF. Just don't count on any RAW format lasting or being available to you in the future.

DNG (Digital Negative) is a proposal by Adobe to solve the problem with a new standard for a RAW format which would not change from model to model. So far it is barely supported by anyone. Leica is one of the very few companies offering DNG as an in-camera file format.

 

JPEG (JPG) - (Joint Photographic Experts Group)

This is an international imaging standard for photographs. JPGs are compressed, allowing a smaller file size and making them easier to store and to send via email or to upload/download on the internet. (http://www.jpeg.org/jpeg) The JPEG committee began its work in 1986, and the standard itself was issued in 1992.

JPEG DCT encoding - (the Discrete Cosine Transform (DCT)) - Compression encoding generally used for full-color and grayscale continuous-tone pictorial images; it does not work well with bitonal or palette-color images. Compression is variable and governed by a number of parameters; typical settings provide from 10:1 to 20:1 reductions in file size. The ISO/IEC standard covers both lossy and lossless modes.
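As a rough illustration of the block-DCT idea (not the full JPEG pipeline, which adds quantization tables, zig-zag ordering and entropy coding), here is a small Python sketch, assuming the NumPy and SciPy packages; the "keep" threshold is made up purely for demonstration.

    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(block):
        # 2-D DCT of an 8x8 block: apply the 1-D DCT along each axis.
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def idct2(coeffs):
        return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

    block = np.linspace(0, 255, 64).reshape(8, 8)    # a smooth 8x8 gray ramp
    coeffs = dct2(block)
    coeffs[np.abs(coeffs) < 10] = 0                  # drop small coefficients: the lossy step
    rebuilt = idct2(coeffs)
    print("coefficients kept:", np.count_nonzero(coeffs), "of 64")
    print("worst pixel error after the round trip:", round(np.abs(block - rebuilt).max(), 2))

Because a smooth block concentrates its energy in a handful of coefficients, most of the 64 numbers can be thrown away with very little visible damage, which is where the 10:1 to 20:1 savings come from.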

JPEG retains decent luminance (detail) information but heavily compresses color information, which can cause gradation and skin-tone problems unless the file is saved at the highest quality.

JPEG comes to you most commonly as 8-bit files, which means that you can have only 256 levels of red, 256 levels of green and 256 levels of blue at any pixel location. The smallest of the RAW formats are 12-bit formats, with 4096 levels for each color. A 12-bit JPEG specification (the same 4096 levels) has been part of the JPEG standard for years, but it is used mostly in medical imaging systems and is not widely supported. Photoshop supports 1-bit, 8-bit, 16-bit and 32-bit RGB files, but supports only 8-bit JPEGs.

All digital cameras shoot with more bits than are delivered as JPEGs. Basically the camera processes its own RAW data, throwing away data to get down to 8-bit images.
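To make that "throwing away data" concrete, here is a minimal NumPy sketch (my own illustration, not the camera's actual firmware), reducing hypothetical 12-bit sensor values to 8-bit values the simplest possible way.

    import numpy as np

    # Hypothetical 12-bit sensor values: 0..4095
    sensor = np.arange(4096, dtype=np.uint16)

    # Reduce to 8 bits (0..255) by simply dropping the low 4 bits.
    as_8bit = (sensor >> 4).astype(np.uint8)

    print("distinct 12-bit levels:", len(np.unique(sensor)))    # 4096
    print("distinct  8-bit levels:", len(np.unique(as_8bit)))   # 256
    # Every group of 16 adjacent sensor levels collapses into one 8-bit level;
    # those in-between gradations are the data the camera throws away.

Real cameras apply tone curves and white balance before this step, but the end result is the same: 4096 (or more) captured levels per color become 256.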

The in-camera conversion software is getting much better, including toe and shoulder adjustments for increased shadow and highlight detail in the final JPEG file. Although many applications are available to process RAW files, the in-camera software is constantly improving, leapfrogging the separate applications. Expanding the highlight and shadow areas was originally the province of non-camera software; now it is part of the in-camera software.

http://www.acm.org/crossroads/xrds6-3/sahaimgcoding.html
http://en.wikipedia.org/wiki/Jpeg

TIFF (TIF) - (Tagged Image File Format)

This was long considered the main workhorse and a desirable format for photo files. TIFFs are typically stored uncompressed (or with lossless compression) and are much larger than JPEGs. But TIFF reliability has been dealt blows because of non-standard usage by various vendors. One of the main problems is that TIFF was never an imaging standard in terms of any international standards body. It came from a commercial company (Aldus), which defined the specs for TIFF files.

JPEG 2000

Despite the similar name, JPEG 2000 is a totally new image format, using wavelet encoding/compression instead of the DCT encoding used by JPEG.

Very unlikely to be used for on-camera encoding for a long time, for a number of reasons:

  • The wavelet style of encoding is not well suited to in-camera encoding of sensor data
  • It is very complex to implement in fast, low-power hardware
  • There are patent risks associated with any new format.

The core JPEG 2000 standard unfortunately does not define a standard metadata format, although it does define a way to add new information to a JPEG 2000 file. JPEG 2000 is finding use in Digital Cinema, and this may be its home for now.

The following paragraphs are quoted from:
http://www.acm.org/crossroads/xrds6-3/sahaimgcoding.html
When the article talks about "blocking" an image it means dealing with the image in small square areas of pixels, each of which forms a "block" (it is not about blocking in the sense of preventing something).

What is a Wavelet Transform ?

Wavelets are functions defined over a finite interval and having an average value of zero. The basic idea of the wavelet transform is to represent any arbitrary function f(t) as a superposition of a set of such wavelets or basis functions. These basis functions or baby wavelets are obtained from a single prototype wavelet called the mother wavelet, by dilations or contractions (scaling) and translations (shifts). The Discrete Wavelet Transform of a finite length signal x(n) having N components, for example, is expressed by an N x N matrix.

Why Wavelet-based Compression?

Despite all the advantages of JPEG compression schemes based on DCT (simplicity, satisfactory performance, and availability of special purpose hardware for implementation) these are not without their shortcomings. Since the input image needs to be "blocked," correlation across the block boundaries is not eliminated. This results in noticeable and annoying "blocking artifacts," particularly at low bit rates.

Lapped Orthogonal Transforms (LOT) attempt to solve this problem by using smoothly overlapping blocks. Although blocking effects are reduced in LOT-compressed images, the increased computational complexity of such algorithms does not justify wide replacement of DCT by LOT.

Over the past several years, the wavelet transform has gained widespread acceptance in signal processing in general, and in image compression research in particular. In many applications wavelet-based schemes (also referred to as subband coding) outperform other coding schemes like the one based on DCT. Since there is no need to block the input image and its basis functions have variable length, wavelet coding schemes at higher compression avoid blocking artifacts.

Wavelet-based coding is more robust under transmission and decoding errors, and also facilitates progressive transmission of images. In addition, they are better matched to the characteristics of the HVS (human visual system). Because of their inherent multiresolution nature, wavelet coding schemes are especially suitable for applications where scalability and tolerable degradation are important.
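For readers who want to see a wavelet decomposition run, here is a small sketch using the third-party PyWavelets package (my choice for illustration; the quoted article does not mention it). It splits an image into the approximation and detail sub-bands the article describes, with no 8x8 blocking involved.

    import numpy as np
    import pywt  # PyWavelets, a third-party package

    # A synthetic 256x256 "image": a smooth gradient plus a little noise.
    rng = np.random.default_rng(0)
    image = np.linspace(0, 255, 256 * 256).reshape(256, 256) + rng.normal(0, 2, (256, 256))

    # One level of the 2-D discrete wavelet transform with the simple Haar wavelet.
    approx, (horiz, vert, diag) = pywt.dwt2(image, 'haar')

    print("approximation sub-band:", approx.shape)                  # 128x128 low-frequency version
    print("detail sub-bands:", horiz.shape, vert.shape, diag.shape)
    # A wavelet coder keeps the approximation and coarsely quantizes (or drops)
    # the small detail coefficients, rather than working block by block.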


GIF - for graphics with solid colors (no photographs)

  
Two GIF images - note that the white in the images is also handled as a color, in this case, as a solid area of a single color (white)

Although early images on the web, including photographs, were mostly GIFs (pronounced jif - like the peanut butter), GIFs are poorly suited for photographs and tend to make files which are much too large compared to JPEGs. GIFs are perfectly suited to graphics with areas of solid color. They compress greatly in size because a run of identical pixels can be stored, in effect, as a color value plus a count of how many pixels that color continues for. So until the color changes, a whole run of pixels can be regenerated from a single stored value. (GIF's actual LZW compression is more general than this, but the intuition is the same.)
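A toy run-length encoder makes the point; this is not GIF's actual LZW scheme, just a sketch of why flat-color areas compress so well.

    # Toy run-length encoding of one row of pixels.
    def run_length_encode(pixels):
        runs = []
        i = 0
        while i < len(pixels):
            j = i
            while j < len(pixels) and pixels[j] == pixels[i]:
                j += 1
            runs.append((pixels[i], j - i))   # (color value, run length)
            i = j
        return runs

    # One row of a solid-color graphic: 300 white pixels, 100 red, 200 white.
    row = ["white"] * 300 + ["red"] * 100 + ["white"] * 200
    print(run_length_encode(row))   # [('white', 300), ('red', 100), ('white', 200)]

Six hundred pixels collapse into three (value, count) pairs. A photograph, where almost every pixel differs slightly from its neighbor, gets no such benefit, which is why GIFs of photos come out so large.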

See This JPG/GIF comparison of JPEG and GIF formats in terms of file sizes and uses.

PNG - Portable Network Graphics

PNGs were designed as a replacement standard combining features of GIF, JPG and TIF files. They were created at a time when GIFs were under criticism because Unisys, which owned the patent on the LZW compression used in GIFs, was threatening to charge a heavy premium for their creation and use. Although PNGs are used today, especially for images needing semi-transparency, PNGs never really caught on as intended. The prime players remain JPGs and GIFs, in particular on the web.

DNG - Digital NeGative - See RAW section below

An attempt to solve the main problems with RAW formats by standardizing a RAW file format. This is proposed by Adobe, not by an international standards body, and so it has the same potential problems as TIFF. However, Adobe has a real stake in getting all the camera makers to use a standardized RAW format, if only to make it easier to keep engineering its RAW file reader and converter, Adobe Camera Raw.

Photographers have a stake in this because it offers one of the few real proposals to address the issues of access to extended data (all that the camera's sensors produce) and still have a format which will be usable down the years.

So far very few camera makers are jumping on DNG - certainly neither Nikon nor Canon. Leica is using it in its new cameras, with sensors made by Kodak (Rochester, NY, USA).

The problems have to do with adoption, and with the idea of standardizing on any format which carries camera-specific information that could show how the camera works and, in the process, reveal too many secrets to rivals.

The hope is that one of these efforts will standardize a file format with greater bit depth than we have now.

Photo Formats to Use in a Web page

Always use JPEG (JPG) for photographs
Only use GIF for solid-color graphic images.

Basically, never use BMP, TIF, PCX or even PNG. Below are links to example pages comparing JPG and GIF and discussing image sizes for your web pages. Although you will find PNGs used in web pages, and they are still around, they never really caught on.

Deciding which format: PicturesForWeb_JPG_GIF.asp
How and When/Where to set image sizes: PicturesForWeb_Sizing.asp

 


 

RAW = .NEF, .CR2, .RAW, etc

The entire last section of this page is devoted to RAW files. This is because they represent:
1) an emperor's-new-clothes sales job (as currently sold)
2) a good idea, badly in need of standardization, which gives extended control and nuance to photographs
3) an approach so poorly implemented and so well sold that it endangers every work you trust to it

Main Point: What is wanted is a file structure supported as an imaging standard (so that everyone uses it) which supports greater numbers of tonal levels (more bit depth) than our current standards do. Note: There is no reason we couldn't have 14-bit, 16-bit or more-bit JPEGs. There is a 12-bit JPEG used in medical imaging, but you can't get to it in Photoshop, the de facto editing standard.

RAW - Uncorrected "original" data from the camera

A few file extensions for RAW format files.
(Note: RAW is always spelled with caps, but is not an abbreviation for anything)
CRW (1st), TIF (2nd RAW, but confused with regular TIFF), CR2 (3rd) - Canon RAW formats
NEF (Nikon Electronic Format) - Nikon raw format files (multiple versions, early ones get orphaned)
MRW - Minolta
ORF - Olympus
RAW - Used as a format savable within Photoshop (seems like a pre-DNG idea)
and many dozens of other formats with hundreds of variations

Cameras start with their internal (12-bit or greater) RAW data to produce 8-bit JPG files. An 8-bit file can differentiate 256 gradation levels for each of the red, green and blue values at a pixel (the three colorants taken together are considered a 24-bit image, but it is still an 8-bit color depth). The camera actually records the image with far more than 256 gradation levels; its internal software then throws away data to reduce the RAW image down to the 8-bit JPG file.

RAW files are touted as being able to allow corrections to the image beyond anything possible from JPGs and TIFs. That is supposed to include overexposures and underexposures which can be corrected later (in post). This doesn't really eliminate the need to get the exposure right from the start, but it does allow you to look for otherwise discarded highlights and shadows within the wider set of gradation levels. That lets you recover details that were thrown away by the camera in going from the 4,096 or more gradation levels recorded in the camera to the 256 gradation levels of the output JPEG files.

My own conclusion is that no one really wants RAW. RAW simply has too many problems (see below), from archival problems to versioning problems. What everyone does want is more bit depth, to allow far finer gradations of tone and detail (well beyond 8-bit JPEGs), along with enough metadata to make intelligent corrections to the image later. Metadata include vertical/horizontal flags, white balance recording, and lens and lens settings (some camera software corrects for optical aberrations).

The smallest RAW formats are 12-bit formats. 12-bits gives 4,096 levels for each color at each pixel (4,096 levels of red, and 4,096 of green and 4,096 of blue). Some cameras record RAW at 14-bits (for 16,384 levels) or more (such as 16-bits for 65,536 levels for each colorant).

Note on Bit Depth and Some Terms with Confusing Dual Uses

There are a few terms with dual usage which can be a little confusing, even in context: bits, pixels and dots per inch.

Bits (and Bytes)

Each bit is a fundamental unit in digital computing. It is either "on" or "off." Bits are grouped into sets whose patterns of on and off bits represent numeric and letter values. When 8 bits are grouped together into a unit, that unit is called a "byte." The 8 bits in a byte can be arranged in 256 patterns of "on" bits and "off" bits (representing the values 0 through 255).

In the actual hardware of a computer, each bit is stored by a transistor which acts as a switch, depending on the current applied to it. So it either passes current or doesn't (on or off). These are grouped together in various numbers at a time. Depending on the way the machine is set up, the bit groupings can be 4-bit, 8-bit (called a byte), 12-bit, 14-bit, 16-bit, 32-bit, 64-bit, 128-bit and so forth.

Bit Depth

When we talk about a such-and-such-bit image, we can mean either the number of bits within each colorant which makes up a full pixel, or the total number of bits in a pixel when the colorant bits are added together.

For the purposes of this discussion, when I refer to bit depth I am talking about the number of bits per colorant within any pixel. (Each of the colors in a pixel - red, green and blue - when used as ingredients in combination to produce another color, is referred to as a "colorant.")

In general usage an 8-bit image is also referred to as a 24-bit image, which is simply adding the colorant bits together (8 red + 8 green + 8 blue = 24 overall). While the blend gives us a combination which allows more variations, each pixel colorant (red or green or blue) is still limited to 256 possible discrete levels on its own.

In the same manner, a 12-bit-per-colorant image is also called a 36-bit image (12 red + 12 green + 12 blue) and a 16-bit-per-colorant image is often referred to as a 48-bit image (16 red + 16 green + 16 blue).

When we are talking about scanners, 24, 36 and 48 bit scans are really 8x3, 12x3 and 16x3.
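The arithmetic behind all of these numbers is easy to check; a quick, purely illustrative sketch in Python:

    # Levels per colorant, and the "per pixel" figure in common usage.
    for bits_per_colorant in (8, 12, 14, 16):
        levels = 2 ** bits_per_colorant        # levels of red, green OR blue
        per_pixel = bits_per_colorant * 3      # the "24-bit" / "36-bit" / "48-bit" usage
        print(f"{bits_per_colorant}-bit per colorant = {levels} levels, "
              f"{per_pixel}-bit per pixel")
    # 8 -> 256 levels (24-bit), 12 -> 4096 (36-bit), 14 -> 16384 (42-bit), 16 -> 65536 (48-bit)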

Pixels

On a camera's sensor chip, each pixel (picture element) has a colored filter over the top of it which is either red, green or blue (in a Bayer-pattern chip); a three-chip (video) camera instead has a prism arrangement where each chip sees the whole image's red, green or blue light. So, mechanically, a pixel here is only one tiny photo-sensor which sees one of the primary colors. A pixel can also refer to the smallest discretely assignable graphic area on a printer or imagesetter.

All these single-color pixels are combined, using software in the camera, into a unit also called a pixel, which contains all three of these colors at each pixel location in the photograph. So, by the time you see them in your photo file, each pixel already has three colors.

Dots per inch (dpi) and ppi

On a screen this has no real meaning, because screens (and therefore also web pages) are measured in pixels across the screen (each pixel having all three colorants: red, green and blue).

In printing to paper, dots per inch represents a translation of pixels to a physical measurement (inches) on that sheet of paper. A 600-pixel-wide picture printed at 600 dots per inch will come out only 1 inch wide on paper. Not very large, especially since such a width takes up a lot of screen real estate.
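The pixels-to-inches arithmetic is simple enough to put in a few lines (my own illustration):

    def print_size_inches(pixels_wide, pixels_high, dpi):
        # Physical print size is just pixel count divided by dots per inch.
        return pixels_wide / dpi, pixels_high / dpi

    print(print_size_inches(600, 400, 600))   # roughly (1.0, 0.67): only an inch wide
    print(print_size_inches(600, 400, 200))   # (3.0, 2.0): same pixels, bigger print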

Dots

Dots can refer to:
1 - halftone dots in a printed publication picture (always dpi), e.g. newspaper, magazine or book
2 - the smallest physical imaging unit in a scanner, camera, printer or imagesetter (ppi = pixels per inch)
3 - the smallest computational imaging unit (for red, green, blue or cyan, magenta, yellow, black), generally referred to as dpi.

Bit Depth Comparison

The images below illustrate the number of levels of gradation which are possible at various bit depths (per each pixel color - red, green or blue). The illustration runs into a practical limit at 256 possible levels. That is the most your display can reproduce on screen, so even though it is possible to produce an image file with more levels, the added levels cannot be shown.

Because the display cannot show more than 256 levels, the only way you can see the visual benefit in the image is when software expands the tonal levels in a limited part of the tonal range far enough that the newly expanded range is wide enough to show up in an 8-bit image.

Also, do not expect to see this distribution of tonal values directly in a paper print. An inkjet print is really a small number of solid colors, turned into apparent tone by a collaboration between your eyes and a distribution of ink dots at various concentrations.

Tonal gradation greater than 256 levels cannot be shown on the page in any way that will let you see the differences. So, below, I am just listing the number of tonal gradations which can be distinguished by the sensor and handled by the camera's firmware:

12 bits = 4,096 values
14 bits = 16,384 values
16 bits = 65,536 values


Working with 12-bit or greater bit-depth files allows you to expand the shadow and highlight areas into a steeper curve so that any detail in those areas can be made visible. Otherwise, the 8-bit output doesn't have enough detail in the shallow areas of the density curve (highlights and shadows) to expand into a steeper section.

Most discussion centers around RAW files. This is the wrong framing. What works for photographers is not RAW but rather any kind of file with a greater bit depth than 8 bits. This allows more detail to be drawn out from highlight and shadow areas than would be available with 8-bit files. 8-bit files have only 256 levels of detail. The first step up, 12-bit files, already have 4096 levels of detail, which is 16 times the number of tonal levels possible in an 8-bit file. That is the secret to pulling in detail from "washed out" areas.
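Here is a rough NumPy sketch of the principle (an illustration only, not how any particular RAW converter works): apply the same strong shadow boost to values captured at 8 bits and at 12 bits, then count how many distinct output tones survive.

    import numpy as np

    def boost_shadows(values, max_in):
        # A steep lift in the shadows (a simple gamma-style curve),
        # scaled to ordinary 8-bit output.
        normalized = values / max_in
        return np.round((normalized ** 0.4) * 255).astype(np.uint8)

    # The darkest slice of the tonal range, as captured at each bit depth.
    shadows_8bit = np.arange(0, 6)       # an 8-bit capture has only 6 levels down here
    shadows_12bit = np.arange(0, 96)     # a 12-bit capture has 96 levels in the same slice

    print("distinct boosted tones from  8-bit capture:", len(np.unique(boost_shadows(shadows_8bit, 255))))
    print("distinct boosted tones from 12-bit capture:", len(np.unique(boost_shadows(shadows_12bit, 4095))))
    # The 8-bit version posterizes: it never recorded the in-between levels,
    # so the boost has nothing to spread out. The 12-bit version still looks smooth.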

Output to Printer or Screen

All of this doesn't take into account that both the screen and the printer are greatly limited in the range of tones they can reproduce. The screen, at best, can normally display only 8-bit tonal ranges. Inkjet printers and print publications are even worse and really don't have any tonal range at all: each ink dot is either fully on or fully off. There are two main kinds of ink dot: stochastic, which varies the distribution of same-sized dots, and halftone, which varies the size of regularly arranged dots.

A third process, from Fuji, uses a laser beam at varying intensities to expose printing paper (Fuji Crystal Archive), producing a range of tones; that paper is then developed. Although much more subtle, the output doesn't necessarily have the wide tonal range of analog enlargements.

What inkjet printers do is distribute very small, same-size dots of ink in patterns which are closer together or farther apart depending on the tonal shade of each of the colorants (cyan, magenta, yellow and black). Our eyes and our brains create the "display" which we see as a printed picture with the appearance of a smooth tonal range. The pseudo-random patterns of varying densities are called "stochastic" patterns.
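A toy sketch of the stochastic idea (plain one-dimensional error diffusion, a simplified stand-in for what real printer drivers do): every output dot is fully on or fully off, yet the average dot density follows the input tone.

    import numpy as np

    def dither_row(tones):
        # tones: 0.0 (paper white) .. 1.0 (solid ink). Each position gets a
        # full dot or no dot; the rounding error is pushed to the next
        # position so the *average* ink coverage matches the requested tone.
        out = np.zeros(len(tones), dtype=int)
        error = 0.0
        for i, tone in enumerate(tones):
            want = tone + error
            out[i] = 1 if want >= 0.5 else 0
            error = want - out[i]
        return out

    dots = dither_row(np.full(40, 0.25))   # a 25% gray strip
    print(dots)                            # only 0s and 1s...
    print("ink coverage:", dots.mean())    # ...but about 25% of them are on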

In the case of a publication, such as a newspaper or magazine, halftone dots are used. Halftone dots are arranged in a regular pattern but vary the size of the dots. Our eyes and our brains "see" a tonal appearance by combining the solid dot colors with the amount of space around the dots.

A further extension of halftone printing, used to extend the tonal range of grayscale images, is the duotone or tritone print. In this arrangement more than one halftone pattern is printed, each pattern with ink of a different tone.

By the time the picture gets to the printer or the screen, any adjustments in terms of extended tonal range have to be expanded far enough to be seen in an 8-bit display, because we never see the more subtle extended tonal range directly. In terms of its practical visual effect, that extended tonal range is simply a work space to help us get to a result which resembles a full-range photograph when we look at it.

Why RAW is a raw deal

OR: Great idea with a near fraudulent execution

The camera makers never intended for RAW files to be in the open. These are purely proprietary files which require any software (or the firmware in the camera) to have specific knowledge of the file structure in order to get at the image-processing information.

Unfortunately for photographers, who have wanted access to the greater bit depth the camera starts with before it processes the image into an 8-bit JPEG, there is no imaging standard for open access to RAW files. None.

Instead, these files are so camera-specific that even the next firmware tweak of the same camera model can (and has) added structural components or memory locations known only to the camera engineers, thereby making the file unreadable, even by the camera manufacturer's own RAW editing software, until that software is updated.

The item quoted below, copied from Adobe's page of reasons for upgrading to Photoshop CS4, helps to make my point about why RAW is such a raw deal. Just the idea of "support for more than 190 camera models" may sound impressive, if you haven't lived through this before. Talking about a piece of software supporting n-number of pieces of hardware is going backward by decades.

In the "old" days (which we seem eager to repeat) you bought software and hardware at least partly based on what hardware was supported by various drivers. This was a huge headache to support.

Something was always incompatible. Word processors could be used with only so many printers, and so forth. Later, each piece of hardware had to conform to, and provide drivers for, the operating system (UNIX, Mac, Windows, etc.). There is no reason to return to that model for image files.

Whoever wrote this for Adobe seems to think this is a good idea. Maybe they just don't remember, or are too young to know. This is a bad old idea whose time should never have returned. In the even older days, entire operating systems were written for each new computer and any printers or other peripherals it may have had. When IBM introduced its System/360 line of computers, it introduced the idea of multiple peripherals which worked interchangeably with the main computer, and of computers which shared standard operating systems. Today's RAW is like the pre-System/360 world at IBM. We are regressing nearly half a century to get the "advances" of RAW.

Better raw image processing

Enjoy superior conversion quality as you process raw images with the industry-leading Adobe Photoshop Camera Raw 5 plug-in, which now offers localized corrections, post-crop vignetting, TIFF and JPEG processing, and support for more than 190 camera models.

Because the RAW file structure contains not only image information but also specific technology information (firmware and hardware), these files are not merely proprietary (known only to the makers) but also often protected or encrypted. Clearly, from the way the data is recorded, these files are still not intended to meet any standard other than time-of-manufacture convenience. This is the kind of utilitarian software which gets tacked together for in-house use as the machine is designed.


Un-corrected version

Grayscale version


Corrected Version from JPEG
An un-corrupted RAW was not available.

Because I shot in RAW+JPEG mode the camera recorded two image files every time I clicked the shutter button, one 8-bit JPEG file and one 12-bit NEF file (Nikon native signal processing is 12-bit or 4,096 levels of gradation for each color in each pixel).


The same image from a corrupted NEF file from a Nikon D5000 camera (here as a JPG version)

The truly irritating part here is that this was transferred from memory card to disk with Nikon's own Transfer application.

Nikon camera
Nikon RAW format (NEF)
Nikon copy program
... and it still messed up the Nikon image.

The image was corrupted by the transfer (a file copy operation which doesn't just copy the file as is but which inserts information into the file as it copies, thereby changing it, and so corrupting it).

If I had had only the NEF (RAW) files on the card, this is all I would have gotten, because I re-formatted the memory card, thinking the RAW files had safely been copied to the hard drive. Some photographers are so sold on RAW that they shoot entirely in RAW; for them the entire set of files would have been unusable. Luckily I shot in the JPEG+NEF mode, so I was covered.

In other words, Nikon, in bringing out a new camera, did not support even a file copy routine for its NEF format as used in the D5000. (Camera-model dependency.)

Just as bad, there was no warning from Nikon's own program that it did not understand this particular NEF format. Any kind of robust programming would have at least warned me of problems understanding the format, even better if it had not changed the file as it copied.

Further, Nikon's newer software often doesn't even read older NEF files, orphaning them. Nikon is not the only company to have such incompatible file changes. I just happen to have access to the Nikon files.

In any case, this is just poor, thoughtless and lazy programming.

Robust Programming Would Do Much Better

There is no good excuse for creating a file copy program which cannot copy a later version of an image (or any other) file without corrupting it, ruining it for future use. This is the kind of thing which happens in a small shop, for some local program not intended to go anywhere or get used in wide distribution.

An earlier program may not be expected to recognize all the contents of newer formats. Even so, the program should be able to ignore any content it doesn't understand, and simply pass it on without changes. This is at the heart of "robust" programming, as used for decades.

DXF (text version) files make a good example because it is easy to see tags with their associated content and it is easy to see structure. Below is part of a text version of a DXF (AutoCAD) file. DXF files were introduced in 1982 by AutoDesk for their AutoCAD program.

Structure in an AutoCAD drawing file is grouped into various sections:

  • HEADER section – General information about the drawing.
  • CLASSES section – Application-defined (i.e. proprietary third-party software) items. These are not published standards and are recognized only by the company inventing them.
  • TABLES section – Definitions of named items.
  • BLOCKS section – Blocks could be described as sub-drawings. Used mostly for parts of a drawing which get repeated a lot, for example, a student desk block for a school building plan.
  • ENTITIES section – Drawing entities (example below), including any Block References (each time one of the blocks gets drawn).
  • OBJECTS section – Data for nongraphical objects. For programming code (such as code written in AutoLISP) and applications. Here is another area the third party programmers can use to insert data that no one else (including AutoDesk) knows what to do with.
  • THUMBNAILIMAGE section – Preview image (if any) for the DXF file.
  • END OF FILE

The information within each section of a DXF file is identified within the file by tags. Tags in a DXF file work in pairs, each pair consisting of a tag line and a content line. On one line is a tag and on the next line is the content. This repeats for the entire file.

The tag pairs are the smallest type of element in a DXF file. These files are plain text, so the pairs can simply be read line by line.

For example here is a single item (termed an "entity") from a DXF file. A vertex is a node on a polyline:
0
VERTEX
8
BODY01
10
-50.056896
20
-121.051567
30
28.902149
70
192

Here is what the lines mean. They are read in pairs with the first line telling the computer program what kind of data to expect. Anything not understood by the program reading the file is simply ignored.
0 tag of 0 starts an entity
VERTEX The type of entity - could also have been "CIRCLE" or "ELLIPSE" or "LINE" or "TEXT" or any number of other possible designations
8 tag of 8 is the layer name for this entity
BODY01 this vertex is drawn on the "BODY01" layer
10 tag of 10 is for an X dimension
-50.056896 this vertex is at -50 in the X (left/right) direction
20 tag of 20 is for a Y dimension
-121.051567 this vertex is at -121 in the Y (up/down) direction
30 tag of 30 is for a Z (forward/backward) dimension
28.902149 this vertex is at 29 in the Z direction
70 tag of 70 is for the vertex flags - a value formed by adding together one or more of the following numbers, which can combine to form any value from 0 to 255.

Vertex flags:
1 = Extra vertex created by curve-fitting
2 = Curve-fit tangent defined for this vertex. A curve-fit tangent direction of 0 may be omitted from DXF output but is significant if this bit is set
4 = Not used
8 = Spline vertex created by spline-fitting
16 = Spline frame control point
32 = 3D polyline vertex
64 = 3D polygon mesh
128 = Polyface mesh vertex

192 this vertex's flag value combines 64 (3D polygon mesh) and 128 (Polyface mesh vertex)
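Putting the pieces above together, here is a minimal sketch (mine, not AutoDesk's code) of the robust reading style the tag/value structure allows: walk the file in pairs, act on the tags you understand, and silently skip the rest.

    # A minimal tag/value pair reader for the text form of a DXF-style file.
    KNOWN_TAGS = {
        "0": "entity type",
        "8": "layer name",
        "10": "X coordinate",
        "20": "Y coordinate",
        "30": "Z coordinate",
        "70": "vertex flags",
    }

    def read_pairs(lines):
        it = iter(lines)
        for tag in it:
            value = next(it, None)
            if value is None:
                break                      # stray trailing line; ignore it
            tag, value = tag.strip(), value.strip()
            if tag in KNOWN_TAGS:
                print(f"{KNOWN_TAGS[tag]:>12}: {value}")
            # Unknown tags simply fall through untouched: no crash, no corruption.

    sample = ["0", "VERTEX", "8", "BODY01", "10", "-50.056896",
              "20", "-121.051567", "30", "28.902149", "70", "192",
              "4711", "vendor-private data this program knows nothing about"]
    read_pairs(sample)

The "4711" pair is a made-up, hypothetical vendor tag; the reader neither understands it nor damages it, which is exactly the behavior the rest of this section argues RAW files should have allowed.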

 

To understand why the RAW architecture is so poor and so lazy let's take the content of the entity example above and show it lumped together, without any tags:
VERTEXBODY01-50.056896-121.05156728.902149192

In order to read something like this you have to know exactly where each piece of information starts, where it stops and what it is. That is knowledge which has to be written specifically into the file-reading program which will use it.

With any change at all in the format, the program which reads it is hopelessly lost, with no way of figuring things out by looking at the file structure, because there are no notes (i.e. no tags) to clue it in.

VERTEXBODY01DAY-50.056896-121.05156728.90214918.345192

Now, I've added some data to the line from above. If I use the existing program, written for the earlier format, nothing will be read correctly.

1 - Not until someone provides the specifications for this file version, will anyone be able to write a program which reads the new version.

2 - A program written for the new specs might not be written to read an earlier version of this type of file, meaning that a new version of the file reader will "orphan" the old files.

This is probably due to any or all of these:
1 - new programmers or new company writing the software
2 - no old files available to programmers
3 - the readers were not written as code modules which can be switched in or out, depending on file version
4 - the programmers wrote this as if for in-house use only

Here is where a tagged file structure is so versatile. The same file-reading core of the program can remain unchanged for each version of the file. The only changes in processing will be (for the most part) in adding a little code for interpretation as you add new types of tags.

1 - Keep the old parts and you are automatically backward compatible.
2 - Add the new parts so that you are able to do something with the new tags.
3 - If the software doesn't change, you can't do anything with the new tags but you won't corrupt the file either because you will just ignore the information.

Just using tags assures you of a system in which the basic types of information, such as color levels, are readily identified without having to know exactly where in the file these items are located.
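And here is a matching sketch of a copy routine in that spirit (again, my own illustration, not any vendor's code): it updates the one tag it understands and passes every other tag/value pair through unchanged, so nothing it fails to recognize can be damaged.

    def copy_with_passthrough(in_lines, updates):
        # updates: {tag: new value} for the few tags this program understands.
        # Everything else is copied through verbatim: the behavior a robust
        # transfer utility should have had.
        out = []
        it = iter(in_lines)
        for tag in it:
            value = next(it, "")
            out.append(tag)
            out.append(updates.get(tag.strip(), value))
        return out

    original = ["8", "BODY01", "4711", "vendor-private blob", "10", "-50.056896"]
    copied = copy_with_passthrough(original, {"8": "BODY02"})   # rename the layer only
    print(copied)   # ['8', 'BODY02', '4711', 'vendor-private blob', '10', '-50.056896']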

The point of the example is not to educate you in the details of DXF files. That is way more than you want to know. This is to make a point about how the RAW files should have been structured and therefore why RAW files are junk programming for otherwise rich data.

Each tag in a DXF file has a meaning. But, not all tags in a DXF file have to make sense to everyone, not even to AutoCAD itself. The file is just a container. The program which reads it is an interpreter.

Data can be added by third-party software developers which is understood only by themselves. No one else knows what to do with it, but no one else messes with it. AutoCAD and everyone else just pass it on with the file. No changes. That is robust engineering.

RAW files should have been written the same way. That way no copy program or any other program opening a RAW file would be able to corrupt it. They may not know what to do with some of the information, but they should not have to care.

The files should have tags to identify individual image items; the X and Y locations, the R, G, B amounts and the white balance settings would be the most basic. All the other proprietary information could stay, unrecognized, or could simply be left out.

RAW is just lazy programming, meaning the software engineers didn't bother to look ahead and come up with a tagging scheme which would be expandable and readily usable by others. Considering the public release of the files, that is really almost malpractice.

   

New Thing is Better Department - A story from the early 1970s

References:
US Government Archives:
http://www.archives.gov/preservation/storage/negatives-transparencies.html
New York State Court System Archive Policy
http://www.courts.state.ny.us/admin/recordsmanagement/policies/Policy12.pdf

Damage from the First Negative Storage Pages Made of Plastic

There is a lot of hype about RAW, and a lot of pro shooters are sold on it, even to the point of heated arguments. But those arguments only make sense if we assume the only viable choices are 8-bit files or RAW files. A better solution is to use any standard imaging format with more than 8 bits of gradation levels.

The popularity of RAW file usage reminds me of a loss of thousands of negatives in the early 70s. I lost tons of negatives, and so did thousands of early-adopter photographers like myself.

New negative storage pages made of plastic came onto the market at that time. Until then we stored all negatives in glassine envelopes or brown Kraft envelopes, which were well proven as archival storage. However, to make contact prints (proof sheets) we had to remove 5 or 6 strips of negatives, or 4 sheets of 4x5-inch negs, or 3 or 4 strips of 120/220 negatives, place them in a contact printing frame with a piece of print paper and then expose the frame to enlarger light. They always gathered dust and were subject to scratching.

The new plastic pages were transparent, which meant we could keep the negatives in the pages and just lay the transparent plastic pages with the negs in them on the printing paper, weight it down with a piece of plate glass and expose it to enlarger light. We didn't have to take the negs out, therefore they could remain protected from dust and scratches. We didn't realize the new method had its own special hazard.

What we didn't know was that the pages were leaching plasticizer onto the negatives. Plasticizer is the chemical which allows plastic to be flexible. When it leached out from the plastic storage pages onto the negatives, it looked as though water had gotten in between the plastic and the negatives. But when you withdrew the negative, there was no water on it, just something that looked much like water, except that it didn't come off.

The plasticizer ruined or damaged hundreds of my own negatives and overall it ruined thousands of negatives of thousands of photographers. It also led to chemical re-formulation of the plastic pages.

Eventually the replacements were labeled "archival" to make sure we knew these were tested and safe. That is the reason for the "archival" label on plastic negative storage pages today.

Once I realized the damage and why it occurred, I removed all my negative strips from the plastic pages and put them back into glassine envelopes; it was years before I tried any of those pages again.

The proprietary problems with RAW files remind me of this problem, to an extreme. Although the files won't be gone, your access to the images and to the adjustments you made will be gone, because either you won't be able to open them or you won't have software which can do anything with your RAW archive.

With the plasticizer damage at least we were still able to get some use of the negatives because the plasticizer didn't always mess up the entire negative or the most important parts of the image. But, when RAW files are no longer supported by software or because the camera model changes, you just get nothing or you get nothing usable.

Sometimes RAW just won't open. Sometimes when RAW does open, it looks totally different from what it last looked like.

Nearly 40 years on, I still remember that loss of photographic negatives. Now, when I see the types of problems RAW files have, that is all I can think of. And this time, all these early adopters will have NOTHING. Not even "usable damage."

If you do want the full bit depth of the camera, you need to use batch processing from your camera RAW program to create a set of 16-bit TIFF files. TIF files still have problems due to non-standard implementations and versions, just not as many as RAW files. If you use Photoshop you should be fine. TIF used to be my choice for uncompressed archival storage, until, with later software, some of those files didn't open. (Caveat photographers!)
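If you want to script that batch step yourself, one possible sketch uses the third-party rawpy (a LibRaw wrapper) and tifffile packages; these library choices are my own assumption, not something prescribed above, and your RAW converter's built-in batch feature does the same job.

    import glob
    import rawpy      # third-party wrapper around LibRaw (assumed installed)
    import tifffile   # third-party TIFF writer (assumed installed)

    # Convert every NEF in the current folder to a 16-bit-per-channel TIFF.
    for raw_path in glob.glob("*.NEF"):
        with rawpy.imread(raw_path) as raw:
            rgb16 = raw.postprocess(output_bps=16)    # 16-bit RGB numpy array
        tif_path = raw_path.rsplit(".", 1)[0] + ".tif"
        tifffile.imwrite(tif_path, rgb16)             # the archival copy, kept alongside the JPEGs
        print("wrote", tif_path)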

Just make sure that your particular RAW format is never, never, never your archive. Always have JPEGs of all of them, and 16-bit TIF as a conditional backup. For the future, look for DNG to come into its own, but give it a big wait-and-see.

References:
These talk about RAW as pretty much the only choice.
The real hunger is for more bit depth, regardless of image format.
   Wikipedia tech essay: http://en.wikipedia.org/wiki/Raw_image_format -
   Ken Rockwell opinion and rant: http://www.kenrockwell.com/tech/raw.htm -
   dcRaw free converter utility for Windows - http://www.wizards.de/rawdrop


Raw Basics (from Rockwell) - I sometimes, but not always, agree with his opinions. He clearly doesn't shoot in the environments I do, and some of his recommendations don't fit for dance. We need to get rid of RAW and talk about standardized file structures with more bit depth, which is really the primary thing we want from any format.

Ken Rockwell writes:

Raw files are just the raw sensor data. It isn't a picture until it is processed further. Most fancy digital cameras allow you to save the raw data instead of the actual JPG picture. If you do, you still have to do the processing in your computer to make an image (JPG or otherwise) that you actually can see. Cameras do this processing in hardware much faster than your computer can do it in software.

Some cameras have a handy raw + JPG mode which saves both the raw data and the JPG picture.

Raw files are just like raw olives: you need to cook or otherwise process them before you can use them. They also go bad fast if left in the raw state and can keep forever once processed to something like olive oil or JPGs.

Horror of horrors, I've heard that the latest Nikon software can't even read the NEFs from older cameras and that you need to load older software to read them. Just like raw eggs, unless you process it into something like an egg-albumen print or a JPG, the raw files may go bad if left unprocessed.

It's not the file that goes bad, silly, it's the potential ability of future software to read it. Since raw data is entirely unique to each camera, and different even for different firmware revisions for the same camera, raw isn't even a format, even though the different files have the same suffix like .CRW or .NEF.

Read the full piece from Ken Rockwell here: http://www.kenrockwell.com/tech/raw.htm


Canon & Nikon RAW

The RAW formats are largely undocumented, some even encrypted. Some of them use variations on the published TIFF format in parts of their RAW files.

Canon's RAW formats in order: first CRW, then TIF, then CR2. They started with CRW, then switched to TIF as an extension. But these TIFFs weren't the Aldus/Adobe TIFF, and this caused a lot of confusion. So (guessing here) they changed their RAW-file extension to CR2.

Nikon uses NEF as the filename extension for its RAW files, with separate versions for scanners and for cameras. The internals are a mix of EXIF format and TIFF format (for headers). Nikon is infamous for constantly changing the format with tweaks and equipment changes, making the files a challenge, at best, to read. They also include encryption in their NEF files.

For a longer, technical version of this see: http://www.openraw.org/node/1482/index.html


The problems with RAW lie in the proprietary formats from each camera maker; in the constantly changing formats, which RAW image editors may not keep up with; in camera firmware which gets ahead of the RAW converters by adding in-camera processing that is sometimes better; and in the added work needed to convert RAW to the JPEG or TIFF files required for actual use.