A phobia of binary-format files. It is especially widespread in the Unix community.
Then
To understand how this fear got started, you must understand what computers were like in the beginning.
In the early days, there was almost no standardisation. A byte was anywhere from 6 to 12 bits, a word anywhere from 8 to
64 bits. Every machine had at least two proprietary floating point formats. Sometimes each installation defined
its own custom character set. Some machines were big-endian, some little-endian, some twos-complement,
some ones-complement. Reading a file from someone else’s computer was quite an undertaking. It was simpler if the file
was pure characters, because then the format was easier to decipher.
Documentation on formats was typically sketchy, so when a program did not work, it was easier to deal with
human-readable character data than binary data, even though it was bulkier.
Data formats were not taken very seriously. Formats were defined procedurally — whatever the program produced.
This sufficed because there was very little interchange of data. If data were exchanged, it would always be read and
written by the same program on the same hardware, so there was no need to define precisely what the format was.
Even magnetic tape densities, proprietary tape formats and tape labels caused interchange problems.
Microsoft used binary formats for its MS Word and Excel products. However, it considered the formats a proprietary
secret. It would often change a format without telling anyone. It arranged formats to be deliberately incompatible
as a dodge to trick customers into upgrading. Once Version N+1 had touched a document, Version N could no longer read it.
Everyone had to upgrade to Version N+1 at considerable cost, just to be able to read their documents again. Microsoft
only sold version N+1, so there was no legitimate way new users in a shop could stick with version N to avoid the
problem. Microsoft traumatised programmers against binary formats. Programmers gradually decoded and documented the
formats as best they could. It was an undertaking comparable to breaking the German Enigma code, and there was no
guarantee the result was 100% accurate. Whenever programmers think of binary formats, they
instantly associate them with Microsoft’s wicked behaviour. In NLP terms, binary format
has become a negative anchor.
CORBA made a brave stab at letting you exchange binary data between different platforms. The
catch was that CORBA made such a production of it that the very thought made programmers want to lie down and take a nap.
Now
Today, things have changed:
- We have converged on IEEE format standards for binary floating point interchange.
- We have standardised on big-endian format for network order.
- Nearly all hardware can read and write 8-bit, 16-bit, 32-bit and 64-bit binary twos-complement integers, signed and
unsigned, as well as IEEE floating point.
- Java allows serialised objects to be exchanged between machines with totally different internal hardware. Though it
started in Java, there is no reason other languages could not implement the same protocols.
- The Internet means data is now routinely exchanged between computers from different manufacturers, using software from
different vendors on each end. It now becomes extremely important to precisely define the data formats, and to create
programs to verify that the standards are being adhered to.
- The Internet means it is more important than ever to exchange data in compact formats. If you don’t, you waste
bandwidth, air time, computing power, battery life in hand-held devices, and, most of all, people time waiting for
transmissions to complete.
- We are moving to an age with an explosion of hand-held devices that communicate with the Internet the same way cell
phones do. These too must be accommodated. They have very tight RAM and CPU constraints. Further, air time is
considerably more expensive and considerably slower than the cable connections that desktop machines enjoy. On top of
that, the amount of bandwidth is limited by the radio frequency spectrum. We are rapidly running out of cell-phone type
bandwidth. You have to be ultra-efficient to even play the game.
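To make the points above about network order and IEEE floating point concrete, here is a minimal sketch in Java. The record layout (an id, a timestamp and a reading) and the class name NetworkOrderDemo are invented for illustration; only ByteBuffer and ByteOrder come from the standard library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NetworkOrderDemo {
    // Hypothetical record: 32-bit id, 64-bit timestamp, IEEE 754 double.
    static final int RECORD_BYTES = Integer.BYTES + Long.BYTES + Double.BYTES;

    /** Encode the record in big-endian (network) byte order. */
    static byte[] encode(int id, long timestamp, double reading) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_BYTES).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(id).putLong(timestamp).putDouble(reading);
        return buf.array();
    }

    /** Decode the same layout, whatever the byte order of the local CPU. */
    static String decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int id = buf.getInt();
        long timestamp = buf.getLong();
        double reading = buf.getDouble();
        return id + " " + timestamp + " " + reading;
    }

    public static void main(String[] args) {
        byte[] data = encode(42, 1_000_000L, 0.1);
        System.out.println(data.length + " bytes: " + decode(data));
    }
}
```

Because the byte order is stated explicitly, a little-endian machine and a big-endian machine produce and accept exactly the same 20 bytes.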
Advantages
There are several major advantages to binary formats:
Compactness
They are compact to store, compact to process in RAM, and compact to transmit over the Internet. In contrast, some text
formats such as XML can be an order of magnitude fluffier.
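As a rough illustration of the size difference, the sketch below encodes the same integers twice: once as XML elements and once as 32-bit binary integers. The element name, the sample values and the class name CompactnessDemo are made up for the example; your own ratio will depend on your data and markup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompactnessDemo {
    /** Size in bytes of the readings as hypothetical <reading> XML elements. */
    static int xmlSize(int[] readings) {
        StringBuilder sb = new StringBuilder();
        for (int r : readings) {
            sb.append("<reading>").append(r).append("</reading>");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size in bytes of the same readings as 32-bit big-endian integers. */
    static int binarySize(int[] readings) {
        ByteBuffer buf = ByteBuffer.allocate(readings.length * Integer.BYTES);
        for (int r : readings) {
            buf.putInt(r);
        }
        return buf.position();
    }

    /** Made-up sample data: count consecutive six-digit values. */
    static int[] sample(int count) {
        int[] readings = new int[count];
        for (int i = 0; i < count; i++) {
            readings[i] = 100_000 + i;
        }
        return readings;
    }

    public static void main(String[] args) {
        int[] readings = sample(1000);
        System.out.println("XML:    " + xmlSize(readings) + " bytes");
        System.out.println("binary: " + binarySize(readings) + " bytes");
    }
}
```

With these made-up six-digit readings the XML comes to 25,000 bytes against 4,000 for binary, a factor of about six, and that is with the leanest possible markup: no attributes, no indentation and no nesting.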
Speed
A well-designed binary format is computer-friendly. The computer can rapidly navigate the data, finding what it wants
without having to parse what it does not want.
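One way a format can allow that kind of navigation is with fixed-length records, so that the position of any field is a matter of arithmetic. The sketch below assumes a hypothetical layout of a 4-byte id followed by an 8-byte price; the class name RecordSeekDemo and the data are invented for illustration.

```java
import java.nio.ByteBuffer;

public class RecordSeekDemo {
    // Hypothetical layout: int id (4 bytes) followed by double price (8 bytes).
    static final int RECORD_BYTES = Integer.BYTES + Double.BYTES;
    static final int PRICE_OFFSET = Integer.BYTES;

    /** Build a buffer of count records with made-up prices. */
    static ByteBuffer build(int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * RECORD_BYTES);
        for (int i = 0; i < count; i++) {
            buf.putInt(i).putDouble(i * 1.5);
        }
        return buf;
    }

    /** Fetch one price by computing its offset; no earlier record is examined. */
    static double priceOf(ByteBuffer buf, int index) {
        return buf.getDouble(index * RECORD_BYTES + PRICE_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer buf = build(1000);
        System.out.println(priceOf(buf, 750));
    }
}
```

The equivalent lookup in a text format would mean scanning and parsing everything in front of record 750 just to find where it starts.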
Simplicity
Though a binary format might look terrifying to a human viewing it with the wrong tool, such as NOTEPAD, from the
computer’s point of view, it takes much less code to read and analyse a binary format file. This is especially
important in hand-held devices, where RAM for code and battery power to drive that code are at a premium.
Accuracy
If you use text files for information interchange there will be a conversion from binary to prepare them and a
conversion back to binary to read them. Each of those conversions can introduce small errors if you are not careful,
especially with IEEE floating point. If you go directly from binary to binary, there are two fewer places you can go wrong.
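Here is a small sketch of the kind of carelessness meant. It assumes a writer that formats a double with a fixed six decimal places, a common habit in hand-rolled text formats; the class and method names are invented for the example. (A careful writer that uses Double.toString would round-trip exactly, which is the point: with text you have to be careful.)

```java
import java.util.Locale;

public class RoundTripDemo {
    /** Binary to text and back, using a fixed six-decimal format. */
    static double viaText(double value) {
        String text = String.format(Locale.ROOT, "%.6f", value);
        return Double.parseDouble(text);
    }

    /** Binary to binary: the 64 bits are carried across unchanged. */
    static double viaBinary(double value) {
        long bits = Double.doubleToLongBits(value);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double third = 1.0 / 3.0;
        System.out.println("text round trip exact:   " + (viaText(third) == third));
        System.out.println("binary round trip exact: " + (viaBinary(third) == third));
    }
}
```

The text round trip silently discards the digits beyond the sixth decimal place; the binary one cannot lose anything because no conversion takes place.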
Symptoms
XML is probably the fluffiest, least efficient text format ever conceived. It is the complete
antithesis of a binary format. Addiction to XML is a symptom of a severe case of binaphobia.
Treatment
The binaphobic wakes in the night, terrified that he has written a program to create a binary format and now, for some
reason, he cannot read the data. What can be done to reassure the binaphobic?
- Use industry standard protocols and well tested libraries to read and write the data. Then at worst you will be missing
a field. You are then no worse off than had you done the whole thing in text.
- Remind him, "When was the last time anyone lost a serialised object because of bugs in readObject
or writeObject?"
- Use the proper debugging tools to study your binary format files. You would not use NOTEPAD to modify an MS Word
document, so why do you think it the appropriate tool to examine and edit a binary format document? Use a binary format
editor/inspector. Programmers have no fear of the binary TCP/IP format, because they use proper tools to examine the
bits in the packets, rather than trying to analyse them with NOTEPAD.
- If you invent a new binary format, get different people to write the reader, writer, verifier, and inspector/editor.
That will help iron out inconsistencies or ambiguities in the format specification, and let them cross-check each
other’s work. Then get lots of other people to use it. The more people using it, the less likely a bug will slip through
unnoticed.
- Remind him that bugs in properly tested programs are rare. Errors in text files prepared with NOTEPAD are extremely
common. He is like the fool who sits in his car in the garage with the motor running to avoid being hit by lightning.
- Let the binaphobic use a binary editor. He imagines somehow it will be harder to use than NOTEPAD. He imagines that it
will force him to fiddle bits with hex notation. He has no idea that a modern binary editor is like a spreadsheet with
the formulas locked that validates each field as you enter it.
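For the binaphobic who wants to see the readObject and writeObject reassurance above for himself, here is a minimal round trip using Java's standard serialisation streams. The Reading class, its fields and the class name SerialRoundTrip are invented for the example; only the java.io stream classes are the real library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialRoundTrip {
    /** Hypothetical payload class, invented for this illustration. */
    static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    /** Serialise the object to a byte array with writeObject. */
    static byte[] write(Reading reading) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(reading);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuild the object from the bytes with readObject. */
    static Reading read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Reading) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reading copy = read(write(new Reading("probe-1", 21.5)));
        System.out.println(copy.sensor + " " + copy.value);
    }
}
```

The program never states the byte layout; the library owns it on both the writing and the reading side, which is what makes the well-tested-library advice work.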