Author Topic: Wordprocessor Overhead  (Read 1140 times)

Wordprocessor Overhead
« on: 31 August, 2008, 02:07:30 pm »
I've just done a little experiment.

I'm working with AbiWord, a nice little word processor, under Linux (Vector Linux), saving documents both in its native .abw format and in Microsoft Word .doc format.  The puzzling thing is the variation in file size.  I know this has to do with overhead, but why such a discrepancy?

Test file - a file containing one line 'This is a test.'
Saved in AbiWord native format - 2,395 bytes.
Saved in Microsoft Word format - 2,288 bytes.

Three-page document:
Saved in AbiWord native format - 12,395 bytes.
Saved in Microsoft Word format - 17,884 bytes.

I can't be ar*ed to type out all the words in Word from scratch, but when I look in 'Properties' in the document, it also comes up with 17,884 bytes (9920 characters, with spaces).

AbiWord seems much more efficient than Word.  And it also seems strange that the very small file is even smaller in Word format, but the bigger file is bigger in Word. :-\
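
To put numbers on the overhead, here is a rough sketch of the arithmetic in Python. It assumes each character takes exactly one byte in the saved file, which may well not be true of either format, so treat the results as a ballpark only. The `overhead` helper and the `SIZES` table are names made up for this illustration; the byte counts and the 9920-character figure are the ones quoted above.

```python
# File sizes (bytes) and character counts as reported in the post above.
SIZES = {
    "one-line": {"abiword": 2395, "word": 2288, "chars": len("This is a test.")},
    "three-page": {"abiword": 12395, "word": 17884, "chars": 9920},
}


def overhead(file_bytes, text_chars, bytes_per_char=1):
    """Bytes in the file not accounted for by the text itself.

    ASSUMPTION: every character is stored in a fixed number of bytes
    (one by default). Real formats may store text differently.
    """
    return file_bytes - text_chars * bytes_per_char


for doc, row in SIZES.items():
    for fmt in ("abiword", "word"):
        print(f"{doc:10s} {fmt:8s} overhead: {overhead(row[fmt], row['chars'])} bytes")
```

Under that one-byte assumption the AbiWord overhead stays at roughly 2,400 bytes for both documents, while the Word overhead goes from about 2,300 bytes to nearly 8,000.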
 

Jaded

  • The Codfather
  • Formerly known as Jaded
Re: Wordprocessor Overhead
« Reply #1 on: 31 August, 2008, 02:14:16 pm »
Word pads out the document with details of deleted text, also which websites you have visited, many of your addresses and other sensitive info.

To be fair, I thought Microsoft had filled that little hole. It was a shame when PDF files became more widely used - I stopped being able to discover interesting snippets from competitors' documents.
It is simpler than it looks.

Re: Wordprocessor Overhead
« Reply #2 on: 31 August, 2008, 02:17:21 pm »
Word pads out the document with details of deleted text, also which websites you have visited, many of your addresses and other sensitive info.

You don't say!  Is that really the reason?

toekneep

  • Its got my name on it.
Re: Wordprocessor Overhead
« Reply #3 on: 31 August, 2008, 02:20:01 pm »
I remember doing a little experiment with Word. I created a document with nothing but a letter x in it and saved it as a web page. I then opened it in an HTML editor and it contained dozens and dozens of lines of code. I couldn't see the purpose of much of the code (I'm not a web wizard by any means), so I started stripping lines out and refreshing the page. Each time, the web page appeared exactly the same. I think I got it down to about four or five lines in the end and it still displayed the x just the same.  ???

Wowbagger

  • Stout dipper
    • Stuff mostly about weather
Re: Wordprocessor Overhead
« Reply #4 on: 31 August, 2008, 02:24:40 pm »
Dez has great experience in trying to make web pages which work in everything else also work in M$ rubbish. I'm certain M$ do this on purpose to stop pages and documents that meet international standards from working properly in M$ software.
Quote from: Dez
It doesn’t matter where you start. Just start.

Re: Wordprocessor Overhead
« Reply #5 on: 31 August, 2008, 02:44:03 pm »
That still doesn't explain, to my mind, why the small file is smaller in Word format than in AbiWord format while the larger file is bigger in Word format, when all of them were created by AbiWord under Linux.  I could understand it in Word on a Windoze machine, but not under these circumstances.

You can tell it's a quiet Sunday, can't you!?

Re: Wordprocessor Overhead
« Reply #6 on: 31 August, 2008, 03:24:02 pm »
AbiWord's native .abw files can be compressed and, from memory, older Word document formats have no compression.

Most compression algorithms rely on files containing duplication. Small files contain relatively little duplication and can often become slightly larger after compression, because the compressed format adds a fixed header of its own. Larger files contain much more duplication and will compress to much smaller files.

Cut and paste the string "This is a test." over and over again to make a 3 page document and then compare the file sizes.
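
If you'd rather not do the cutting and pasting by hand, the same experiment can be sketched in a few lines of Python. This uses the standard zlib module purely as a stand-in for "some general-purpose compressor"; it is not a claim about what AbiWord or Word actually do when saving. The `compressed_size` helper and the repeat count of 600 are arbitrary choices for the illustration.

```python
import zlib


def compressed_size(text):
    """Return the number of bytes zlib produces for the given text."""
    return len(zlib.compress(text.encode("utf-8")))


line = "This is a test."
small = line            # the one-line test file
large = line * 600      # the same line pasted over and over (600 is arbitrary)

for name, text in (("small", small), ("large", large)):
    raw = len(text.encode("utf-8"))
    packed = compressed_size(text)
    print(f"{name}: {raw} bytes raw, {packed} bytes compressed")
```

The single line should come out a few bytes bigger once compressed, because the compressor's own header and checksum outweigh any saving, whereas the repeated text should shrink to a small fraction of its raw size.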
"Yes please" said Squirrel "biscuits are our favourite things."

Re: Wordprocessor Overhead
« Reply #7 on: 31 August, 2008, 03:54:11 pm »
Thanks, GB, that's beginning to make sense.  I hadn't thought about compression.