Re: File size and HTML by R
R
Sun Jul 20 10:55:55 PDT 2008
Hi Bob,
Thanks. I followed your advice and got a file of 11.975.311
bytes, i.e. around 4 MB smaller than the 15.9 MB file I had, but
nowhere near as small as the file made by Word-2000.
Here is a sample of the code I get:
Fragment of the header:
<!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 5 4 2 2 2 2 2 4;}
@font-face
{font-family:Courier;
panose-1:2 7 4 9 2 2 5 2 4 4;}
@font-face
{font-family:"Tms Rmn";
panose-1:2 2 6 3 4 5 5 2 3 4;}
@font-face
{font-family:Helv;
panose-1:2 11 6 4 2 2 2 3 2 4;}
@font-face
{font-family:"New York";
panose-1:2 4 5 3 6 5 6 2 3 4;}
@font-face
{font-family:System;
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}
Fragment of the body of the file:
<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acak</span></b><span
lang=PT-BR style='font-size:9.0pt'>-<b>acakan</b> II bi: ongeordend,
verward,
wanordelijk, rommelig Tw {<i>Sapa wani kandha yèn aku nyambutgawé
acak-acakan?</i>
Tr253}·</span></p>
<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acak</span></b><span
lang=PT-BR style='font-size:9.0pt'>-<b>acak</b> III Gun: meevragen,
vragen om
mee te komen {<i>Lha mbah Nan ki yahéné wis acak-acak ki pité
jawané...</i>
Ros3}; zo <i>ajak</i>·</span></p>
<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acala</span></b><span
lang=PT-BR style='font-size:9.0pt'> bt: berg·</span></p>
<p class=MsoNormal><i><span lang=PT-BR
style='font-size:9.0pt'>ora</span></i><span
lang=PT-BR style='font-size:9.0pt'>, <i>durung</i> <b>acan</b> gw:
helemaal
(nog) niet·</span></p>
As you see, the font definitions take up space, but most of the space is
used by tags in the body of the text. The whole text is 9 pt Times, with a
few arrows from the Symbol font strewn in between (also 9 pt).
The language setting does not change anywhere in the file either. (It is
only relevant to the key code setting, I suppose.) So there is no need
at all to repeat it every few words.
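To make the redundancy concrete, here is a minimal sketch in Python of how the repeated span wrappers in the sample could be unwrapped. It is not STRIPHTM.EXE (that source is not shown here), and the function name strip_redundant_spans is made up for illustration. It assumes the redundant spans always carry exactly the lang=PT-BR and font-size:9.0pt attributes seen in the sample; any other span is left alone.

```python
import re

# Any opening or closing span tag (attributes may run across line breaks).
SPAN_TAG = re.compile(r"<span\b[^>]*>|</span\s*>", re.IGNORECASE)

# Assumption: the redundant opener looks exactly like the one in the sample.
REDUNDANT = re.compile(
    r"<span\s+lang=PT-BR\s+style='font-size:9\.0pt'\s*>", re.IGNORECASE
)


def strip_redundant_spans(html: str) -> str:
    """Drop the repeated span wrappers; keep the text and all other tags."""
    out = []
    stack = []  # one flag per open span: True if its opener was dropped
    pos = 0
    for m in SPAN_TAG.finditer(html):
        out.append(html[pos:m.start()])
        tag = m.group(0)
        if tag.startswith("</"):
            # Drop the closer only if its matching opener was dropped.
            dropped = stack.pop() if stack else False
        else:
            dropped = REDUNDANT.fullmatch(tag) is not None
            stack.append(dropped)
        if not dropped:
            out.append(tag)
        pos = m.end()
    out.append(html[pos:])
    return "".join(out)
```

Applied to the "acala" entry above, this should leave only the paragraph tag and the bold tags around the headword. The font size and language would then have to be declared once, for instance on the paragraph style, which is an assumption about how the stripped file is displayed.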
The program I use to remove superfluous HTML code is STRIPHTM.EXE,
which I wrote in Stony Brook Modula-2.
If you wish, I can mail you a copy.
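The font list in the header could be pruned in a similar way. The following Python sketch is only an illustration, not part of STRIPHTM.EXE: it drops every @font-face rule whose font-family is not in a keep list. The name prune_font_faces and the default keep list (Times New Roman and Symbol, the two fonts the dictionary uses) are assumptions, and whether Word still reopens the pruned file correctly has not been tested.

```python
import re

# One @font-face rule as in the header fragment: the font-family comes first,
# optionally quoted, followed by any other properties up to the closing brace.
FONT_FACE = re.compile(
    r"@font-face\s*\{\s*font-family:\s*\"?([^\";]+)\"?\s*;[^}]*\}\s*",
    re.IGNORECASE,
)


def prune_font_faces(css: str, keep=("Times New Roman", "Symbol")) -> str:
    """Remove @font-face rules for fonts that are not listed in keep."""
    wanted = {name.lower() for name in keep}

    def keep_or_drop(match):
        family = match.group(1).strip().lower()
        return match.group(0) if family in wanted else ""

    return FONT_FACE.sub(keep_or_drop, css)
```

Run over the header fragment above, all eight rules would be removed, since none of them is Times New Roman or Symbol.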
Kind regards,
Rob, Amsterdam.
On Sun, 20 Jul 2008 09:56:44 -0700, "Bob Buckland ?:-)"
<75214.226(At Beautiful Downtown)compuserve.com> wrote:
>Hi Rob,
>
>Word 2007's new features (language neutral architecture, quick style sets, font pairs in themes...) can put quite a bit of
>information into a Word web document to allow restoring to a .doc,.docX/M file type from a web page.
>
>If you use Office Button=>Save As=>Other File Types=>Web Page-Filtered
>you may see quite a bit of that removed.
>
>What is the DOS utility you're using to filter the HTML output?
>
>=============
> <<"Rob van Albada" <R.vanAlbada2@chello.nl> wrote in message news:48832e84.11386609@msnews.microsoft.com...
>Hi,
>
>I am using Word2007 to edit a rather largish bilingual dictionary.
>When I strip all superfluous HTML tags, the size is around 6 MB.
>The file produced by Word used to be around 1 MB larger, about 7 MB.
>I use a DOS32 program to strip the file of its superfluous tags for
>advanced processing.
>However, lately, the file size has increased enormously.
>Under Word-2007 (before I used Word-2000) the file size has increased
>from 6 MB to 15.9 MB approx.
>For instance, the header now contains a list of all available fonts
>(several hundred, while I use only two: Times New Roman and Symbol).
>Also, every two or three words the file contains totally superfluous
>information of the font, language and font size.
>How can I bring back the file size to something more normal?
>Word slows down considerably with a file of this size.
>
>Thanks for your help,
>
>Rob in Amsterdam>>
>--
>
>Bob Buckland ?:-)
>MS Office System Products MVP
>
> *Courtesy is not expensive and can pay big dividends*