Saving Word Files as HTML

Category: Software

"When I use the 'Save As' feature in Microsoft Word to save a document as a web page, the resulting HTML is a bloated mess. Is all that formatting stuff really needed? If not, is there a way to get rid of it?"


I'd Like to Phone a Friend, Regis

I asked my friend Allen Wyatt, an internationally recognized author, software expert, and publisher of the WordTips newsletter, to handle this question. Here's what he says:

When Word 2000 creates a Web document, it saves quite a bit of extra information in the HTML file. This information is Word-specific. Your Web browser doesn't need it, and it is useful only if you plan to load the HTML document back into Word 2000 later. One thing it records is font sizes. By default, the Web doesn't support a large number of different font sizes and typographical conventions, and it certainly doesn't support as many as Word does. Word 2000 stores that information in the HTML document anyway, tucked away so that it can decipher it when you later reopen the document in Word.

Some people don't like the way Word handles font formatting and prefer to take advantage of the "relative" font sizing that is natural to the Web. Relative font sizing lets the browser (and the user, through the browser) control how large the text appears on-screen. Some people consider this a great feature. Word, however, doesn't use relative font sizing; instead, it tries to make each font appear as close as possible to what the document's author used.
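To make the idea of relative sizing a bit more concrete, here is a minimal Python sketch (my illustration, not part of Allen's advice) that rewrites absolute point sizes such as font-size:12.0pt, the kind Word writes into inline styles, as relative em values. It assumes the common browser default text size of 16 pixels, which equals 12 points, so 1em is treated as 12pt. The pt_to_em function and BASE_PT constant are hypothetical names made up for this example.

```python
import re

# Assumption: the browser's default text size is 16px (= 12pt),
# so 1em is treated as 12pt unless you pass a different base.
BASE_PT = 12.0

# Matches declarations such as "font-size:12.0pt" or "FONT-SIZE: 9pt".
FONT_SIZE_PT = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)pt", re.IGNORECASE)


def pt_to_em(css: str, base_pt: float = BASE_PT) -> str:
    """Rewrite absolute point font sizes as em values relative to base_pt."""

    def convert(match: re.Match) -> str:
        em = float(match.group(1)) / base_pt
        # :g drops trailing zeros, so 1.0 becomes "1" and 1.5 stays "1.5"
        return f"font-size:{em:g}em"

    return FONT_SIZE_PT.sub(convert, css)


if __name__ == "__main__":
    print(pt_to_em("font-size:18.0pt;color:red"))  # font-size:1.5em;color:red
```

With relative sizes in place, a visitor who bumps up the browser's text size gets proportionally larger text instead of fixed point sizes.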

If you are not going to load the document back into Word, you can get rid of all that extra baggage. You can do this either the tedious way or the somewhat-less-tedious way. The tedious way involves opening the HTML file in a text editor and removing everything except the bare HTML code needed to display your information. This, of course, requires that you be fairly conversant in HTML coding.
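For anyone who wants a head start on the tedious way, here is a rough Python sketch (my own illustration, not Allen's or Microsoft's method) that strips a few of the most common Word 2000 leftovers with regular expressions: conditional comments such as <!--[if gte mso 9]>...<![endif]-->, Office-namespace tags like <o:p>, mso- declarations inside inline style attributes, and class="MsoNormal"-style attributes. The strip_word_markup name and the exact patterns are assumptions. Regular expressions are a blunt tool for HTML, so you'd still want to review the result by hand.

```python
import re

# Conditional comments Word emits, e.g. <!--[if gte mso 9]>...<![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Office-namespace tags such as <o:p> and </o:p>; the tags go, inner text stays
OFFICE_TAG = re.compile(r"</?[ovwx]:[^>]*>", re.IGNORECASE)
# Single mso-* declarations inside inline style attributes
# (not meant for <style> blocks, which may need separate handling)
MSO_DECLARATION = re.compile(r"mso-[\w-]+\s*:\s*[^;\"'>]+;?", re.IGNORECASE)
# Word's class names, e.g. class=MsoNormal or class="MsoListParagraph"
MSO_CLASS = re.compile(r'\s+class=(["\']?)Mso\w*\1', re.IGNORECASE)
# style attributes left empty once the mso-* declarations are gone
EMPTY_STYLE = re.compile(r'\s+style=(["\'])\s*\1', re.IGNORECASE)


def strip_word_markup(html: str) -> str:
    """Remove some common Word-specific markup from an HTML string."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_DECLARATION.sub("", html)
    html = MSO_CLASS.sub("", html)
    html = EMPTY_STYLE.sub("", html)
    return html


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="mso-margin-top-alt:auto;color:red">Hi<o:p></o:p></p>'
    print(strip_word_markup(sample))  # <p style="color:red">Hi</p>
```

Ordinary styling such as color:red is left alone; only the Word-specific pieces these patterns recognize are removed.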

The somewhat-less-tedious way involves the use of a Microsoft add-in for Word 2000 (called the Office 2000 HTML Filter) that will remove all the Word-specific HTML code for you. The add-in is free; you can learn more about it (and download it) at the following address:

http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

Even after running the Office 2000 HTML Filter, you may still want to open the file and examine the resulting HTML code to make sure it displays your information exactly as you intend. While this may require some knowledge of HTML, it doesn't require all the tedious steps of removing and recoding everything yourself.
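If you'd like a second pair of eyes for that final check, here is a small Python sketch (again my own illustration, not part of the HTML Filter) that uses the standard library's html.parser module to list leftovers that usually point to Word-specific markup: namespaced tags such as <o:p>, Mso... class names, mso- inline styles, and <!--[if ...]> conditional comments. The WordMarkupAudit class and audit function are hypothetical names. An empty result doesn't prove the page is clean; it just means none of these particular patterns turned up.

```python
from html.parser import HTMLParser


class WordMarkupAudit(HTMLParser):
    """Collects markup that usually indicates Word-specific leftovers."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Tags with a namespace prefix, e.g. o:p or v:shape (Office XML)
        if ":" in tag:
            self.findings.append(f"namespaced tag <{tag}>")
        for name, value in attrs:
            value = value or ""  # attributes without a value arrive as None
            if name == "class" and value.lower().startswith("mso"):
                self.findings.append(f"Word class {value!r} on <{tag}>")
            if name == "style" and "mso-" in value.lower():
                self.findings.append(f"mso- style on <{tag}>")

    def handle_comment(self, data):
        # Word's conditional comments look like <!--[if gte mso 9]>...
        if data.lstrip().lower().startswith("[if"):
            self.findings.append("conditional comment")


def audit(html: str) -> list[str]:
    """Return a list of suspicious Word-specific items found in html."""
    parser = WordMarkupAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


if __name__ == "__main__":
    page = '<p class="MsoNormal" style="mso-line-height-alt:12pt">Hi<o:p></o:p></p>'
    for finding in audit(page):
        print(finding)
```

Using a real HTML parser here, rather than searching the text, avoids false alarms from words like "mso" appearing in ordinary page text.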

Thanks, Allen!




Posted by Bob Rankin on October 5, 2005 12:17 PM



Most recent comments on "Saving Word Files as HTML"

Posted by:

Doug Klippert
06 Oct 2005

When a document is saved in HTML format, a subdirectory is created to store the graphics. A cleaner solution is to save as MHT (MHTML).
All the graphics will be included in the one file. See http://www.klippert.com/TCC/Blog/2004/10/frontpage-mhtml-one-file-web-pages.html

EDITOR'S NOTE: Great, another non-standard Microsoft standard. So how do you upload that to your web server? Looks like you can't. :-(


Posted by:

George
07 Oct 2005

Nice advice about getting rid of Word html bloat, but I have Office 2003, and there is no advice for that product.

EDITOR'S NOTE: I'd guess that the Word 2000 tool will still work. Give it a try...


Posted by:

Lloyd
07 Oct 2005

Check out this link: http://office.microsoft.com/en-us/assistance/HP030852791033.aspx

It seems that Microsoft have incorporated it directly into Office 2003, no need to install any add-ins.


Posted by:

walter donavan
07 Oct 2005

Word versions later than 2000 can optionally save only the HTML and omit the bloat (choose the "Web Page, Filtered" file type in the Save As dialog). No need for an add-in.


Posted by:

Jean-Alex
09 Oct 2005

Using the 2000 HTML filter still produces nonstandard code. You should also run the result through Tidy, which needs to be configured to catch Word's garbage code.

Tidy can be found at:

http://tidy.sourceforge.net


Posted by:

Allen
12 Oct 2005

Great article. Is there a similar tool for Word 2002 and Word 2003? Or does the Word 2000 tool work on later versions?


Posted by:

Nathan
10 Mar 2009

Please, I want information on what to do so that when I click a submit button, my form is sent instantly.




Copyright © 2005 - Bob Rankin - All Rights Reserved


Printed from: http://askbobrankin.com/saving_word_files_as_html.html