fumei
Mon Feb 25 12:17:05 PST 2008
RE: cleaning web text, take a look at:
http://word.mvps.org/faqs/formatting/CleanWebText.htm
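
For the two problems mentioned in the quoted post below (graphics arriving as character 1, and empty table cells turning into empty paragraphs), something along these lines might be enough as a first pass. It is only a sketch: the function name CleanImportedText is made up for illustration, and it assumes that Word ends each table cell with Chr(13) followed by Chr(7), which is my understanding of Range.Text but is not stated in the original post.

```vba
Option Explicit

' Hypothetical helper (not part of the quoted macro): tidies a string
' taken from Range.Text before it is inserted into another document.
Function CleanImportedText(ByVal s As String) As String
    ' Remove graphic placeholders, which arrive as character code 1
    s = Replace(s, Chr(1), "")
    ' Assumption: Chr(7) is the end-of-cell marker that follows the
    ' paragraph mark in table cells; remove it as well
    s = Replace(s, Chr(7), "")
    ' Normalise line endings to a single paragraph mark (vbCr)
    s = Replace(s, vbCrLf, vbCr)
    s = Replace(s, vbLf, vbCr)
    ' Collapse runs of paragraph marks so empty paragraphs
    ' (for example from empty table cells) disappear
    Do While InStr(s, vbCr & vbCr) > 0
        s = Replace(s, vbCr & vbCr, vbCr)
    Loop
    CleanImportedText = s
End Function
```

In the quoted macro you would then insert CleanImportedText(ThatDoc.Range.Text) & vbCr in place of the raw ThatDoc.Range.Text. Note that this also removes blank lines that were deliberate in the source page, so it may be too aggressive for some files.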
fumei wrote:
>The following will ROUGHLY do what you ask.
>
>Option Explicit
>
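>Sub TryThis()
>Dim file As String
>Dim strPath As String
>Dim ThisDoc As Document
>Dim ThatDoc As Document
>Set ThisDoc = ActiveDocument
>strPath = "c:\myfiles\test\"
>' Dir returns the first match; Dir with no arguments returns the next one
>file = Dir(strPath & "*.html")
>Do While file <> ""
> Set ThatDoc = Documents.Open(FileName:=strPath & file)
> ' InsertAfter on the whole-document range appends at the end
> ThisDoc.Range.InsertAfter ThatDoc.Range.Text & vbCrLf
> ' Close the HTML file without prompting to save it
> ThatDoc.Close SaveChanges:=wdDoNotSaveChanges
> file = Dir
>Loop
>Set ThatDoc = Nothing
>Set ThisDoc = Nothing
>End Sub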
>
>It takes all HTML files in the folder c:\myfiles\test and grabs the text from
>each, appending it, one file after the other, to the active document.
>
>I say rough because there is the big issue of graphics: they are brought in
>as ASCII character 1. Also, Word treats table cells as paragraphs, so an
>empty table cell comes through as a separate, empty paragraph.
>
>Is there a robust, clean way to get the text, and ONLY the text? Perhaps, but
>I do not have time to work it out. Hopefully this will get you started.
>
>>I am looking for a script, or some advice on how to write a script, that
>>sequentially opens several (specified) html pages, reads the text and
>>inserts it into a Word document, one after the other.
>>
>>A little help from my friends out there, please....
--
Message posted via OfficeKB.com
http://www.officekb.com/Uwe/Forums.aspx/word-programming/200802/1