Re: High Performance Programming in MS Word by Word
Word
Sat Jul 10 08:06:42 CDT 2004
G'day Nick <nick@heha.net.tw>,
I was purely talking about compiled DLLs vs. interpreted, host-based scripting.
Whatever language you choose, it makes little difference to the end
result :-)
Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com
steve from wordheretic.com (Email replies require payment)
Nick reckoned:
>Hi,
>
>First, thanks for your reply.
>
>I think formatting is NOT important for me, so I choose method 1:
>
> > From VBA
> > Save the bloody file as text
> > Use DocStats as a rough guide to your word count.
> > Call your C# to go sicko speeds.
> >
> > From C#
> > Serialize word structures as per MS Word (any non-alpha character after
> > an alpha character starts a new word) into a bloody huge array, whose size
> > you can predetermine by passing the DocStats result as a parameter.
> >
> > Keep a 'done' list of serialised words worthy of marking. Re-enter the
> > Word document, obtain Document.Content.Words(offset) and mark
> > accordingly.
>
>1. Why C#? Wouldn't VC be much faster?
>
>2. How do I call the C# code from VBA? Should I write the C# as a COM
>component? Sorry, I am new to .NET. And for VC, should I use ATL instead?
>
>
>Regards,
>Nick
>
>
>Word Heretic wrote:
>
>> G'day Nick <nick@heha.net.tw>,
>>
>> <chuckles> You too, huh? It's an interesting area. There are two main
>> methods for you to consider here.
>>
>> Method 1 - Formatting is NOT important to your parse.
>>
>> From VBA
>> Save the bloody file as text
>> Use DocStats as a rough guide to your word count.
>> Call your C# to go sicko speeds.
>>
>> From C#
>> Serialize word structures as per MS Word (any non-alpha character after
>> an alpha character starts a new word) into a bloody huge array, whose size
>> you can predetermine by passing the DocStats result as a parameter.
>>
>> Keep a 'done' list of serialised words worthy of marking. Re-enter the
>> Word document, obtain Document.Content.Words(offset) and mark
>> accordingly.
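A minimal Python sketch of the Method 1 pipeline, for illustration only (Python rather than C# so it runs standalone). It assumes the document has already been saved as plain text and that a DocStats-style word count is passed in as `estimated_count`. The tokenizer is a simplification that splits at every letter/non-letter boundary, so its 1-based offsets will not exactly match Word's own Words(offset) numbering. The names `serialise`, `done_list` and `is_worthy` are hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical tokenizer: a token is a maximal run of letters or a maximal
# run of non-letters. Word's own Words collection uses richer rules (e.g. it
# attaches trailing spaces to the preceding word), so offsets are approximate.
_TOKEN = re.compile(r"[^\W\d_]+|[\W\d_]+")


def serialise(text: str, estimated_count: int) -> list[str]:
    """Split text into tokens, pre-sizing the array from a word-count estimate."""
    tokens: list[str] = [""] * max(estimated_count, 0)
    n = 0
    for match in _TOKEN.finditer(text):
        if n < len(tokens):
            tokens[n] = match.group()      # fill a pre-sized slot
        else:
            tokens.append(match.group())   # estimate was too low: grow
        n += 1
    return tokens[:n]  # drop unused slots if the estimate was too high


def done_list(tokens: list[str], is_worthy: Callable[[str], bool]) -> list[int]:
    """Return 1-based offsets (like VBA's Words(i)) of tokens worth marking."""
    return [i for i, tok in enumerate(tokens, start=1) if is_worthy(tok)]


if __name__ == "__main__":
    toks = serialise("Fast code, slow code.", estimated_count=4)
    print(toks)  # ['Fast', ' ', 'code', ', ', 'slow', ' ', 'code', '.']
    print(done_list(toks, lambda t: t.lower() == "code"))  # [3, 7]
```

The offsets in the done list are what you would hand back to the VBA side for marking; pre-sizing only saves reallocations, and the sketch still copes if the DocStats estimate is wrong in either direction.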
>>
>>
>>
>> Method 2 - Formatting is important
>>
>> For extreme speed, I would probably use a variant of Method 1 that
>> uses HTML output to parse.
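One way the HTML variant might look, as a rough Python sketch using the standard library's `html.parser`: it walks HTML saved from the document and collects (text, is_bold) runs, so formatting survives into the parse. It assumes bold is marked with `<b>` or `<strong>` tags; Word's HTML export can also express formatting through CSS styles, which this sketch ignores. `FormattedTextExtractor` and `extract_runs` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FormattedTextExtractor(HTMLParser):
    """Collect (text, is_bold) runs from HTML (hypothetical helper).

    Assumes bold is marked with <b>/<strong>; CSS-based formatting is ignored.
    """

    BOLD_TAGS = {"b", "strong"}
    SKIP_TAGS = {"style", "script"}  # their contents are not document text

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.bold_depth = 0
        self.skip_depth = 0
        self.runs: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag in self.BOLD_TAGS:
            self.bold_depth += 1
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if data and self.skip_depth == 0:
            self.runs.append((data, self.bold_depth > 0))


def extract_runs(html: str) -> list[tuple[str, bool]]:
    parser = FormattedTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.runs


if __name__ == "__main__":
    print(extract_runs("<p>plain <b>bold</b> tail</p>"))
    # [('plain ', False), ('bold', True), (' tail', False)]
```

The runs can then be fed through the same tokenize-and-mark steps as Method 1, with the formatting flag available to the analysis.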
>>
>> OTHERWISE
>>
>> Any C code would only be making Word calls anyway (who wants to rebuild
>> an RTF processor? YUCK!). Avoid it and stick with VBA: since you won't
>> need interface wrappers for all your calls, it will probably actually
>> run a bit faster for you from VBA.
>>
>> First up, all the collections are dynamic, so you really want to avoid
>> things like .Paragraphs(k): when k gets to 100, Word has to serialise the
>> first 100 paras in the defined range on the fly to get your answer.
>>
>> If you instead move your range start ahead one para at a time and always
>> use .Paragraphs(1), it's much quicker, and you automatically know you have
>> reached the end of the document when myRange.Start reaches myRange.End.
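To make the cost difference concrete, here is a toy Python model (not Word's actual internals): paragraphs are held in a singly linked list with no random access, so each .Paragraphs(k) lookup has to walk k-1 links from the range start, while the "advance the start and take para 1" approach walks each link once. `Para`, `build`, `indexed_scan` and `advancing_scan` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Para:
    text: str
    next: Optional[Para] = None  # only a link forward: no random access


def build(texts: list[str]) -> Optional[Para]:
    """Link the paragraph texts together, returning the first node."""
    head = None
    for t in reversed(texts):
        head = Para(t, head)
    return head


def indexed_scan(head: Optional[Para], n: int) -> tuple[list[str], int]:
    """Like For k = 1 To n ... Paragraphs(k): every lookup restarts at the range start."""
    out, steps = [], 0
    for k in range(1, n + 1):
        node = head
        for _ in range(k - 1):  # walk past the first k-1 paragraphs again
            node = node.next
            steps += 1
        out.append(node.text)
    return out, steps


def advancing_scan(head: Optional[Para]) -> tuple[list[str], int]:
    """Like taking Paragraphs(1), then moving the range start past it."""
    out, steps = [], 0
    node = head
    while node is not None:  # empty range (Start = End) means end of document
        out.append(node.text)
        node = node.next  # move the range start ahead one paragraph
        steps += 1
    return out, steps


if __name__ == "__main__":
    texts = [f"para {i}" for i in range(100)]
    head = build(texts)
    print(indexed_scan(head, len(texts))[1])  # 4950 link walks
    print(advancing_scan(head)[1])            # 100 link walks
```

For 100 paragraphs that is 4,950 link walks versus 100: the indexed loop grows quadratically with document length, the advancing one linearly, which is the effect described above.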
>>
>> You will need to know about Range objects, and then start looking at
>> .Paragraphs(1).Range.Words(n).Text.
>>
>> There are obviously some tricks to getting this running really quickly
>> in VBA; I outline numerous performance enhancements in my Word VBA for
>> Beginner's book, available from my website for a small fee.
>>
>>
>>
>>
>> Nick reckoned:
>>
>>
>>>Hello,
>>>
>>>I am new to MS Word programming. I am currently planning a project
>>>which aims to
>>>
>>>1. Read every word in a Word document, then parse and analyze it using
>>>multiple data mining algorithms (they are very CPU-intensive!)
>>>
>>>2. Bold and highlight the analyzed words in the same document
>>>
>>>I really have no idea where to start; the main concern is choosing an
>>>efficient way to implement the system.
>>>
>>>After some searching on Google, I found these suggestions:
>>>
>>>1. Pure VBA implementation
>>>2. C++/COM + VBA
>>>
>>>Some people said C++/COM + VBA is even slower than a pure VBA
>>>implementation. Is that true? I would like to hear more suggestions on
>>>high performance programming in WinWord.
>>>
>>>Thanks
>>>
>>>Nick
>>
>>