I came across an old post in Google groups for deleted duplicated
lines of text in a document.

It used a For x = Count method to go through and check the range of
one paragraph to the the range of every other paragraph and delete any
duplicates.

It had two If ... End If blocks. The first check the para range
length. If = then the second performed a text comparison. I assume
the author thought that it would save time by doing a text comparison
only on paras of equal length.

The procedure worked as advertised, however with a longer document it
took a long time.

I created about 800 paragraphs and determined that it was actually
much quicker to bypass the the first length check and just do a range
comparison on every paragraph. Down from 200 seconds to 75 seconds!

Next I remembered an method that Jezebel showed me for stepping
through items using the .Next (property or method I am never sure
which).
I adapted the code as follows and the time taken was down to 3
seconds!

Anyway, I just wanted to share this with the group:

Sub KillDuplicateParagraphs()
Dim SBar As Boolean
Dim TrkStatus As Boolean
Dim eTime As Single
Dim oParRef As Paragraph
Dim oParChk As Paragraph
eTime = Timer
With ActiveDocument
TrkStatus = .TrackRevisions
.TrackRevisions = False
End With
With Application
SBar = .DisplayStatusBar
.DisplayStatusBar = True
.ScreenUpdating = False
End With
Set oParRef = ActiveDocument.Range.Paragraphs(1)
Set oParChk = oParRef.Next
Do
'*** Stet out first if block to delete duplicated empty paragraphs.
If Len(oParRef.Range.Text) > 1 Then
Do
'An empty last paragraph may throw an error on the last loop.
On Error GoTo Err_Exit
If oParRef.Range = oParChk.Range Then
oParChk.Range.Delete
Else
Set oParChk = oParChk.Next
End If
Loop Until oParChk Is Nothing
End If '***
Set oParRef = oParRef.Next
'Skip errors.
On Error Resume Next
Set oParChk = oParRef.Next
On Error GoTo 0
Loop Until oParRef Is Nothing
Err_Exit:
With Application
.StatusBar = False
.DisplayStatusBar = SBar
.ScreenUpdating = True
End With
ActiveDocument.TrackRevisions = TrkStatus
MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
& " seconds."
End Sub

Re: KillDuplicateParas by Greg

Greg
Fri Mar 09 12:24:33 CST 2007

Oops. I forgot to mention the method used one of the For loops to
update the status bar.

Add:
'Application.StatusBar = ActiveDocument.Paragraphs.Count & "
paragraphs to check. "

Just after the On Error GoTo 0 line.

Also all my test where done with only two or three different
paragraphs repeated many times. As the number of different paragraphs
increases (in any method I suppose) the time to process will increase.



On Mar 9, 1:04 pm, "Greg Maxey" <gma...@gmail.com> wrote:
> I came across an old post in Google groups for deleted duplicated
> lines of text in a document.
>
> It used a For x = Count method to go through and check the range of
> one paragraph to the the range of every other paragraph and delete any
> duplicates.
>
> It had two If ... End If blocks. The first check the para range
> length. If = then the second performed a text comparison. I assume
> the author thought that it would save time by doing a text comparison
> only on paras of equal length.
>
> The procedure worked as advertised, however with a longer document it
> took a long time.
>
> I created about 800 paragraphs and determined that it was actually
> much quicker to bypass the the first length check and just do a range
> comparison on every paragraph. Down from 200 seconds to 75 seconds!
>
> Next I remembered an method that Jezebel showed me for stepping
> through items using the .Next (property or method I am never sure
> which).
> I adapted the code as follows and the time taken was down to 3
> seconds!
>
> Anyway, I just wanted to share this with the group:
>
> Sub KillDuplicateParagraphs()
> Dim SBar As Boolean
> Dim TrkStatus As Boolean
> Dim eTime As Single
> Dim oParRef As Paragraph
> Dim oParChk As Paragraph
> eTime = Timer
> With ActiveDocument
> TrkStatus = .TrackRevisions
> .TrackRevisions = False
> End With
> With Application
> SBar = .DisplayStatusBar
> .DisplayStatusBar = True
> .ScreenUpdating = False
> End With
> Set oParRef = ActiveDocument.Range.Paragraphs(1)
> Set oParChk = oParRef.Next
> Do
> '*** Stet out first if block to delete duplicated empty paragraphs.
> If Len(oParRef.Range.Text) > 1 Then
> Do
> 'An empty last paragraph may throw an error on the last loop.
> On Error GoTo Err_Exit
> If oParRef.Range = oParChk.Range Then
> oParChk.Range.Delete
> Else
> Set oParChk = oParChk.Next
> End If
> Loop Until oParChk Is Nothing
> End If '***
> Set oParRef = oParRef.Next
> 'Skip errors.
> On Error Resume Next
> Set oParChk = oParRef.Next
> On Error GoTo 0
> Loop Until oParRef Is Nothing
> Err_Exit:
> With Application
> .StatusBar = False
> .DisplayStatusBar = SBar
> .ScreenUpdating = True
> End With
> ActiveDocument.TrackRevisions = TrkStatus
> MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
> & " seconds."
> End Sub



Re: KillDuplicateParas by Helmut

Helmut
Fri Mar 09 13:23:01 CST 2007

Hi Submariner,

how about this one:

Sub Makro6x()
Dim t As Single
t = Timer
Dim prg1 As Paragraph
Dim prg2 As Paragraph
For Each prg1 In ActiveDocument.Range.Paragraphs
For Each prg2 In ActiveDocument.Range.Paragraphs
If prg1.Range.Text = prg2.Range.Text Then
If prg1.Range.start <> prg2.Range.start Then
prg2.Range.Delete
End If
End If
Next
Next
MsgBox Timer - t
End Sub

800 paragraphs of kind rand(1,10)
104 pages
3.4 seconds
10 paragraphs left over.

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"



Re: KillDuplicateParas by Greg

Greg
Fri Mar 09 14:40:38 CST 2007

Helmet,

That is clear code and fast, but not as fast as 2.714 seconds (maybe
my processors is faster) ;-)

Here is the real difference. Make each of those 800 paragraphs
slighgly different:

Sub ScratchMacro()
Dim i As Long
For i = 1 To ActiveDocument.Paragraphs.Count
ActiveDocument.Range.Paragraphs(i).Range.Characters.First = i
Next i
End Sub

Then delete the numbers from the last two. We now have 799 different
paragraphs and 2 duplicates. Run both codes again:

Your version: 123 seconds
My version: 69 seconds

Both are lightening fast when all the paragraphs are duplicates,
because that first loop only runs once.

I think the speed efficiency in the .Next method is due to the
processor doesn't have to a keep track of the paragraph count. ???

Cheers




On Mar 9, 2:23 pm, Helmut Weber <nbhymsjxd...@mailinator.com> wrote:
> Hi Submariner,
>
> how about this one:
>
> Sub Makro6x()
> Dim t As Single
> t = Timer
> Dim prg1 As Paragraph
> Dim prg2 As Paragraph
> For Each prg1 In ActiveDocument.Range.Paragraphs
> For Each prg2 In ActiveDocument.Range.Paragraphs
> If prg1.Range.Text = prg2.Range.Text Then
> If prg1.Range.start <> prg2.Range.start Then
> prg2.Range.Delete
> End If
> End If
> Next
> Next
> MsgBox Timer - t
> End Sub
>
> 800 paragraphs of kind rand(1,10)
> 104 pages
> 3.4 seconds
> 10 paragraphs left over.
>
> --
> Greetings from Bavaria, Germany
>
> Helmut Weber, MVP WordVBA
>
> Win XP, Office 2003
> "red.sys" & Chr$(64) & "t-online.de"



Re: KillDuplicateParas by Helmut

Helmut
Sat Mar 10 02:47:27 CST 2007

Hi Greg,

I see now that you are talking about the case
that a paragraph is immediatly followed by a duplicate,
whereas my code was meant to remove duplicate paragraphs
wherever they appear.

For removing empty paragraphs from the doc's end,
I use this code:

While ActiveDocument.Characters.Last.Previous = Chr(13)
ActiveDocument.Characters.Last.Delete
Wend

Cheers

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"

Re: KillDuplicateParas by macropod

macropod
Sat Mar 10 04:21:14 CST 2007

Hi Greg,

Looks like a derivative of something I developed & posted. My full version is:

Dim SBar As Boolean ' Status Bar flag
Dim TrkStatus As Boolean ' Track Changes flag

Sub KillDuplicateParas()
Call MacroEntry
Dim i As Long, j As Long
Dim eTime As Single
eTime = Timer
With ActiveDocument
If .Paragraphs.Count > 1 Then
' Loop backwards to preserve paragraph count & indexing.
' Start at 2nd-last paragraph.
For i = .Paragraphs.Count - 1 To 1 Step -1
' Ignore empty paragraphs
If Len(.Paragraphs(i).Range.Text) > 1 Then
' Loop backwards to preserve paragraph count & indexing.
' Stop at last preceding paragraph.
For j = .Paragraphs.Count To i + 1 Step -1
' Report progress on Status Bar.
Application.StatusBar = i & " paragraphs to check. "
' No point in checking paragraphs of unequal length.
If Len(.Paragraphs(i).Range) = Len(.Paragraphs(j).Range) Then
' Test strings of paragraphs of equal length.
If .Paragraphs(i).Range = .Paragraphs(j).Range Then
' Delete duplicate paragraph.
.Paragraphs(j).Range.Delete
End If
End If
Next
End If
Next
End If
End With
' Report time taken. Elapsed time calculation allows for execution to extend past midnight.
MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400 & " seconds."
Call MacroExit
End Sub

Private Sub MacroEntry()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Store current Track Changes status, then switch off
With ActiveDocument
TrkStatus = .TrackRevisions
.TrackRevisions = False
End With
' Turn Off Screen Updating
Application.ScreenUpdating = False
End Sub

Private Sub MacroExit()
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore original Track Changes status
ActiveDocument.TrackRevisions = TrkStatus
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub

I think they key difference is that my code checks all paras against each other, whereas yours only checks adjacent paras. I might
incorporate Helmut's revisions, though, since they seem to speed things up a bit.

Cheers

--
macropod
[MVP - Microsoft Word]
-------------------------

"Greg Maxey" <gmaxey@gmail.com> wrote in message news:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>I came across an old post in Google groups for deleted duplicated
> lines of text in a document.
>
> It used a For x = Count method to go through and check the range of
> one paragraph to the the range of every other paragraph and delete any
> duplicates.
>
> It had two If ... End If blocks. The first check the para range
> length. If = then the second performed a text comparison. I assume
> the author thought that it would save time by doing a text comparison
> only on paras of equal length.
>
> The procedure worked as advertised, however with a longer document it
> took a long time.
>
> I created about 800 paragraphs and determined that it was actually
> much quicker to bypass the the first length check and just do a range
> comparison on every paragraph. Down from 200 seconds to 75 seconds!
>
> Next I remembered an method that Jezebel showed me for stepping
> through items using the .Next (property or method I am never sure
> which).
> I adapted the code as follows and the time taken was down to 3
> seconds!
>
> Anyway, I just wanted to share this with the group:
>
> Sub KillDuplicateParagraphs()
> Dim SBar As Boolean
> Dim TrkStatus As Boolean
> Dim eTime As Single
> Dim oParRef As Paragraph
> Dim oParChk As Paragraph
> eTime = Timer
> With ActiveDocument
> TrkStatus = .TrackRevisions
> .TrackRevisions = False
> End With
> With Application
> SBar = .DisplayStatusBar
> .DisplayStatusBar = True
> .ScreenUpdating = False
> End With
> Set oParRef = ActiveDocument.Range.Paragraphs(1)
> Set oParChk = oParRef.Next
> Do
> '*** Stet out first if block to delete duplicated empty paragraphs.
> If Len(oParRef.Range.Text) > 1 Then
> Do
> 'An empty last paragraph may throw an error on the last loop.
> On Error GoTo Err_Exit
> If oParRef.Range = oParChk.Range Then
> oParChk.Range.Delete
> Else
> Set oParChk = oParChk.Next
> End If
> Loop Until oParChk Is Nothing
> End If '***
> Set oParRef = oParRef.Next
> 'Skip errors.
> On Error Resume Next
> Set oParChk = oParRef.Next
> On Error GoTo 0
> Loop Until oParRef Is Nothing
> Err_Exit:
> With Application
> .StatusBar = False
> .DisplayStatusBar = SBar
> .ScreenUpdating = True
> End With
> ActiveDocument.TrackRevisions = TrkStatus
> MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
> & " seconds."
> End Sub
>


Re: KillDuplicateParas by Helmut

Helmut
Sat Mar 10 06:10:04 CST 2007

Hi,

hmm...
maybe it is the structure of the data,
but this takes minutes or maybe will run endlessly,
though I can't see a reason for that:

Sub KillDuplicateParas()
Dim i As Long, j As Long
Dim eTime As Single
eTime = Timer
With ActiveDocument
For i = .Paragraphs.Count - 1 To 1 Step -1
For j = .Paragraphs.Count To i + 1 Step -1
If Len(.Paragraphs(i).Range) = Len(.Paragraphs(j).Range) Then
If .Paragraphs(i).Range = .Paragraphs(j).Range Then
.Paragraphs(j).Range.Delete
End If
End If
Next
Next
End With
MsgBox Timer - eTime

End Sub

800 Paragraphs, 4 unique paragraphs.
paragraph length about 460 characters.

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"


Re: KillDuplicateParas by Helmut

Helmut
Sat Mar 10 06:51:46 CST 2007

Hi,

this one takes 1.15 seconds, here and now,
under the above mentioned conditions:

Sub Makro6xx()
Dim prg1 As Paragraph
Dim t As Single
t = Timer
For Each prg1 In ActiveDocument.Range.Paragraphs
If Not prg1.Next Is Nothing Then
If prg1.Range.Text = prg1.Next.Range.Text Then
prg1.Next.Range.Delete
End If
End If
Next
MsgBox Timer - t
End Sub

However, IMHO, like other alternative solutions,
except from comparing each paragraph to each other paragraph,
see above, too, it does not take care of the fact that deleting
one of two immediatly adjacent paragraphs may
result again in two adjacent equal paragraphs.

Happy thinking!

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"

Re: KillDuplicateParas by Greg

Greg
Sat Mar 10 08:45:49 CST 2007

Hi macropod.

It is a derivative of your code and there is a point of checking paragraphs
of unequal length because it is faster than first checking all paragraphs
to see if they are of equal length ;-).

< I think they key difference is that my code checks all paras against each
other, whereas yours only checks adjacent paras. I might
<incorporate Helmut's revisions, though, since they seem to speed things up
a bit.

I don't know your test results but the code I posted has nothing to do with
adjacent pararapraphs. You can take those 799 unique paragraphs and add 10,
20 or a 100 duplicates anywhere in the mix, run the code and the duplicates
are removed.

All that said, there appears to be more to this that I don't understand. At
work yesterday with the The quick brown fox example Helmut gave, my code
was 69 seconds compared to Helmut's 123 seconds. Today at home with the
much longer Word2007 =(rand) text the processing 800 paragraphs takes my
method 272 seconds and Helmut's 236.

Perhaps there is no best way ;-)




--
Greg Maxey/Word MVP
See:
http://gregmaxey.mvps.org/word_tips.htm
For some helpful tips using Word.

"macropod" <invalid@invalid.invalid> wrote in message
news:OMP6W3vYHHA.4520@TK2MSFTNGP06.phx.gbl...
> Hi Greg,
>
> Looks like a derivative of something I developed & posted. My full version
> is:
>
> Dim SBar As Boolean ' Status Bar flag
> Dim TrkStatus As Boolean ' Track Changes flag
>
> Sub KillDuplicateParas()
> Call MacroEntry
> Dim i As Long, j As Long
> Dim eTime As Single
> eTime = Timer
> With ActiveDocument
> If .Paragraphs.Count > 1 Then
> ' Loop backwards to preserve paragraph count & indexing.
> ' Start at 2nd-last paragraph.
> For i = .Paragraphs.Count - 1 To 1 Step -1
> ' Ignore empty paragraphs
> If Len(.Paragraphs(i).Range.Text) > 1 Then
> ' Loop backwards to preserve paragraph count & indexing.
> ' Stop at last preceding paragraph.
> For j = .Paragraphs.Count To i + 1 Step -1
> ' Report progress on Status Bar.
> Application.StatusBar = i & " paragraphs to check. "
> ' No point in checking paragraphs of unequal length.
> If Len(.Paragraphs(i).Range) =
> Len(.Paragraphs(j).Range) Then
> ' Test strings of paragraphs of equal length.
> If .Paragraphs(i).Range = .Paragraphs(j).Range Then
> ' Delete duplicate paragraph.
> .Paragraphs(j).Range.Delete
> End If
> End If
> Next
> End If
> Next
> End If
> End With
> ' Report time taken. Elapsed time calculation allows for execution to
> extend past midnight.
> MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400 & "
> seconds."
> Call MacroExit
> End Sub
>
> Private Sub MacroEntry()
> ' Store current Status Bar status, then switch on
> SBar = Application.DisplayStatusBar
> Application.DisplayStatusBar = True
> ' Store current Track Changes status, then switch off
> With ActiveDocument
> TrkStatus = .TrackRevisions
> .TrackRevisions = False
> End With
> ' Turn Off Screen Updating
> Application.ScreenUpdating = False
> End Sub
>
> Private Sub MacroExit()
> ' Clear the Status Bar
> Application.StatusBar = False
> ' Restore original Status Bar status
> Application.DisplayStatusBar = SBar
> ' Restore original Track Changes status
> ActiveDocument.TrackRevisions = TrkStatus
> ' Restore Screen Updating
> Application.ScreenUpdating = True
> End Sub
>
> I think they key difference is that my code checks all paras against each
> other, whereas yours only checks adjacent paras. I might incorporate
> Helmut's revisions, though, since they seem to speed things up a bit.
>
> Cheers
>
> --
> macropod
> [MVP - Microsoft Word]
> -------------------------
>
> "Greg Maxey" <gmaxey@gmail.com> wrote in message
> news:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>>I came across an old post in Google groups for deleted duplicated
>> lines of text in a document.
>>
>> It used a For x = Count method to go through and check the range of
>> one paragraph to the the range of every other paragraph and delete any
>> duplicates.
>>
>> It had two If ... End If blocks. The first check the para range
>> length. If = then the second performed a text comparison. I assume
>> the author thought that it would save time by doing a text comparison
>> only on paras of equal length.
>>
>> The procedure worked as advertised, however with a longer document it
>> took a long time.
>>
>> I created about 800 paragraphs and determined that it was actually
>> much quicker to bypass the the first length check and just do a range
>> comparison on every paragraph. Down from 200 seconds to 75 seconds!
>>
>> Next I remembered an method that Jezebel showed me for stepping
>> through items using the .Next (property or method I am never sure
>> which).
>> I adapted the code as follows and the time taken was down to 3
>> seconds!
>>
>> Anyway, I just wanted to share this with the group:
>>
>> Sub KillDuplicateParagraphs()
>> Dim SBar As Boolean
>> Dim TrkStatus As Boolean
>> Dim eTime As Single
>> Dim oParRef As Paragraph
>> Dim oParChk As Paragraph
>> eTime = Timer
>> With ActiveDocument
>> TrkStatus = .TrackRevisions
>> .TrackRevisions = False
>> End With
>> With Application
>> SBar = .DisplayStatusBar
>> .DisplayStatusBar = True
>> .ScreenUpdating = False
>> End With
>> Set oParRef = ActiveDocument.Range.Paragraphs(1)
>> Set oParChk = oParRef.Next
>> Do
>> '*** Stet out first if block to delete duplicated empty paragraphs.
>> If Len(oParRef.Range.Text) > 1 Then
>> Do
>> 'An empty last paragraph may throw an error on the last loop.
>> On Error GoTo Err_Exit
>> If oParRef.Range = oParChk.Range Then
>> oParChk.Range.Delete
>> Else
>> Set oParChk = oParChk.Next
>> End If
>> Loop Until oParChk Is Nothing
>> End If '***
>> Set oParRef = oParRef.Next
>> 'Skip errors.
>> On Error Resume Next
>> Set oParChk = oParRef.Next
>> On Error GoTo 0
>> Loop Until oParRef Is Nothing
>> Err_Exit:
>> With Application
>> .StatusBar = False
>> .DisplayStatusBar = SBar
>> .ScreenUpdating = True
>> End With
>> ActiveDocument.TrackRevisions = TrkStatus
>> MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
>> & " seconds."
>> End Sub
>>
>



Re: KillDuplicateParas by Klaus

Klaus
Sun Mar 11 11:55:21 CDT 2007

I do that with a wildcard replacement...

Find what: ([!^13]@^13){2,}
Replace with: \1

It's not perfect -- Say

abcd
cd
cd

would be replaced with

abcd

But usually, I just risk that. And it's fast <g>

Klaus


"Greg Maxey" <gmaxey@gmail.com> schrieb im Newsbeitrag
news:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>I came across an old post in Google groups for deleted duplicated
> lines of text in a document.
>
> It used a For x = Count method to go through and check the range of
> one paragraph to the the range of every other paragraph and delete any
> duplicates.
>
> It had two If ... End If blocks. The first check the para range
> length. If = then the second performed a text comparison. I assume
> the author thought that it would save time by doing a text comparison
> only on paras of equal length.
>
> The procedure worked as advertised, however with a longer document it
> took a long time.
>
> I created about 800 paragraphs and determined that it was actually
> much quicker to bypass the the first length check and just do a range
> comparison on every paragraph. Down from 200 seconds to 75 seconds!
>
> Next I remembered an method that Jezebel showed me for stepping
> through items using the .Next (property or method I am never sure
> which).
> I adapted the code as follows and the time taken was down to 3
> seconds!
>
> Anyway, I just wanted to share this with the group:
>
> Sub KillDuplicateParagraphs()
> Dim SBar As Boolean
> Dim TrkStatus As Boolean
> Dim eTime As Single
> Dim oParRef As Paragraph
> Dim oParChk As Paragraph
> eTime = Timer
> With ActiveDocument
> TrkStatus = .TrackRevisions
> .TrackRevisions = False
> End With
> With Application
> SBar = .DisplayStatusBar
> .DisplayStatusBar = True
> .ScreenUpdating = False
> End With
> Set oParRef = ActiveDocument.Range.Paragraphs(1)
> Set oParChk = oParRef.Next
> Do
> '*** Stet out first if block to delete duplicated empty paragraphs.
> If Len(oParRef.Range.Text) > 1 Then
> Do
> 'An empty last paragraph may throw an error on the last loop.
> On Error GoTo Err_Exit
> If oParRef.Range = oParChk.Range Then
> oParChk.Range.Delete
> Else
> Set oParChk = oParChk.Next
> End If
> Loop Until oParChk Is Nothing
> End If '***
> Set oParRef = oParRef.Next
> 'Skip errors.
> On Error Resume Next
> Set oParChk = oParRef.Next
> On Error GoTo 0
> Loop Until oParRef Is Nothing
> Err_Exit:
> With Application
> .StatusBar = False
> .DisplayStatusBar = SBar
> .ScreenUpdating = True
> End With
> ActiveDocument.TrackRevisions = TrkStatus
> MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
> & " seconds."
> End Sub
>



Re: KillDuplicateParas by Greg

Greg
Mon Mar 12 05:23:16 CDT 2007

Klaus,

No argument that that is the fastest method for eliminating "adjacent"
ducplicate paragraphs. However run that code with:

The quick brown fox jumped over the lazy dog.
The quick brown fox jumped over the lazy dog.
The quick brown fox jumped over the lazy dog.
The quick and extremely agile brown fox jumped over 2 the lazy dogs.
The quick and extremely agile brown fox jumped over 3 the lazy dogs.
The quick and extremely agile brown fox jumped over 4 the lazy dogs.
The quick and extremely agile brown fox jumped over 5 the lazy dogs.
The quick brown fox jumped over the lazy dog.

You are left with:
The quick brown fox jumped over the lazy dog.
The quick and extremely agile brown fox jumped over 2 lazy dogs.
The quick and extremely agile brown fox jumped over 3 lazy dogs.
The quick and extremely agile brown fox jumped over 4 lazy dogs.
The quick and extremely agile brown fox jumped over 5 lazy dogs.
The quick brown fox jumped over the lazy dog.

Where my desired result is:
The quick brown fox jumped over the lazy dog.
The quick and extremely agile brown fox jumped over 2 lazy dogs.
The quick and extremely agile brown fox jumped over 3 lazy dogs.
The quick and extremely agile brown fox jumped over 4 lazy dogs.
The quick and extremely agile brown fox jumped over 5 lazy dogs.








On Mar 11, 12:55 pm, "Klaus Linke" <i...@fotosatz-kaufmann.de> wrote:
> I do that with a wildcard replacement...
>
> Find what: ([!^13]@^13){2,}
> Replace with: \1
>
> It's not perfect -- Say
>
> abcd
> cd
> cd
>
> would be replaced with
>
> abcd
>
> But usually, I just risk that. And it's fast <g>
>
> Klaus
>
> "Greg Maxey" <gma...@gmail.com> schrieb im Newsbeitragnews:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>
>
>
> >I came across an old post in Google groups for deleted duplicated
> > lines of text in a document.
>
> > It used a For x = Count method to go through and check the range of
> > one paragraph to the the range of every other paragraph and delete any
> > duplicates.
>
> > It had two If ... End If blocks. The first check the para range
> > length. If = then the second performed a text comparison. I assume
> > the author thought that it would save time by doing a text comparison
> > only on paras of equal length.
>
> > The procedure worked as advertised, however with a longer document it
> > took a long time.
>
> > I created about 800 paragraphs and determined that it was actually
> > much quicker to bypass the the first length check and just do a range
> > comparison on every paragraph. Down from 200 seconds to 75 seconds!
>
> > Next I remembered an method that Jezebel showed me for stepping
> > through items using the .Next (property or method I am never sure
> > which).
> > I adapted the code as follows and the time taken was down to 3
> > seconds!
>
> > Anyway, I just wanted to share this with the group:
>
> > Sub KillDuplicateParagraphs()
> > Dim SBar As Boolean
> > Dim TrkStatus As Boolean
> > Dim eTime As Single
> > Dim oParRef As Paragraph
> > Dim oParChk As Paragraph
> > eTime = Timer
> > With ActiveDocument
> > TrkStatus = .TrackRevisions
> > .TrackRevisions = False
> > End With
> > With Application
> > SBar = .DisplayStatusBar
> > .DisplayStatusBar = True
> > .ScreenUpdating = False
> > End With
> > Set oParRef = ActiveDocument.Range.Paragraphs(1)
> > Set oParChk = oParRef.Next
> > Do
> > '*** Stet out first if block to delete duplicated empty paragraphs.
> > If Len(oParRef.Range.Text) > 1 Then
> > Do
> > 'An empty last paragraph may throw an error on the last loop.
> > On Error GoTo Err_Exit
> > If oParRef.Range = oParChk.Range Then
> > oParChk.Range.Delete
> > Else
> > Set oParChk = oParChk.Next
> > End If
> > Loop Until oParChk Is Nothing
> > End If '***
> > Set oParRef = oParRef.Next
> > 'Skip errors.
> > On Error Resume Next
> > Set oParChk = oParRef.Next
> > On Error GoTo 0
> > Loop Until oParRef Is Nothing
> > Err_Exit:
> > With Application
> > .StatusBar = False
> > .DisplayStatusBar = SBar
> > .ScreenUpdating = True
> > End With
> > ActiveDocument.TrackRevisions = TrkStatus
> > MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
> > & " seconds."
> > End Sub- Hide quoted text -
>
> - Show quoted text -



Re: KillDuplicateParas by Klaus

Klaus
Mon Mar 12 13:45:05 CDT 2007

Hi Greg,

Hadn't noticed that. What you could do is autonumber the paragraphs, turn
the numbering into hard text.
Selection.Range.ListFormat.ConvertNumbersToText

Now each paragraph has a number followed by a tab followed by the old text.

Then sort by fields (with the tab as separator, sorting by the second
field).
Then remove the duplicates with a wildcard search:
Find what: ([0-9]@^t)([!^13]@^13)([0-9]@^t)\2
Replace with: \1\2
(Repeat until nothing more is found)

Sort by the first field (numbers) to get back the old sequence.

Then remove the numbers and tabs with a wildcard search,
Find what: (^13)[0-9]@^t
Replace with: \1

Regards,
Klaus



"Greg Maxey" <gmaxey@gmail.com> wrote:
> Klaus,
>
> No argument that that is the fastest method for eliminating "adjacent"
> ducplicate paragraphs. However run that code with:
>
> The quick brown fox jumped over the lazy dog.
> The quick brown fox jumped over the lazy dog.
> The quick brown fox jumped over the lazy dog.
> The quick and extremely agile brown fox jumped over 2 the lazy dogs.
> The quick and extremely agile brown fox jumped over 3 the lazy dogs.
> The quick and extremely agile brown fox jumped over 4 the lazy dogs.
> The quick and extremely agile brown fox jumped over 5 the lazy dogs.
> The quick brown fox jumped over the lazy dog.
>
> You are left with:
> The quick brown fox jumped over the lazy dog.
> The quick and extremely agile brown fox jumped over 2 lazy dogs.
> The quick and extremely agile brown fox jumped over 3 lazy dogs.
> The quick and extremely agile brown fox jumped over 4 lazy dogs.
> The quick and extremely agile brown fox jumped over 5 lazy dogs.
> The quick brown fox jumped over the lazy dog.
>
> Where my desired result is:
> The quick brown fox jumped over the lazy dog.
> The quick and extremely agile brown fox jumped over 2 lazy dogs.
> The quick and extremely agile brown fox jumped over 3 lazy dogs.
> The quick and extremely agile brown fox jumped over 4 lazy dogs.
> The quick and extremely agile brown fox jumped over 5 lazy dogs.
>
>
>
>
>
>
>
>
> On Mar 11, 12:55 pm, "Klaus Linke" <i...@fotosatz-kaufmann.de> wrote:
>> I do that with a wildcard replacement...
>>
>> Find what: ([!^13]@^13){2,}
>> Replace with: \1
>>
>> It's not perfect -- Say
>>
>> abcd
>> cd
>> cd
>>
>> would be replaced with
>>
>> abcd
>>
>> But usually, I just risk that. And it's fast <g>
>>
>> Klaus
>>
>> "Greg Maxey" <gma...@gmail.com> schrieb im
>> Newsbeitragnews:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>>
>>
>>
>> >I came across an old post in Google groups for deleted duplicated
>> > lines of text in a document.
>>
>> > It used a For x = Count method to go through and check the range of
>> > one paragraph to the the range of every other paragraph and delete any
>> > duplicates.
>>
>> > It had two If ... End If blocks. The first check the para range
>> > length. If = then the second performed a text comparison. I assume
>> > the author thought that it would save time by doing a text comparison
>> > only on paras of equal length.
>>
>> > The procedure worked as advertised, however with a longer document it
>> > took a long time.
>>
>> > I created about 800 paragraphs and determined that it was actually
>> > much quicker to bypass the the first length check and just do a range
>> > comparison on every paragraph. Down from 200 seconds to 75 seconds!
>>
>> > Next I remembered an method that Jezebel showed me for stepping
>> > through items using the .Next (property or method I am never sure
>> > which).
>> > I adapted the code as follows and the time taken was down to 3
>> > seconds!
>>
>> > Anyway, I just wanted to share this with the group:
>>
>> > Sub KillDuplicateParagraphs()
>> > Dim SBar As Boolean
>> > Dim TrkStatus As Boolean
>> > Dim eTime As Single
>> > Dim oParRef As Paragraph
>> > Dim oParChk As Paragraph
>> > eTime = Timer
>> > With ActiveDocument
>> > TrkStatus = .TrackRevisions
>> > .TrackRevisions = False
>> > End With
>> > With Application
>> > SBar = .DisplayStatusBar
>> > .DisplayStatusBar = True
>> > .ScreenUpdating = False
>> > End With
>> > Set oParRef = ActiveDocument.Range.Paragraphs(1)
>> > Set oParChk = oParRef.Next
>> > Do
>> > '*** Stet out first if block to delete duplicated empty paragraphs.
>> > If Len(oParRef.Range.Text) > 1 Then
>> > Do
>> > 'An empty last paragraph may throw an error on the last loop.
>> > On Error GoTo Err_Exit
>> > If oParRef.Range = oParChk.Range Then
>> > oParChk.Range.Delete
>> > Else
>> > Set oParChk = oParChk.Next
>> > End If
>> > Loop Until oParChk Is Nothing
>> > End If '***
>> > Set oParRef = oParRef.Next
>> > 'Skip errors.
>> > On Error Resume Next
>> > Set oParChk = oParRef.Next
>> > On Error GoTo 0
>> > Loop Until oParRef Is Nothing
>> > Err_Exit:
>> > With Application
>> > .StatusBar = False
>> > .DisplayStatusBar = SBar
>> > .ScreenUpdating = True
>> > End With
>> > ActiveDocument.TrackRevisions = TrkStatus
>> > MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
>> > & " seconds."
>> > End Sub- Hide quoted text -
>>
>> - Show quoted text -
>
>



Re: KillDuplicateParas by Greg

Greg
Mon Mar 12 15:00:59 CDT 2007

Klaus,

The method looks plausble, but I couldn't work out the code :-(

I used

oRng.ListFormat.DefualtNumberFormat

to apply the numbering. This adss a "period" between the number and
your the tab.

What did you have in mind when you said "autonumber?"



On Mar 12, 2:45 pm, "Klaus Linke" <i...@fotosatz-kaufmann.de> wrote:
> Hi Greg,
>
> Hadn't noticed that. What you could do is autonumber the paragraphs, turn
> the numbering into hard text.
> Selection.Range.ListFormat.ConvertNumbersToText
>
> Now each paragraph has a number followed by a tab followed by the old text.
>
> Then sort by fields (with the tab as separator, sorting by the second
> field).
> Then remove the duplicates with a wildcard search:
> Find what: ([0-9]@^t)([!^13]@^13)([0-9]@^t)\2
> Replace with: \1\2
> (Repeat until nothing more is found)
>
> Sort by the first field (numbers) to get back the old sequence.
>
> Then remove the numbers and tabs with a wildcard search,
> Find what: (^13)[0-9]@^t
> Replace with: \1
>
> Regards,
> Klaus
>
>
>
> "Greg Maxey" <gma...@gmail.com> wrote:
> > Klaus,
>
> > No argument that that is the fastest method for eliminating "adjacent"
> > ducplicate paragraphs. However run that code with:
>
> > The quick brown fox jumped over the lazy dog.
> > The quick brown fox jumped over the lazy dog.
> > The quick brown fox jumped over the lazy dog.
> > The quick and extremely agile brown fox jumped over 2 the lazy dogs.
> > The quick and extremely agile brown fox jumped over 3 the lazy dogs.
> > The quick and extremely agile brown fox jumped over 4 the lazy dogs.
> > The quick and extremely agile brown fox jumped over 5 the lazy dogs.
> > The quick brown fox jumped over the lazy dog.
>
> > You are left with:
> > The quick brown fox jumped over the lazy dog.
> > The quick and extremely agile brown fox jumped over 2 lazy dogs.
> > The quick and extremely agile brown fox jumped over 3 lazy dogs.
> > The quick and extremely agile brown fox jumped over 4 lazy dogs.
> > The quick and extremely agile brown fox jumped over 5 lazy dogs.
> > The quick brown fox jumped over the lazy dog.
>
> > Where my desired result is:
> > The quick brown fox jumped over the lazy dog.
> > The quick and extremely agile brown fox jumped over 2 lazy dogs.
> > The quick and extremely agile brown fox jumped over 3 lazy dogs.
> > The quick and extremely agile brown fox jumped over 4 lazy dogs.
> > The quick and extremely agile brown fox jumped over 5 lazy dogs.
>
> > On Mar 11, 12:55 pm, "Klaus Linke" <i...@fotosatz-kaufmann.de> wrote:
> >> I do that with a wildcard replacement...
>
> >> Find what: ([!^13]@^13){2,}
> >> Replace with: \1
>
> >> It's not perfect -- Say
>
> >> abcd
> >> cd
> >> cd
>
> >> would be replaced with
>
> >> abcd
>
> >> But usually, I just risk that. And it's fast <g>
>
> >> Klaus
>
> >> "Greg Maxey" <gma...@gmail.com> schrieb im
> >> Newsbeitragnews:1173463454.081324.32510@s48g2000cws.googlegroups.com...
>
> >> >I came across an old post in Google groups for deleted duplicated
> >> > lines of text in a document.
>
> >> > It used a For x = Count method to go through and check the range of
> >> > one paragraph to the the range of every other paragraph and delete any
> >> > duplicates.
>
> >> > It had two If ... End If blocks. The first check the para range
> >> > length. If = then the second performed a text comparison. I assume
> >> > the author thought that it would save time by doing a text comparison
> >> > only on paras of equal length.
>
> >> > The procedure worked as advertised, however with a longer document it
> >> > took a long time.
>
> >> > I created about 800 paragraphs and determined that it was actually
> >> > much quicker to bypass the the first length check and just do a range
> >> > comparison on every paragraph. Down from 200 seconds to 75 seconds!
>
> >> > Next I remembered an method that Jezebel showed me for stepping
> >> > through items using the .Next (property or method I am never sure
> >> > which).
> >> > I adapted the code as follows and the time taken was down to 3
> >> > seconds!
>
> >> > Anyway, I just wanted to share this with the group:
>
> >> > Sub KillDuplicateParagraphs()
> >> > Dim SBar As Boolean
> >> > Dim TrkStatus As Boolean
> >> > Dim eTime As Single
> >> > Dim oParRef As Paragraph
> >> > Dim oParChk As Paragraph
> >> > eTime = Timer
> >> > With ActiveDocument
> >> > TrkStatus = .TrackRevisions
> >> > .TrackRevisions = False
> >> > End With
> >> > With Application
> >> > SBar = .DisplayStatusBar
> >> > .DisplayStatusBar = True
> >> > .ScreenUpdating = False
> >> > End With
> >> > Set oParRef = ActiveDocument.Range.Paragraphs(1)
> >> > Set oParChk = oParRef.Next
> >> > Do
> >> > '*** Stet out first if block to delete duplicated empty paragraphs.
> >> > If Len(oParRef.Range.Text) > 1 Then
> >> > Do
> >> > 'An empty last paragraph may throw an error on the last loop.
> >> > On Error GoTo Err_Exit
> >> > If oParRef.Range = oParChk.Range Then
> >> > oParChk.Range.Delete
> >> > Else
> >> > Set oParChk = oParChk.Next
> >> > End If
> >> > Loop Until oParChk Is Nothing
> >> > End If '***
> >> > Set oParRef = oParRef.Next
> >> > 'Skip errors.
> >> > On Error Resume Next
> >> > Set oParChk = oParRef.Next
> >> > On Error GoTo 0
> >> > Loop Until oParRef Is Nothing
> >> > Err_Exit:
> >> > With Application
> >> > .StatusBar = False
> >> > .DisplayStatusBar = SBar
> >> > .ScreenUpdating = True
> >> > End With
> >> > ActiveDocument.TrackRevisions = TrkStatus
> >> > MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400
> >> > & " seconds."
> >> > End Sub- Hide quoted text -
>
> >> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -



Re: KillDuplicateParas by Klaus

Klaus
Mon Mar 12 15:08:39 CDT 2007

Never mind :-(

It's actually slower than code (Makro6x) that's already been posted.

If there are thousands of paragraphs, it might pay off to get more
sophisticated.

Say, to read the whole document into a string, operate on that, and then
delete the duplicate paragraphs found.
If there aren't many duplicate paragraphs, that should be faster:

Sub Makro6n()
Dim t As Single
t = Timer
Dim prg1 As Paragraph
Dim vText As Variant
vText = ActiveDocument.Content.Text
vText = Split(vText, vbCr)
Dim i As Long, j As Long
For i = LBound(vText) To UBound(vText)
For j = LBound(vText) To UBound(vText)
If vText(i) = vText(j) And i <> j Then
vText(j) = "<delete>" & STR(j)
End If
Next j
Next i
MsgBox Timer - t
For i = UBound(vText) To LBound(vText) Step -1
If vText(i) = "<delete>" & STR(i) Then
ActiveDocument.Paragraphs(i + 1).Range.Delete
End If
Next i
End Sub


Or to avoid the double loop (comparing each paragraph with every other,
which takes an amount of time proportional to the square of the number of
paragraphs), it might pay off to use an efficient sorting algorithm (which
takes a time proportional to N log N, or even to N), then remove doubles
(proportional to N), then sort back.

Klaus



Re: KillDuplicateParas by Klaus

Klaus
Mon Mar 12 15:15:54 CDT 2007

> What did you have in mind when you said "autonumber?"

Something like this:
Dim myLT As ListTemplate
Set myLT = ActiveDocument.ListTemplates.Add(OutlineNumbered:=False,
Name:="Test")
With myLT.ListLevels(1)
.NumberFormat = "%1"
.TrailingCharacter = wdTrailingTab
.NumberStyle = wdListNumberStyleArabic
.StartAt = 1
.LinkedStyle = ""
End With
Selection.WholeStory
Selection.Range.ListFormat.ApplyListTemplate ListTemplate:=myLT

But it isn't as fast as I thought... see last reply.

Klaus



Re: KillDuplicateParas by Klaus

Klaus
Mon Mar 12 15:47:31 CDT 2007

A bit better optimized... (Can't believe I looped all paragraphs in the
inner loop <g>)

Klaus

Sub Makro6n2()
Dim t As Single, t1 As Single
t = Timer
Dim prg1 As Paragraph
Dim vText As Variant
Dim delFrom As Long, delTo As Long
vText = ActiveDocument.Content.Text
vText = Split(vText, vbCr)
Dim i As Long, j As Long
For i = LBound(vText) To UBound(vText)
For j = i + 1 To UBound(vText)
If vText(i) = vText(j) Then
vText(j) = STR(j)
End If
Next j
Next i
t1 = Timer - t
delFrom = 0
For i = UBound(vText) To LBound(vText) Step -1
If vText(i) = STR(i) Then
delFrom = i + 1
If delTo = 0 Then
delTo = i + 1
End If
Else
If delTo <> 0 Then
ActiveDocument.Range( _
ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
ActiveDocument.Paragraphs(delTo).Range.End - 1 _
).Delete
End If
delFrom = 0
delTo = 0
End If
Next i
MsgBox Timer - t, , t1
End Sub



Re: KillDuplicateParas by Greg

Greg
Mon Mar 12 17:15:53 CDT 2007

Klaus,

Looks like you hit the home run. Your method is fast as lightening (less
than a second) for all tests using 800 paragraphs. All the same leaving 1,
All difference nothing deleted, and several tests with a mixed bag.

Excellent work!

Helmut take notice. A master is in our midst ;-)



--
Greg Maxey/Word MVP
See:
http://gregmaxey.mvps.org/word_tips.htm
For some helpful tips using Word.


Klaus Linke wrote:
> A bit better optimized... (Can't believe I looped all paragraphs in
> the inner loop <g>)
>
> Klaus
>
> Sub Makro6n2()
> Dim t As Single, t1 As Single
> t = Timer
> Dim prg1 As Paragraph
> Dim vText As Variant
> Dim delFrom As Long, delTo As Long
> vText = ActiveDocument.Content.Text
> vText = Split(vText, vbCr)
> Dim i As Long, j As Long
> For i = LBound(vText) To UBound(vText)
> For j = i + 1 To UBound(vText)
> If vText(i) = vText(j) Then
> vText(j) = STR(j)
> End If
> Next j
> Next i
> t1 = Timer - t
> delFrom = 0
> For i = UBound(vText) To LBound(vText) Step -1
> If vText(i) = STR(i) Then
> delFrom = i + 1
> If delTo = 0 Then
> delTo = i + 1
> End If
> Else
> If delTo <> 0 Then
> ActiveDocument.Range( _
> ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
> ActiveDocument.Paragraphs(delTo).Range.End - 1 _
> ).Delete
> End If
> delFrom = 0
> delTo = 0
> End If
> Next i
> MsgBox Timer - t, , t1
> End Sub



Re: KillDuplicateParas by Klaus

Klaus
Mon Mar 12 18:16:33 CDT 2007

Thanks!! An error I built in:

Replace

ActiveDocument.Range( _
ActiveDocument.Paragraphs(delFrom).Range.Start -1, _
ActiveDocument.Paragraphs(delTo).Range.End - 1 _
).Delete

with
ActiveDocument.Range( _
ActiveDocument.Paragraphs(delFrom).Range.Start, _
ActiveDocument.Paragraphs(delTo).Range.End _
).Delete

I thought deleting the last paragraph mark might cause problems (... it
doesn't), and my "work-around" could change the paragraph style.

Klaus



Re: KillDuplicateParas by Helmut

Helmut
Tue Mar 13 07:39:50 CDT 2007

Hi all,

hmm...

Great job, Klaus!

Let me explain to my excuse,
I assumed there might be some not mentioned
bordering questions lurking behind, like:
"Excellent, but now I want to enclude some formatting conditions".

Which is often the case with posters, of course other than Greg.

Regardles of formatting, regardless of Word at all,
it might still be faster not to delete Word-ranges at all,
but to process an array of strings, delimited by chr(13),
representing the doc's content, and writing it back.

Don't let me be misunderstood,
I'm just enjoying a good discussion.

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"












Re: KillDuplicateParas by Klaus

Klaus
Tue Mar 13 08:45:11 CDT 2007

> "Excellent, but now I want to enclude some formatting conditions".
>
> Which is often the case with posters, of course other than Greg.

Should be easy: Replace
ActiveDocument.Range( _
ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
ActiveDocument.Paragraphs(delTo).Range.End - 1 _
).Delete
with
ActiveDocument.Range( _
ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
ActiveDocument.Paragraphs(delTo).Range.End - 1 _
).Font.Color = wdColorRed

You could also leave the .Delete, and turn on "Track Changes" before you run
the macro, so you can review (accept/reject) the deletions later.

> it might still be faster not to delete Word-ranges at all,
> but to process an array of strings, delimited by chr(13),
> representing the doc's content, and writing it back.

True... but I don't expect that to make a real big difference.
An improvement for very large files (1000, 10.000 or more paragraphs) can be
achieved by avoiding the nested loop, as I i mentioned earlier.
You'd need an efficient sorting algorithm (can be done in a single loop,
that is, in a time proportional to the number of paragraphs), and a way to
remember the old order (say by using a two-dimensional array where every
paragraph keeps its old index). After the paragraphs are sorted
alphabetically, duplicates can be deleted (or marked) in a single loop. Then
you restore the old order based on the index (...can also be done in a
single loop).

Regards,
Klaus




Re: KillDuplicateParas by Greg

Greg
Tue Mar 13 09:37:23 CDT 2007

As you two have moved this discussion to the deep end of the pool I am
going to head for one of the deck chairs and passively observe :-)


On Mar 13, 9:45 am, "Klaus Linke" <i...@fotosatz-kaufmann.de> wrote:
> > "Excellent, but now I want to enclude some formatting conditions".
>
> > Which is often the case with posters, of course other than Greg.
>
> Should be easy: Replace
> ActiveDocument.Range( _
> ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
> ActiveDocument.Paragraphs(delTo).Range.End - 1 _
> ).Delete
> with
> ActiveDocument.Range( _
> ActiveDocument.Paragraphs(delFrom).Range.Start - 1, _
> ActiveDocument.Paragraphs(delTo).Range.End - 1 _
> ).Font.Color = wdColorRed
>
> You could also leave the .Delete, and turn on "Track Changes" before you run
> the macro, so you can review (accept/reject) the deletions later.
>
> > it might still be faster not to delete Word-ranges at all,
> > but to process an array of strings, delimited by chr(13),
> > representing the doc's content, and writing it back.
>
> True... but I don't expect that to make a real big difference.
> An improvement for very large files (1000, 10.000 or more paragraphs) can be
> achieved by avoiding the nested loop, as I i mentioned earlier.
> You'd need an efficient sorting algorithm (can be done in a single loop,
> that is, in a time proportional to the number of paragraphs), and a way to
> remember the old order (say by using a two-dimensional array where every
> paragraph keeps its old index). After the paragraphs are sorted
> alphabetically, duplicates can be deleted (or marked) in a single loop. Then
> you restore the old order based on the index (...can also be done in a
> single loop).
>
> Regards,
> Klaus



Re: KillDuplicateParas by Klaus

Klaus
Tue Mar 13 10:52:54 CDT 2007

> As you two have moved this discussion to the deep end of the pool I am
> going to head for one of the deck chairs and passively observe :-)

;-)

I'm not sure whether I understood Helmut correctly.

If "include some formatting conditions" means that you'd want to delete
duplicates only if they have the same formatting, then my approach would not
work... unless you tag the formatting in the string array.
Say the paragraph style:
<Heading 1>Some text
<Normal>Some other text
<Heading 2>Some text
...

Klaus