Generating a List of Unique Words

by Allen Wyatt
(last updated October 15, 2016)

3

Isao wonders if there is a way to easily construct a list of all the unique words in a document. He doesn't need to know how many times each word appears; he just needs the list of unique words. In addition, uppercase and lowercase variations on the same word should count as the same word.

There is no built-in Word function or tool to do this. However, in VBA you can access the Words collection, which includes all the words in the document. With this in mind, you can create a macro that builds a sorted list of unique words in the document and then adds those words to the end of the document.

Sub UniqueWordList()
    Dim wList As New Collection
    Dim wrd
    Dim chkwrd
    Dim sTemp As String
    Dim k As Long

    For Each wrd In ActiveDocument.Range.Words
        sTemp = Trim(LCase(wrd))
        If sTemp >= "a" And sTemp <= "z" Then
            k = 0
            For Each chkwrd In wList
                k = k + 1
                If chkwrd = sTemp Then GoTo nw
                If chkwrd > sTemp Then
                    wList.Add Item:=sTemp, Before:=k
                    GoTo nw
                End If
            Next chkwrd
            wList.Add Item:=sTemp
        End If
nw:
    Next wrd

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in the document, before this summary, but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

Note that each word in the document is extracted, converted to lowercase, and then added to the wList collection, in sorted order. Words are only added if they are alphabetic (thus, numbers are excluded, as is punctuation), and the macro pays no attention to the case of the words. You should also be aware that the macro only looks at words in the main body of the document. It does not include any words in places such as headers, footers, text boxes, or shapes.

The macro could easily be changed to allow for varying needs. For instance, you could have the macro stick the wordlist into a separate document instead of at the end of the current document. All you would need to do is to insert this line before the exiting line shown second here:

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in " & ActiveDocument.Name & ", but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    Documents.Add
    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

Note that there was only one substantive change in the macro: The addition of the "Documents.Add" method to create the new document for the summary.

For some other ideas on getting words out of a document—including macros that tally word frequency—you may want to refer to this tip: Generating a Count of Word Occurrences.

WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (7697) applies to Microsoft Word 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He  is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Absolutely Getting Rid of Formatting

Need to get rid of the formatting applied to a bunch of text? One of the easiest ways to do this is to use Notepad in ...

Discover More

Meaningless Text

The quick brown fox jumps over the lazy dog, or so the story goes. Here's how to put this type of meaningless text into a ...

Discover More

Controlling Entry Order on Unprotected Cells

When you protect a worksheet, one of the benefits is that you can limit which cells can be used for data entry. How a user ...

Discover More

Create Custom Apps with VBA! Discover how to extend the capabilities of Office 2013 (Word, Excel, PowerPoint, Outlook, and Access) with VBA programming, using it for writing macros, automating Office applications, and creating custom applications. Check out Mastering VBA for Office 2013 today!

More WordTips (ribbon)

Word Count for a Section

Dynamic word counts for your entire document are easy to get when you use using fields. There is no built-in method to get a ...

Discover More

Word Count is Zero

If you use the Word Count tool and are surprised that it returns a count of 0, it could be because of what you selected prior ...

Discover More

Automatic Scrolling

Spend a lot of time scrolling around in your document? You might find one of Word's hidden scrolling commands to be a nice ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 8Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is three more than 5?

2017-07-17 09:35:48

Andrew

Another method I've used is to copy the text of the document (text only paste) into a scratch file, replace all the whitespace with paragraph marks, sort the file (case insensitive sort), and finally delete all duplicate lines using the method from Graham Mayor's page (http://www.gmayor.com/replace_using_wildcards.htm) by wildcard searching for
(*^13)@
and replacing with
\1
to remove the duplicate words.

Andy.


2017-07-14 15:24:02

Max

Excatly what I was looking for. Great solution! Thank you!


2016-10-15 07:40:30

Ken Endacott

Depending on the vocabulary of the author, large documents can have over 5000 unique words. The above macro displays one word per line which means that the list will occupy around 100 pages. The macro below will write the list of words into a new document with five columns per page.

Sub UniqueWordList()
Dim wList As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long

For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then GoTo nw
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
End If
nw:
Next wrd

sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With

sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf

Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With

For Each chkwrd In wList
Selection.TypeText chkwrd & vbCrLf
Next chkwrd
End Sub

If you want to know how often a word appears then the macro UniqueWordCount will list unique words and give a count of how many times each word appears in the document.

Sub UniqueWordCount()
Dim wList As New Collection
Dim wCount As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
Dim j As Long

For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then
j = wCount(k) + 1
wCount.Remove (k)
If k > wCount.Count Or wCount.Count = 1 Then
wCount.Add Item:=j
Else
wCount.Add Item:=j, Before:=k
End If
GoTo nw
End If
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
wCount.Add Item:=1, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
wCount.Add Item:=1
End If
nw:
Next wrd

sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With

sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf

Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With

For k = 1 To wList.Count
Selection.TypeText wList(k) & "(" & wCount(k) & ")" & vbCrLf
Next k
End Sub


This Site

Got a version of Word that uses the ribbon interface (Word 2007 or later)? This site is for you! If you use an earlier version of Word, visit our WordTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.