Generating a List of Unique Words

by Allen Wyatt
(last updated June 26, 2018)

6

Isao wonders if there is a way to easily construct a list of all the unique words in a document. He doesn't need to know how many times each word appears; he just needs the list of unique words. In addition, uppercase and lowercase variations on the same word should count as the same word.

There is no built-in Word function or tool to do this. However, in VBA you can access the Words collection, which includes all the words in the document. With this in mind, you can create a macro that builds a sorted list of unique words in the document and then adds those words to the end of the document.

Sub UniqueWordList()
    Dim wList As New Collection
    Dim wrd
    Dim chkwrd
    Dim sTemp As String
    Dim k As Long

    For Each wrd In ActiveDocument.Range.Words
        sTemp = Trim(LCase(wrd))
        If sTemp >= "a" And sTemp <= "z" Then
            k = 0
            For Each chkwrd In wList
                k = k + 1
                If chkwrd = sTemp Then GoTo nw
                If chkwrd > sTemp Then
                    wList.Add Item:=sTemp, Before:=k
                    GoTo nw
                End If
            Next chkwrd
            wList.Add Item:=sTemp
        End If
nw:
    Next wrd

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in the document, before this summary, but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

Note that each word in the document is extracted, converted to lowercase, and then added to the wList collection, in sorted order. Words are only added if they are alphabetic (thus, numbers are excluded, as is punctuation), and the macro pays no attention to the case of the words. You should also be aware that the macro only looks at words in the main body of the document. It does not include any words in places such as headers, footers, text boxes, or shapes.

The macro could easily be changed to allow for varying needs. For instance, you could have the macro stick the wordlist into a separate document instead of at the end of the current document. All you would need to do is to insert this line before the exiting line shown second here:

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in " & ActiveDocument.Name & ", but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    Documents.Add
    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

Note that there was only one substantive change in the macro: The addition of the "Documents.Add" method to create the new document for the summary.

For some other ideas on getting words out of a document—including macros that tally word frequency—you may want to refer to this tip: Generating a Count of Word Occurrences.

Note:

If you would like to know how to use the macros described on this page (or on any other page on the WordTips sites), I've prepared a special page that includes helpful information. Click here to open that special page in a new browser tab.

WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (7697) applies to Microsoft Word 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

PRIVATE Fields in WordPerfect Documents

Did you ever convert a document from WordPerfect and see PRIVATE fields in it? Here's what those fields mean.

Discover More

Creating a Table Using the Keyboard

Want to easily add a table to your document simply by typing a few keystrokes? Here's how you can do it in one easy step.

Discover More

Protecting a Workbook from Opening in Other Programs

When working with sensitive information, you always need to be concerned that your data doesn't fall into unwanted hands. ...

Discover More

Learning Made Easy! Quickly teach yourself how to format, publish, and share your content using Word 2013. With Step by Step, you set the pace, building and practicing the skills you need, just when you need them! Check out Microsoft Word 2013 Step by Step today!

More WordTips (ribbon)

Using Executive Summaries

An executive summary for a longer document may be a nice finishing touch. One of the lesser-known features of Word is ...

Discover More

Displaying Quick Document Statistics

Need to see how many pages, words, paragraphs, or lines are in your document? Word makes it easy to retrieve such ...

Discover More

Turning Off Hyphenation for Individual Words

Word does a semi-decent job when it comes to automatically hyphenating your documents. It even lets you exclude certain ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is five more than 2?

2018-07-01 23:56:43

Colin

Thanks for this. Super helpful.

How do you make it so that the hyphenated term is considered one word (e.g. year-end)? As well, how do we include alphanumeric terms?


2018-03-09 09:07:33

Brian

Ken as always you put the icing on the cake with your adaptations of the great macros created by Alan. Both of you are a great asset to the WORD and EXCEL community. Keep up the good work.


2018-03-09 01:38:19

Eric

Hmm. It's saying my word count is like 26k words larger than what Word's automatic word count feature is saying. Which is accurate? (It's a scary thought, because that would make the word count of my novel horribly too high).


2017-07-17 09:35:48

Andrew

Another method I've used is to copy the text of the document (text only paste) into a scratch file, replace all the whitespace with paragraph marks, sort the file (case insensitive sort), and finally delete all duplicate lines using the method from Graham Mayor's page (http://www.gmayor.com/replace_using_wildcards.htm) by wildcard searching for
(*^13)@
and replacing with
\1
to remove the duplicate words.

Andy.


2017-07-14 15:24:02

Max

Excatly what I was looking for. Great solution! Thank you!


2016-10-15 07:40:30

Ken Endacott

Depending on the vocabulary of the author, large documents can have over 5000 unique words. The above macro displays one word per line which means that the list will occupy around 100 pages. The macro below will write the list of words into a new document with five columns per page.

Sub UniqueWordList()
Dim wList As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long

For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then GoTo nw
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
End If
nw:
Next wrd

sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With

sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf

Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With

For Each chkwrd In wList
Selection.TypeText chkwrd & vbCrLf
Next chkwrd
End Sub

If you want to know how often a word appears then the macro UniqueWordCount will list unique words and give a count of how many times each word appears in the document.

Sub UniqueWordCount()
Dim wList As New Collection
Dim wCount As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
Dim j As Long

For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then
j = wCount(k) + 1
wCount.Remove (k)
If k > wCount.Count Or wCount.Count = 1 Then
wCount.Add Item:=j
Else
wCount.Add Item:=j, Before:=k
End If
GoTo nw
End If
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
wCount.Add Item:=1, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
wCount.Add Item:=1
End If
nw:
Next wrd

sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With

sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf

Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With

For k = 1 To wList.Count
Selection.TypeText wList(k) & "(" & wCount(k) & ")" & vbCrLf
Next k
End Sub


This Site

Got a version of Word that uses the ribbon interface (Word 2007 or later)? This site is for you! If you use an earlier version of Word, visit our WordTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.