Generating a List of Unique Words

Written by Allen Wyatt (last updated April 15, 2023)
This tip applies to Word 2007, 2010, 2013, 2016, 2019, Word in Microsoft 365, and 2021



Isao wonders if there is a way to easily construct a list of all the unique words in a document. He doesn't need to know how many times each word appears; he just needs the list of unique words. In addition, uppercase and lowercase variations on the same word should count as the same word.

There is no built-in Word function or tool to do this. However, in VBA you can access the Words collection of the document's main text (ActiveDocument.Range.Words), which contains every word in the main body of the document. With this in mind, you can create a macro that builds a sorted list of the unique words in the document and then adds that list to the end of the document.

Sub UniqueWordList()
    Dim wList As New Collection
    Dim wrd
    Dim chkwrd
    Dim sTemp As String
    Dim k As Long

    ' Collect each distinct lowercase word, keeping wList in sorted order
    For Each wrd In ActiveDocument.Range.Words
        sTemp = Trim(LCase(wrd))
        ' Keep only entries that start with a letter (skips numbers,
        ' punctuation, and paragraph marks). Using Like avoids the old
        ' range test, which wrongly rejected words such as "zoo" > "z".
        If sTemp Like "[a-z]*" Then
            k = 0
            For Each chkwrd In wList
                k = k + 1
                If chkwrd = sTemp Then GoTo nw      ' already in the list
                If chkwrd > sTemp Then              ' found its sorted slot
                    wList.Add Item:=sTemp, Before:=k
                    GoTo nw
                End If
            Next chkwrd
            wList.Add Item:=sTemp                   ' belongs at the end
        End If
nw:
    Next wrd

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in the document, before this summary, but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    ' Move the insertion point to the end of the document and type the list
    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

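Each word in the document is extracted, converted to lowercase, and then, if it is not already present, inserted into the wList collection at its sorted position. Words are added only if they begin with a letter from a to z (so numbers and punctuation are excluded, as are words that start with an accented letter), and because everything is lowercased, the macro ignores differences in case. Be aware that the total reported by Words.Count includes punctuation marks and paragraph marks, which Word stores as separate items in the Words collection, so that figure is usually higher than the word count shown on Word's status bar. Finally, the macro looks only at words in the main body of the document. It does not include any words in places such as headers, footers, text boxes, or shapes.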
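If you also want the words in headers, footers, footnotes, and text boxes, one approach is to walk every story in the document rather than just the main text. The following is a minimal sketch under a few assumptions: AddUniqueWords and UniqueWordListAllStories are hypothetical names, the filter keeps words that begin with a letter from a to z (the same rule as above), and the list is appended to the end of the main document. It may still miss some stories, such as shapes anchored inside headers and footers, so treat it as a starting point.

```vba
' Hypothetical helper: adds each word in rng to wList (lowercased,
' sorted, no duplicates), skipping anything that does not start with a letter.
Sub AddUniqueWords(wList As Collection, rng As Range)
    Dim wrd As Range
    Dim sTemp As String
    Dim k As Long
    Dim done As Boolean

    For Each wrd In rng.Words
        sTemp = Trim(LCase(wrd.Text))
        If sTemp Like "[a-z]*" Then
            done = False
            For k = 1 To wList.Count
                If wList(k) = sTemp Then            ' already listed
                    done = True
                    Exit For
                ElseIf wList(k) > sTemp Then        ' found its sorted slot
                    wList.Add Item:=sTemp, Before:=k
                    done = True
                    Exit For
                End If
            Next k
            If Not done Then wList.Add Item:=sTemp  ' belongs at the end
        End If
    Next wrd
End Sub

' Hypothetical macro: gathers unique words from every story, not just the body.
Sub UniqueWordListAllStories()
    Dim wList As New Collection
    Dim rngStory As Range
    Dim w As Variant
    Dim sOut As String

    ' StoryRanges yields the first story of each type; NextStoryRange
    ' follows the linked stories of that type (other sections' headers
    ' and footers, additional text boxes, and so on).
    For Each rngStory In ActiveDocument.StoryRanges
        Do
            AddUniqueWords wList, rngStory
            Set rngStory = rngStory.NextStoryRange
        Loop Until rngStory Is Nothing
    Next rngStory

    ' Build the report as one string, then append it to the main text
    sOut = vbCr & "Unique words in all stories: " & wList.Count & vbCr
    For Each w In wList
        sOut = sOut & w & vbCr
    Next w
    ActiveDocument.Content.InsertAfter sOut
End Sub
```

Splitting the word-gathering into its own routine means the same filtering and sorted insertion can be applied to any Range, whether it is the main text, a header, or a single selection.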

The UniqueWordList macro could easily be changed to suit other needs. For instance, you could have the macro put the word list into a separate document instead of at the end of the current document. All you need to do is replace the end of the macro (everything after the Next wrd line) with the following:

    sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
    sTemp = sTemp & "in " & ActiveDocument.Name & ", but there "
    sTemp = sTemp & "are only " & wList.Count & " unique words."

    Documents.Add
    ActiveDocument.Range.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.TypeText vbCrLf & sTemp & vbCrLf
    For Each chkwrd In wList
        Selection.TypeText chkwrd & vbCrLf
    Next chkwrd
End Sub

Apart from the wording of the summary sentence, there is only one substantive change in the macro: the addition of the Documents.Add method, which creates the new document that receives the summary.
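If you would rather not move the selection around, one alternative is to build the whole report as a single string and insert it through a Range object. The sketch below assumes wList has already been filled by the loop in UniqueWordList; ReportToNewDocument is a hypothetical name. Passing the source document in as a parameter means the code does not depend on ActiveDocument, which changes as soon as Documents.Add opens the new document.

```vba
' Hypothetical helper: writes a summary line plus one word per paragraph
' to a brand-new document, without using the Selection object.
Function ReportToNewDocument(wList As Collection, srcDoc As Document) As Document
    Dim sReport As String
    Dim w As Variant
    Dim newDoc As Document

    ' Build the summary line, then one word per paragraph
    sReport = "There are " & srcDoc.Range.Words.Count & " words in " & _
        srcDoc.Name & ", but there are only " & wList.Count & _
        " unique words." & vbCr
    For Each w In wList
        sReport = sReport & w & vbCr
    Next w

    ' Create the new document and add the whole report in one call
    Set newDoc = Documents.Add
    newDoc.Content.InsertAfter sReport
    Set ReportToNewDocument = newDoc
End Function
```

In UniqueWordList you could then replace everything after the Next wrd line with the single statement ReportToNewDocument wList, ActiveDocument. A single InsertAfter call is often quicker than one TypeText call per word, although for very long lists the repeated string concatenation has its own cost.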

For some other ideas on getting words out of a document—including macros that tally word frequency—you may want to refer to this tip: Generating a Count of Word Occurrences.


This tip (7697) applies to Microsoft Word 2007, 2010, 2013, 2016, 2019, Word in Microsoft 365, and 2021.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...


Comments


2023-04-17 12:40:13

Andrew

Rereading this, I realize there is one obscurity that ought to be explained. When the statement WordList(s) = WordList(s) + 1 operates on a Dictionary in which the key s is not already defined, merely reading WordList(s) adds that key to the dictionary with its item set to Empty (an uninitialized Variant, not an empty string). The + operator treats Empty as 0, so the new key ends up with a count of 1.
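A quick way to see this behavior for yourself is the small test below. It uses late binding through CreateObject, so it does not need a reference to Microsoft Scripting Runtime; the Sub name DemoDictionaryAutoAdd is hypothetical. Run it and look in the Immediate window (Ctrl+G in the VBA editor).

```vba
' Hypothetical demo: shows how a Scripting.Dictionary adds missing keys on read.
Sub DemoDictionaryAutoAdd()
    Dim d As Object
    Dim v As Variant
    Set d = CreateObject("Scripting.Dictionary")

    ' Merely reading a missing key adds it, with an Empty item
    v = d("alpha")
    Debug.Print d.Exists("alpha"), IsEmpty(v)   ' both are True

    ' Empty is treated as 0 by +, so the first increment stores 1
    d("beta") = d("beta") + 1
    d("beta") = d("beta") + 1
    Debug.Print d("beta")                       ' the count is now 2
End Sub
```

This side effect is what makes the one-line counting idiom work, but it also means that a simple lookup on a misspelled key silently adds that key. When you only want to test for a word, use the Exists method instead.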


2023-04-17 12:34:52

Andrew

I used to do this by converting all of the words in a document to single lines, sorting them, and using a wildcard search (replacing "(*^13)@" with "\1", a GREAT trick from the WordMVP site). I like this Tips.net approach better, but using a Scripting.Dictionary instead of a Collection makes the process MUCH simpler. My new implementation follows. It takes the Range to be operated on and the Scripting.Dictionary to use as parameters, so that it is easy to feed all of the different stories of a document through it (and not just the main story). The dictionary also stores, as the item for each word, the number of times that word occurs, which is occasionally useful. Whether two words count as the same is decided by the dictionary's key matching (the same test the Exists method uses), and that depends on the WordList's .CompareMode property. There is no need for LCase: the default .CompareMode of a new dictionary is BinaryCompare (case-sensitive), so the calling macro sets it to TextCompare to ignore case. Leaving it at BinaryCompare lets you differentiate capitalized words from non-capitalized words, which I often have to do.

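' Requires a reference to Microsoft Scripting Runtime (Tools > References)
Sub UniqueWordListFromRange(WordList As Scripting.Dictionary, rng As Range)
    Dim s As String, r As Range
    For Each r In rng.Words
        ' Remove possible trailing spaces per https://learn.microsoft.com/en-us/office/vba/api/word.words
        s = RTrim(r)
        ' Count entries that start with a letter or digit; reading a
        ' missing key adds it with an Empty item, and Empty + 1 = 1
        If Left(s, 1) Like "[A-Za-z0-9]" Then WordList(s) = WordList(s) + 1
    Next r
End Sub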

Accessing the list of unique words to add to the end of the document is similarly greatly simplified (and it would be simple work to alphabetize the result):

Sub UniqueWordList2()
    Dim WordList As New Scripting.Dictionary
    WordList.CompareMode = TextCompare    ' ignore case when matching keys
    UniqueWordListFromRange WordList, ActiveDocument.Content
    ActiveDocument.Content.InsertAfter vbCr & vbCr & "These " & WordList.Count & _
        " unique words precede this summary:" & vbCr & vbCr & Join(WordList.Keys, vbCr)
End Sub
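As Andrew notes, alphabetizing the result takes only a little more work. VBA has no built-in array sort, so one possibility is a plain insertion sort over the array returned by the dictionary's Keys method. The function name SortedKeys is hypothetical, and the comparison uses StrComp with vbTextCompare so that ordering ignores case; an insertion sort is fine for typical document vocabularies but slows down on very large lists.

```vba
' Hypothetical helper: returns the dictionary's keys as an array sorted
' alphabetically (case-insensitive), using a simple insertion sort.
Function SortedKeys(ByVal d As Object) As Variant
    Dim arr As Variant
    Dim i As Long, j As Long
    Dim tmp As Variant

    arr = d.Keys                          ' zero-based Variant array
    For i = LBound(arr) + 1 To UBound(arr)
        tmp = arr(i)
        j = i - 1
        ' Shift larger keys one slot to the right. VBA's And does not
        ' short-circuit, so the comparison is kept inside the loop body.
        Do While j >= LBound(arr)
            If StrComp(arr(j), tmp, vbTextCompare) <= 0 Then Exit Do
            arr(j + 1) = arr(j)
            j = j - 1
        Loop
        arr(j + 1) = tmp
    Next i
    SortedKeys = arr
End Function
```

To use it, change Join(WordList.Keys, vbCr) in UniqueWordList2 to Join(SortedKeys(WordList), vbCr). An empty dictionary simply yields an empty array, which Join turns into an empty string.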

