Isao wonders if there is a way to easily construct a list of all the unique words in a document. He doesn't need to know how many times each word appears; he just needs the list of unique words. In addition, uppercase and lowercase variations on the same word should count as the same word.
There is no built-in Word function or tool to do this. However, in VBA you can access the Words collection, which includes all the words in the document. With this in mind, you can create a macro that builds a sorted list of unique words in the document and then adds those words to the end of the document.
Sub UniqueWordList() Dim wList As New Collection Dim wrd Dim chkwrd Dim sTemp As String Dim k As Long For Each wrd In ActiveDocument.Range.Words sTemp = Trim(LCase(wrd)) If sTemp >= "a" And sTemp <= "z" Then k = 0 For Each chkwrd In wList k = k + 1 If chkwrd = sTemp Then GoTo nw If chkwrd > sTemp Then wList.Add Item:=sTemp, Before:=k GoTo nw End If Next chkwrd wList.Add Item:=sTemp End If nw: Next wrd sTemp = "There are " & ActiveDocument.Range.Words.Count & " words " sTemp = sTemp & "in the document, before this summary, but there " sTemp = sTemp & "are only " & wList.Count & " unique words." ActiveDocument.Range.Select Selection.Collapse Direction:=wdCollapseEnd Selection.TypeText vbCrLf & sTemp & vbCrLf For Each chkwrd In wList Selection.TypeText chkwrd & vbCrLf Next chkwrd End Sub
Note that each word in the document is extracted, converted to lowercase, and then added to the wList collection, in sorted order. Words are only added if they are alphabetic (thus, numbers are excluded, as is punctuation), and the macro pays no attention to the case of the words. You should also be aware that the macro only looks at words in the main body of the document. It does not include any words in places such as headers, footers, text boxes, or shapes.
The macro could easily be changed to allow for varying needs. For instance, you could have the macro stick the wordlist into a separate document instead of at the end of the current document. All you would need to do is to insert this line before the exiting line shown second here:
sTemp = "There are " & ActiveDocument.Range.Words.Count & " words " sTemp = sTemp & "in " & ActiveDocument.Name & ", but there " sTemp = sTemp & "are only " & wList.Count & " unique words." Documents.Add ActiveDocument.Range.Select Selection.Collapse Direction:=wdCollapseEnd Selection.TypeText vbCrLf & sTemp & vbCrLf For Each chkwrd In wList Selection.TypeText chkwrd & vbCrLf Next chkwrd End Sub
Note that there was only one substantive change in the macro: The addition of the "Documents.Add" method to create the new document for the summary.
For some other ideas on getting words out of a document—including macros that tally word frequency—you may want to refer to this tip: Generating a Count of Word Occurrences.
Note:
WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (7697) applies to Microsoft Word 2007, 2010, 2013, and 2016.
The First and Last Word on Word! Bestselling For Dummies author Dan Gookin puts his usual fun and friendly candor back to work to show you how to navigate Word 2013. Spend more time working and less time trying to figure it all out! Check out Word 2013 For Dummies today!
Need to make sure that Word is installed correctly from the original installation CDs? You can do it by using a command ...
Discover MoreIf you decide to create a master document, it is easy to do by just adding one or more subdocuments to an existing ...
Discover MoreWord includes several different tools you can use to improve your writing. One such tool is the translation tool. Here's ...
Discover MoreFREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."
2021-04-27 03:51:28
Gilen
Hi, thanks for this useful maco, it works smoothly, with a snag: it doesn't list words with diacritcs (á, ï...). Is there any easy way to fix that?
2019-03-10 15:01:55
Marcel
How can this macro be modified to generate a list of all the unique words in MULTIPLE documents within a folder (output ONE word list for ALL the documents in that folder) with one word per line and the list in a separate document? (I don’t need the frequency of the words).
Also, is it possible to get the word list directly as a .txt file instead of a .docx file?
2018-07-01 23:56:43
Colin
Thanks for this. Super helpful.
How do you make it so that the hyphenated term is considered one word (e.g. year-end)? As well, how do we include alphanumeric terms?
2018-03-09 09:07:33
Brian
Ken as always you put the icing on the cake with your adaptations of the great macros created by Alan. Both of you are a great asset to the WORD and EXCEL community. Keep up the good work.
2018-03-09 01:38:19
Eric
Hmm. It's saying my word count is like 26k words larger than what Word's automatic word count feature is saying. Which is accurate? (It's a scary thought, because that would make the word count of my novel horribly too high).
2017-07-17 09:35:48
Andrew
Another method I've used is to copy the text of the document (text only paste) into a scratch file, replace all the whitespace with paragraph marks, sort the file (case insensitive sort), and finally delete all duplicate lines using the method from Graham Mayor's page (http://www.gmayor.com/replace_using_wildcards.htm) by wildcard searching for
(*^13)@
and replacing with
\1
to remove the duplicate words.
Andy.
2017-07-14 15:24:02
Max
Excatly what I was looking for. Great solution! Thank you!
2016-10-15 07:40:30
Ken Endacott
Depending on the vocabulary of the author, large documents can have over 5000 unique words. The above macro displays one word per line which means that the list will occupy around 100 pages. The macro below will write the list of words into a new document with five columns per page.
Sub UniqueWordList()
Dim wList As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then GoTo nw
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
End If
nw:
Next wrd
sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With
sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf
Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With
For Each chkwrd In wList
Selection.TypeText chkwrd & vbCrLf
Next chkwrd
End Sub
If you want to know how often a word appears then the macro UniqueWordCount will list unique words and give a count of how many times each word appears in the document.
Sub UniqueWordCount()
Dim wList As New Collection
Dim wCount As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
Dim j As Long
For Each wrd In ActiveDocument.Range.words
sTemp = Trim(LCase(wrd))
If sTemp >= "a" And sTemp <= "z" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then
j = wCount(k) + 1
wCount.Remove (k)
If k > wCount.Count Or wCount.Count = 1 Then
wCount.Add Item:=j
Else
wCount.Add Item:=j, Before:=k
End If
GoTo nw
End If
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
wCount.Add Item:=1, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
wCount.Add Item:=1
End If
nw:
Next wrd
sTemp = "There are " & ActiveDocument.Range.words.Count & " words " & _
"in " & ActiveDocument.Name & ", but there "
Documents.Add
With ActiveDocument.Styles("Normal")
With .ParagraphFormat
.SpaceBefore = 0
.SpaceAfter = 0
.LineSpacingRule = wdLineSpaceSingle
End With
.NextParagraphStyle = "Normal"
.NoSpaceBetweenParagraphsOfSameStyle = False
.Font.Name = "Times New Roman"
.Font.Size = 10
End With
sTemp = sTemp & "are only " & wList.Count & " unique words."
Selection.TypeText vbCrLf & sTemp & vbCrLf & vbCrLf
Selection.InsertBreak Type:=wdSectionBreakContinuous
With Selection.PageSetup.TextColumns
.SetCount NumColumns:=5
.EvenlySpaced = True
.LineBetween = True
.Width = CentimetersToPoints(2.8)
.Spacing = CentimetersToPoints(0.4)
End With
For k = 1 To wList.Count
Selection.TypeText wList(k) & "(" & wCount(k) & ")" & vbCrLf
Next k
End Sub
Got a version of Word that uses the ribbon interface (Word 2007 or later)? This site is for you! If you use an earlier version of Word, visit our WordTips site focusing on the menu interface.
Visit the WordTips channel on YouTube
FREE SERVICE: Get tips like this every week in WordTips, a free productivity newsletter. Enter your address and click "Subscribe."
Copyright © 2022 Sharon Parq Associates, Inc.
Comments