WordRibbon.Tips.Net WordTips (Ribbon Interface)

Character Frequency Count

Scott is looking for a way to get a "frequency count" of all the characters in a document. He would like to know how many times each character, ASCII codes 9 through 255, occurs. It is possible to use Find and Replace to determine the count of an individual character (simply search for the character in question and replace it with itself; Word reports how many replacements it made), but such an approach would be tedious, at best, if you needed to repeat it for 247 different character codes to get the desired information.
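
If all you need is the count for a single character, the same check can be sketched in code. The function below is only an illustration: the name CountOneChar is made up, and it uses ordinary string functions rather than the Find and Replace dialog box. It assumes you pass it a single character.

Function CountOneChar(sChar As String) As Long
    Dim sDoc As String

    ' Grab the text of the main document body
    sDoc = ActiveDocument.Content.Text

    ' Removing every occurrence of a one-character string shortens
    ' the text by exactly the number of occurrences
    CountOneChar = Len(sDoc) - Len(Replace(sDoc, sChar, ""))
End Function

You could try it from the Immediate window of the VBA editor by typing ? CountOneChar("e"). The comparison is case sensitive, so "e" and "E" are counted separately.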

Such a task must be done with a macro, but there are several ways to approach it. One way is to write a quick macro that steps through each member of the Characters collection, examines it, and adds one to the counter that corresponds to that character's code.

Sub CountChars1()
    Dim iCount(0 To 255) As Integer
    Dim i As Integer
    Dim oCharacter As Range
    Dim sTemp As String

    ' Initialize the array
    For i = 0 To 255
        iCount(i) = 0
    Next i

    ' Fill the array
    For Each oCharacter In ActiveDocument.Characters
       i = Asc(oCharacter)
       iCount(i) = iCount(i) + 1
    Next

    ' Add document for results
    Documents.Add
    Selection.TypeText Text:="ASCII Character Count" & vbCrLf

    ' Only output codes 9 through 255
    For i = 9 To 255
        sTemp = Chr(i)
        If i < 32 Then sTemp = Trim(Str(i))
        sTemp = sTemp & Chr(9) & Trim(Str(iCount(i)))
        sTemp = sTemp & vbCrLf
        Selection.TypeText Text:=sTemp
    Next i
End Sub

The macro uses the iCount array to accumulate the counts of each character code, and then a new document is created to output the results. (The results document can be formatted in any way desired.)
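
As one example of formatting the results, the tab-separated lines could be converted into a two-column table. This is only a sketch: the macro name FormatCountsAsTable is made up for illustration, and it assumes the results document is still the active document when you run it.

Sub FormatCountsAsTable()
    ' Each results line holds a character (or code), a tab, and a count,
    ' so splitting at the tabs produces two columns
    ActiveDocument.Content.ConvertToTable _
        Separator:=wdSeparateByTabs, NumColumns:=2
End Sub

The heading line contains no tab, so it ends up in the first cell of the first row.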

This approach can work well for relatively short documents, up to a few pages. As the document gets longer, the macro gets noticeably slower. Why? Because stepping through the Characters collection, in which every character is a separate Range object, takes a great deal of time. If the macro runs too slowly for your documents, then you will want to change it a bit so that it works solely with strings.

Sub CountChars2()
    Dim iCount(0 To 255) As Long
    Dim i As Long
    Dim j As Integer
    Dim lCharCount As Long
    Dim sDoc As String
    Dim sTemp As String

    ' Initialize the array
    For i = 0 To 255
        iCount(i) = 0
    Next i

    ' Assign document to a huge string
    lCharCount = ActiveDocument.Characters.Count
    sDoc = ActiveDocument.Range(0, lCharCount)

    ' Fill the array
    For i = 1 To Len(sDoc)
       j = Asc(Mid(sDoc, i, 1))
       iCount(j) = iCount(j) + 1
    Next

    ' Add document for results
    Documents.Add
    Selection.TypeText Text:="ASCII Character Count" & vbCrLf

    ' Only output codes 9 through 255
    For i = 9 To 255
        sTemp = Chr(i)
        If i < 32 Then sTemp = Trim(Str(i))
        sTemp = sTemp & Chr(9) & Trim(Str(iCount(i)))
        sTemp = sTemp & vbCrLf
        Selection.TypeText Text:=sTemp
    Next i
End Sub

Notice that this version of the macro stuffs the entire document into a single string, sDoc. The macro can process this string far more quickly than it can step through the Characters collection. (A 635-page document took only about 30 seconds to process on my system.) Because this version is made to work with longer documents, note as well that some of the variable types have been changed from Integer to Long to reflect the likelihood of larger counts.
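
If you want to experiment further with the string-based idea, one possible variation is to convert the text to an array of bytes and count the bytes directly, which avoids calling Mid and Asc for every character. The following is a sketch only, not a tested replacement: the name CountChars3 is made up, it assumes a single-byte (Western) system code page so that the byte values match what Asc returns, and it builds the results in one string so that the new document is written to just once.

Sub CountChars3()
    Dim lCount(0 To 255) As Long
    Dim bDoc() As Byte
    Dim i As Long
    Dim sTemp As String
    Dim sOut As String

    ' Convert the document text to single-byte character codes
    bDoc = StrConv(ActiveDocument.Content.Text, vbFromUnicode)

    ' Each byte value is used directly as the index of its counter
    For i = LBound(bDoc) To UBound(bDoc)
        lCount(bDoc(i)) = lCount(bDoc(i)) + 1
    Next i

    ' Build the results; only output codes 9 through 255
    sOut = "ASCII Character Count" & vbCrLf
    For i = 9 To 255
        If i < 32 Then
            sTemp = CStr(i)
        Else
            sTemp = Chr(i)
        End If
        sOut = sOut & sTemp & vbTab & CStr(lCount(i)) & vbCrLf
    Next i

    ' Add document for results
    Documents.Add
    Selection.TypeText Text:=sOut
End Sub

Whether this is actually faster on your system is something you would need to check by timing it against CountChars2 on the same document.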

This tip (112) applies to MS Word versions: 2007 | 2010

You can find a version of this tip for the older menu interface of Word here: Character Frequency Count.

Comments for this tip:

Roy    30 Oct 2011, 14:02
Some people will want a letter frequency count, and that would be only upper and lower case characters.
For English, these would be ASCII codes 65 to 90 for capitals and 97 to 122 for lower case.
This would mean two loops, but it would probably be quicker.
Roy.
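
Roy's suggestion could be sketched along the following lines. The macro name CountLetters is made up for illustration, and the sketch makes one assumption about what Roy meant: it still counts every character in one pass, as CountChars2 does, and uses the two loops only when writing out the results for codes 65 to 90 and 97 to 122.

Sub CountLetters()
    Dim lCount(0 To 255) As Long
    Dim sDoc As String
    Dim sOut As String
    Dim i As Long
    Dim j As Integer

    ' Count every character code in the document text
    sDoc = ActiveDocument.Content.Text
    For i = 1 To Len(sDoc)
        j = Asc(Mid(sDoc, i, 1))
        lCount(j) = lCount(j) + 1
    Next i

    sOut = "Letter Count" & vbCrLf

    ' First loop: capitals, codes 65 to 90
    For i = 65 To 90
        sOut = sOut & Chr(i) & vbTab & CStr(lCount(i)) & vbCrLf
    Next i

    ' Second loop: lower case, codes 97 to 122
    For i = 97 To 122
        sOut = sOut & Chr(i) & vbTab & CStr(lCount(i)) & vbCrLf
    Next i

    ' Add document for results
    Documents.Add
    Selection.TypeText Text:=sOut
End Sub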

Copyright © 2013 Sharon Parq Associates, Inc.