Extracting Text Box Contents

Written by Allen Wyatt (last updated April 22, 2023)
This tip applies to Word 2007, 2010, 2013, 2016, 2019, Word in Microsoft 365, and 2021



Larry has a 200-page document, with each page containing a text box with text. He would like to copy the contents of all the text boxes to a new document without having to extract the text from each one manually. He wonders if this can be done easily.

If the text boxes are in the main body of your document, you might try to use the searching capabilities of Word to extract the text. First, though, create a brand new document; this will be where you end up pasting the text.

Now, switch back to your original document and do a bit of analysis on the text within the text boxes. I find it helpful to figure out if the text is using a common style; in my case, I noticed that each paragraph within my text boxes used the Normal style.

Now, click somewhere in your document's main body, outside of any text boxes. Then, follow these steps:

  1. Press Ctrl+H. Word displays the Replace tab of the Find and Replace dialog box.
  2. Display the Find tab of the dialog box.
  3. If the More button is available, click on it to expand the dialog box.
  4. With the insertion point within the Find What box, click the Format button and choose the Style option. Word displays the Find Style dialog box.
  5. Select the Normal style.
  6. Click on OK. This takes you back to the Find and Replace dialog box.
  7. Click the Find In button and then choose Text Boxes in Main Document. (If this option is not available, it means either there are no text boxes in your main document, or the insertion point was not within the main document when you started these steps.) At this point all the text within your text boxes should be selected.
  8. Close the Find and Replace dialog box. The text in the text boxes should still be selected.
  9. Press Ctrl+C. This copies the selected text to the Clipboard.
  10. Switch to the new document you created earlier.
  11. Press Ctrl+V. Word pastes the contents of the Clipboard (the copied text) to the new document.

If your text boxes don't use the Normal style for their text, all you need to do is figure out what common attribute the text does use and then specify that attribute in steps 4 through 6.

These steps, as mentioned, work great if your text boxes are in the main body of your document. Text boxes can also appear in other places, however, such as headers, footers, or footnotes. In addition, if the text in your text boxes doesn't share some common attribute that you can discern, the steps won't produce a satisfactory result.

In this case, the only real way that we've found to extract the text is to use a macro. The following is a rather simple one that adds a new document and then steps through each story in the source document. (A story is a portion of the document such as headers, footers, footnotes, endnotes, main body, etc. Since text boxes could be in each of these, it makes sense to process each story.) It looks at all the shapes in each story and, if a shape is a text box, copies its text to the sText string, which is then "typed" into the new document.

Sub XferTextBoxContents()
    Dim Source As Document
    Dim stry As Range
    Dim rng As Range
    Dim shp As Shape
    Dim sText As String

    Set Source = ActiveDocument
    Documents.Add DocumentType:=wdNewBlankDocument
    ' The newly added document is now the ActiveDocument

    For Each stry In Source.StoryRanges
        ' StoryRanges returns only the first story of each type; the headers,
        ' footers, etc. of later sections are reached through NextStoryRange
        Set rng = stry
        Do While Not rng Is Nothing
            For Each shp In rng.ShapeRange
                If shp.Type = msoTextBox Then
                    ' Copy text to string, without last paragraph mark
                    sText = shp.TextFrame.TextRange.Text
                    If Right(sText, 1) = vbCr Then sText = Left(sText, Len(sText) - 1)
                    If Len(sText) > 0 Then
                        Selection.TypeText Text:=sText
                        Selection.TypeParagraph
                    End If
                End If
            Next shp
            Set rng = rng.NextStoryRange
        Loop
    Next stry
    Source.Activate
End Sub

The macro doesn't change the original document, and when it is completed, the new document will contain only the text that was in the original's text boxes.
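If all of your text boxes are in the main body, a shorter variant may be enough. The sketch below is my own illustration, not the macro above. As I understand Word's object model, the wdTextFrameStory story holds the text of the text boxes in the main document, and its NextStoryRange property moves from one text box (or set of linked text boxes) to the next, so the macro never has to look at shapes at all. The names XferTextFrameStory and StripFinalMark are hypothetical, and unlike the macro above this version ignores headers, footers, and footnotes.

```vba
' Sketch (hypothetical names): copy the text of every text box in the main
' document by walking the text frame story chain with NextStoryRange.
Function StripFinalMark(ByVal s As String) As String
    ' Remove one trailing paragraph mark (Chr(13)), if there is one
    If Right(s, 1) = vbCr Then s = Left(s, Len(s) - 1)
    StripFinalMark = s
End Function

Sub XferTextFrameStory()
    Dim Source As Document
    Dim Target As Document
    Dim rng As Range
    Dim sText As String

    Set Source = ActiveDocument
    ' Requesting a story type the document doesn't have raises a run-time
    ' error, so trap it and leave rng set to Nothing
    On Error Resume Next
    Set rng = Source.StoryRanges(wdTextFrameStory)
    On Error GoTo 0
    If rng Is Nothing Then
        MsgBox "No text boxes were found in the main document."
        Exit Sub
    End If

    Set Target = Documents.Add(DocumentType:=wdNewBlankDocument)
    ' Each NextStoryRange is the story of the next (set of linked) text boxes
    Do While Not rng Is Nothing
        sText = StripFinalMark(rng.Text)
        If Len(sText) > 0 Then Target.Content.InsertAfter sText & vbCr
        Set rng = rng.NextStoryRange
    Loop
    Source.Activate
End Sub
```

Like the original macro, this one leaves the source document unchanged; the new document simply receives one or more paragraphs per text box.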

There are a few things you should note about using a macro such as this. First, how text boxes appear within the original document doesn't reflect how they are actually stored and accessed within a macro. For instance, let's say that you have several different sections in your source document, and each has a header and footer, and each header and footer contains a text box. When you look at the document on the screen, the text boxes in the header may appear higher on the page than the text boxes in the footer, and there may be text boxes in the main body of the document that appear between those.

The macro, however, steps through each story in the document and then processes each text box within those stories. This means that the text boxes for all the headers may appear in the target document before all those from the footers, and they may be followed by the text boxes from the main body of the document. The bottom line is that you should not expect the "order" of the text in the target document to match the apparent order you may see in the source document.
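If you need to match the extracted text against the original layout, it can help to see where each text box is anchored. The following is a minimal sketch, assuming the text boxes are in the main body (as I understand it, ActiveDocument.Shapes does not include shapes in headers and footers). It prints each text box's text to the Immediate window (Ctrl+G in the VBA editor), prefixed with the page number of the paragraph the box is anchored to. ListTextBoxesByPage and CleanBoxText are hypothetical names.

```vba
' Sketch (hypothetical names): list main-body text boxes with the page
' number of their anchor paragraphs in the Immediate window.
Function CleanBoxText(ByVal s As String) As String
    ' Drop the final paragraph mark, then show inner paragraph breaks as " / "
    If Right(s, 1) = vbCr Then s = Left(s, Len(s) - 1)
    CleanBoxText = Replace(s, vbCr, " / ")
End Function

Sub ListTextBoxesByPage()
    Dim shp As Shape
    Dim pageNum As Long

    For Each shp In ActiveDocument.Shapes
        If shp.Type = msoTextBox Then
            ' Page of the anchor paragraph, which may differ from the page
            ' on which the text box is actually drawn
            pageNum = shp.Anchor.Information(wdActiveEndPageNumber)
            Debug.Print "Page " & pageNum & ": " & _
                CleanBoxText(shp.TextFrame.TextRange.Text)
        End If
    Next shp
End Sub
```

Because a text box can be positioned on a different page from its anchor, treat the page numbers as a guide rather than an exact map of the layout.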

The upshot of this realization is that if your original document was created by a program—for instance, a PDF to Word document converter—that program could have tried to maintain the appearance of the original PDF document by sticking everything within a bunch of text boxes. I've seen some converter programs that place every line or even every word into a separate text box. Run the macro on such a document, and you may be dissatisfied with what is created in the new target document. If that is the case, the only potential solution is to grab the original PDF and use a different, higher-quality converter program that doesn't rely so heavily on text boxes.

WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (9755) applies to Microsoft Word 2007, 2010, 2013, 2016, 2019, Word in Microsoft 365, and 2021.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...


Comments


2023-04-23 23:05:44

Tomek

Quite clever approach to extract content of text boxes. Also, it keeps the order of text from boxes almost consistent with their order in the original document. It may switch the order for the text boxes originating from the same page, if those were not created in sequence, but it seems to at least keep the boxes from the same page as consecutive ones.
I did not try boxes from headers and footers and the macro approach but I think this exceeded Larry's request.

Why am I commenting on this? Because I tried to provide help for this tip and failed. I tried to select all text boxes by macro (easy to do) copy them and paste them into a new document. What I got was a tangled mess of text boxes overlapping, even though they had been set to disallow overlap. When I untangled them (by converting to in-line objects) they were at some illogical order, mixing boxes originating from different pages. The order did not follow the numeric ID of the boxes nor was it sorted by their names.

