Removing HTML Tags from Text

by Allen Wyatt
(last updated June 26, 2018)

Aaron has a document that contains a number of HTML tags, and he would like to remove the tags but maintain the formatting they represent. For instance, if he has <i>a phrase</i> that appears this way, he would like to remove the tags (<i> and </i>) but have "a phrase" appear in italics. Aaron is pretty sure this can be done with Find and Replace, but he's not quite sure how to go about it.

You are right, Aaron—you can use Find and Replace to accomplish the removal. The way you would do it is to follow these steps:

  1. Press Ctrl+H. Word displays the Replace tab of the Find and Replace dialog box.
  2. Click the More button, if it is available. (See Figure 1.)

     Figure 1. The Replace tab of the Find and Replace dialog box.

  3. Make sure the Use Wildcards check box is selected.
  4. In the Find What box, enter the following: \<i\>([!<]@)\</i\>
  5. In the Replace With box, enter the following: \1
  6. With the insertion point still in the Replace With box, press Ctrl+I once. The text "Italic" should appear just below the Replace With box.
  7. Click Replace All.

The code that you enter in the Find What box (step 4) may look a little daunting. All you are telling Word to do is to find the beginning HTML tag (<i>), followed by one or more characters that are not a left angle bracket, and ending with the closing HTML tag (</i>). The backslashes in front of the angle brackets are there because < and > otherwise have special meanings in a wildcard search. The very short entry in the Replace With box (step 5) simply says to replace whatever is found with the contents of the first element of the Find What box that is surrounded by parentheses, which just happens to be the text between the two HTML tags.
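If you would like to see how many tag pairs the pattern will match before you change anything, a small "dry run" macro can help. The following is only a sketch: the macro name CountItalicTagPairs is made up for this example, and it assumes the full Find What pattern is \<i\>([!<]@)\</i\>. It searches the whole document and reports a count without replacing any text.

```vba
Sub CountItalicTagPairs()
    ' Hypothetical helper (not part of the original tip): counts the
    ' <i>...</i> pairs matched by the wildcard pattern, changing nothing.
    Dim rng As Range
    Dim n As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "\<i\>([!<]@)\</i\>"
        .Forward = True
        .Wrap = wdFindStop          ' stop at the end of the document
        .Format = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        ' Each successful Execute moves rng onto the match just found
        Do While .Execute
            n = n + 1
            rng.Collapse wdCollapseEnd   ' continue searching after the match
        Loop
    End With

    MsgBox n & " italic tag pair(s) found."
End Sub
```

Because the pattern excludes the < character between the tags, a pair that has another tag nested inside it is not counted by this sketch.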

If you want to eliminate the need to remember (or look up) the contents of the Find What box all the time, you can place the Find and Replace operation into a macro:

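Sub ConvertItalicTags()
    ' Clear any formatting left over from a previous search
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    ' Apply italic to the text that replaces each match
    Selection.Find.Replacement.Font.Italic = True
    With Selection.Find
        ' Wildcard pattern: <i>, the text to keep, then </i>
        .Text = "\<i\>([!<]@)\</i\>"
        ' \1 is the text captured by the parentheses in the pattern
        .Replacement.Text = "\1"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub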

Assign the macro to a shortcut key, and you can remove the italic HTML tags anytime you need. You could also expand the macro to make similar changes relative to other HTML tags you may need to remove. You may even want to make sure that alternate tags are dealt with. For instance, HTML uses both <i> and <em> tags to display information in italic, which means you should account for the possibility of both sets of tags in your macro.
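One way such an expanded macro might look is shown here. Treat it as a sketch rather than a finished tool: the names ConvertHtmlTags and ConvertTagPair are invented for this example, and the choice of tags (i and em for italic, b and strong for bold) is an assumption about what your document contains. The helper builds the same wildcard pattern used earlier, once for each tag name.

```vba
Sub ConvertHtmlTags()
    ' Sketch only: the tag list below is an assumption; adjust as needed.
    ConvertTagPair "i", True, False
    ConvertTagPair "em", True, False
    ConvertTagPair "b", False, True
    ConvertTagPair "strong", False, True
End Sub

Private Sub ConvertTagPair(ByVal tagName As String, _
                           ByVal makeItalic As Boolean, _
                           ByVal makeBold As Boolean)
    ' Hypothetical helper: removes <tagName>...</tagName> throughout the
    ' document and applies the requested formatting to the text between.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        If makeItalic Then .Replacement.Font.Italic = True
        If makeBold Then .Replacement.Font.Bold = True
        ' Same pattern as the single-tag macro, built for this tag name
        .Text = "\<" & tagName & "\>([!<]@)\</" & tagName & "\>"
        .Replacement.Text = "\1"
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Two limitations are worth keeping in mind. Wildcard searches in Word are case sensitive, so uppercase tags such as <I> would need their own calls to ConvertTagPair. Tags that carry attributes, such as <i class="note">, are not matched by this pattern at all.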

Of course, there is an entirely different approach you could use to get rid of the HTML tags and still retain the formatting associated with those tags. That would be to save the HTML-encoded text into a text file, open it in your browser, copy the text within the browser window, and paste it directly into a Word document. If all goes well, you would have the desired formatted text in your finished document.

Note:

If you would like to know how to use the macros described on this page (or on any other page on the WordTips sites), I've prepared a special page that includes helpful information. Click here to open that special page in a new browser tab.

WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (10308) applies to Microsoft Word 2007, 2010, 2013, 2016, 2019, and Word in Office 365.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

Comments

2019-08-14 02:39:19

Ken

Franci

You don't have a matching pair of round brackets.


2019-08-13 05:33:48

Franci

HI!
Using \1 in the Replace With box gives me a "Replace with text contains a group number which is out of range" Error Message. What am I doing wrong?

