How to use the VBA Code in Excel to Remove the Foreign Language Characters?
Remove Foreign Language Characters with VBA Code is useful when you have Excel reports having both English and Foreign language text. To remove the foreign text, you must manually select each cell and remove the foreign text to be able to use the reports. This makes the whole process lengthy. We can remove the foreign text using the VBA code provided below.
In this post, I will explain to you how to use VBA Code in Macro, provided in the following example, and use it to Remove Foreign Language Characters.
Practical applications of Remove Foreign Language Characters with VBA
Suppose you have a policy sales report of agents in both English and Chinese Language and you want to remove the Chinese Characters from the Excel File.
Step 1: Save your Excel file as Macro Enabled Workbook.
Step 2: Enable Developer Tab in your workbook. Go to File>Options>Customize Ribbon>From Drop Down Select Main Tab>Developer Tab>Add>OK
Step 3: Go to Developer Tab>Visual BasicStep
Step 4: Go to Insert Tab>Module.
Step 5: Copy the VBA code below and Paste it on Macro Visual Basic Module.
Step 6: Press F5 or the Play button to run the task and remove the foreign language characters as shown in below.
Explanation of the VBA Code
The VBA code shown above is further explained below for the users to understand how the VBA Syntax works to remove the Foreign Language Characters.
Dim rng As Range
The first layer defines the dimensions of the values in the worksheet. The dim term refers to Dimension here. To make it simple to understand, suppose you have an excel sheet containing alphabets and numbers. If we want to select only numbers, we will set the dimension range as “Integer”. In this case, we are using the entire range and not specifying what the contents of the excel should be because we want to separate both languages so we will try to select both. This is also to ensure that if there are any numbers present both in English and foreign text, we can keep the English numbers and remove the foreign ones.
The second layer is used to generate a visual basic script for regular expressions. Visual basic has a set of regular expressions stored, for example, in case of English language, it must have all the alphabets stored from A to Z in both upper case and lower-case format. It also must have numbers and punctuations. We are using With statement with Create Object. In VBA, with statements can be used to perform a series of statements on a specified object. The Object here is to run VB script for regular expressions. We will define the specific parts of the regular expressions that we want to extract in the next layers. This will help us define regular expressions of English language so that all other foreign text is removed.
.Global = True
The third layer is the Global setting to define how the regular expression is to be processed on the worksheet, that is, whether the operation is to be processed for all the values or only on the first match. By default, it is set to False which means that when we process the code, it will only remove the first foreign character on each cell. Therefore, it must be changed to True so that it searches for all the possible values and update them.
.Pattern = “[^\d^\w^\s-\.\,]”
The fourth layer is where we are required to define the pattern so that only the English language is detected and all else is removed. Let us look at all the patterns in the code.
The “circumflex symbol” which is usually used as mathematical exponential calculation, is used inside square brackets to look for all the other values mentioned in the square bracket. If you remove the symbol, it will remove the English Text and we will be left with the foreign one. It is necessary to have this Symbol in the code as it ensures that all the English Text is searched for.
The “back slash lower case d” symbol is used to match any single decimal digit in the range such as the number five. We will use this Syntax to find numbers accepted in the English Text.
The “back slash lower case w” symbol is used to match any alphanumeric character in the range such as the letter a. We will use this Syntax to find alphabets accepted in the English Text.
The “back slash lower case s minus” symbol is used to ignore any white spaces or tabs such as inverted commas or hyphens. We will use this Syntax to avoid any unnecessary punctuation in the text.
The “back slash lower case full stop” symbol is used to match text relating to full stop symbol. We use it to get this punctuation at the end of complete sentences.
The “back slash lower case comma” symbol is used to match text relating to comma symbol. We use it to get this punctuation in mid of sentences.
The complete Syntax means that we are instructing Excel to select all the numbers, alphabets and full stop and comma punctuation of English Text and ignore the rest.
For Each rng In Range(“A1”).CurrentRegion
The fifth layer is instructing Excel to select all the cells containing both texts in the workbook denoted by Current Region Syntax. We are using this because we want to select these cells and then replace them just like in the Find and Replace function.
rng = .Replace(rng, “”)
The sixth layer instructs Excel to replace the cells selected in the fifth layer with values following pattern defined in the fourth layer.
The seventh layer is simply instructing Excel to repeat the find and replace function until all the cells are replaced.
The Last layer is to close the With Statement we started in the second layer to complete the code.
Now that we understand how the Syntax in the code works, we can use it to extract the English Language Text from Excel files containing both the English Language and the foreign language text. You can also remove the circumflex symbol in the pattern description to get the foreign language text instead of the English language.
I hope that helps. Please leave a comment below with any questions or suggestions. For more in-depth Excel training, checkout our Ultimate Excel Training Course here. Thank you!