Pinyin Macros for MS Word and Excel
FAQs & Technical Notes
Tone Placement RulesNeutral ToneUnicode Characters
Word VBA MacroExcel VBA MacroPowerPoint VBA Macro
OpenOffice/LibreOffice, Python, MacOffice 2008
The Rules for Placing Pinyin Tone Marks
It's easy to know where to place the tone mark when there is only one vowel: over that vowel. When there is more than one vowel then it always appears over the first vowel, unless the first vowel is "i", "u" or "ü" in which case the tone mark goes over the second vowel. The letters "y" and "w" don't count as vowels under these rules.
To confirm these rules and learn more, see the Wikipedia Pinyin article section on rules for placing Pinyin tone marks, and even more Pinyin info at where else but Pinyin.info. To start work on these macros I had to look up the tone mark rules myself, and now that I've got my macros done I'm prone to forget it again...
I'm not entirely certain the rules for Pinyin tone mark placement were ever carefully and systematically taught to us in my US university Chinese courses. Maybe it's just my middle-aged memory. I don't have an old copy of my original DeFrancis textbooks to verify that we were not taught this. I do recall that we were never quizzed on it, but my professors knew exactly where to put the diacritics when asked.
What About the Neutral or "Fifth" Tone?
The neutral tone is often indicated by a dot over the vowel in language teaching, but this tone mark does not appear in standard dictionaries or other mainstream print usage of Pinyin. It is not entirely necessary outside of school, but the main problem is that the mark is difficult to show in print.
The numeric representation of this tone is sometimes "0" and sometimes "5". I could have the macros convert either one to the same neutral tone, but the character sets included in the fonts bundled with Office do not offer what we need for every letter. In fact, I'm not certain any one font includes them all. For example, in your Insert Symbols dialog you will find that for "a" there is "å" but the "e" would look like "ė". Although an "a" with a small dot above it exists in Unicode it is outside the range of most fonts, and most web browsers for that matter. And so on.
I have seen some teachers in Taiwan use a "middle dot", also known as an "interpunct". To preserve the integrity of the character it must be placed in front of the entire syllable, like this: "·ma". The full width version of this same dot is also used to separate Chinese characters in transliterated foreign or minority language names (乔治 · 华盛顿). It often shows up there in the narrow-width version (乔治·华盛顿) and it is also used in punctuation.
Modifying the macro to place a middle dot in front of the initial consonant in every possible syllable would require more lines of code than you may realize, so I'm loath to spend time on this given that I've never received a request for a middle dot (and only a couple questions about the dot above the vowel as well). Let me know what you think.
As I mentioned above, you can insert many of these from the Symbols list in Word, Excel and many other applications. There is also a predefined keyboard shortcut for the "å" (alt-0229 on the number pad with NumLock engaged), but for "ė" and most others you must assign custom shortcuts or use the Symbols list.
The Uncode Consortium website offers tables of all Unicode characters. I also recommend FileFormat.info. Although as of this writing FileFormat.info is not perfectly indexed, when it works that site is much more user-friendly for seeking the codes for Unicode, macros, html, Microsoft keyboard shortcuts or many other purposes.
On FileFormat you will find nicely laid-out tables and representations of the characters in both text and images. There is some jargon to learn, however. Here are examples using the the letter "a":
- ā - Unicode Character 'LATIN SMALL LETTER A WITH MACRON' (U+0101)
- á - Unicode Character 'LATIN SMALL LETTER A WITH ACUTE' (U+00E1)
- ǎ - Unicode Character 'LATIN SMALL LETTER A WITH CARON' (U+01CE)
- à - Unicode Character 'LATIN SMALL LETTER A WITH GRAVE' (U+00E0)
"Grave" is pronounced "grahv", I've learned. The Unicode number is on the end in parantheses.
Making the Word Macro in VBA
Macros are all about taking care of repetitive actions for you, and my plan required exactly that: finding and moving numbers repeatedly until they are next to the vowel that should be replaced, then replacing that vowel and the number with the appropriate Unicode character.
Having studied various versions of Basic in undergrad and grad school (along with Pascal, and some languages so long out of use I won't even mention them...), I felt confident I could figure this out quickly. Riiight. Humbled but not deterred with what I found while grazing through free information available on the MSDN site, plus various forums and blogs, I realized there were many new conventions to learn and this wasn't going to be a simple if/then thing. I decided to structure the macro as a long list of With statements, and then I set about learning what functions to use for find and replace.
Nothing ever being simple, Visual Basic's Unicode-to-character conversion function (ChrW) requires a prior conversion from Unicode notation, replacing the "U+" and all leading zeros with "&H" to indicate that the remaining number is hexadecimal:
- ā - ChrW(&H101)
- á - ChrW(&HE1)
- ǎ - ChrW(&H1CE)
- à - ChrW(&HE0)
Therefore this relatively simple macro winds up being a long list of statements like this:
.Forward = False
.Wrap = wdFindContinue
.Text = "a1"
.Replacement.Text = ChrW(&H101)
Word automatically converts upper case letters by itself, so I only had to give it the lower case code points and did not have to look up the capital letter versions, or worry about initial vowel mixed-case situations like "Ai4" or "Ou1yang2". I would have no such luck when I finally turned to the Excel version.
Making the Excel Macro in VBA
I found myself eating more humble pie when trying to recreate this macro in Excel. It was not going to be as simple as pasting the Word macro in and hitting Run. Although the Word version will, without asking, convert selected text or else all the text behind the cursor, Excel is an entirely different animal and needs to be told whether to convert specific cells, a worksheet or an entire workbook.
Another problem, as I mentioned above, was that although Word automatically converts to upper-case letters when it sees them, Excel needs to be provided specific code points for each character in both upper- and lower-case. This required not only looking up the code points for the capital letter versions and doubling the number of sorting and replacement statements, but I also had to think through all the possible initial vowel mixed-case situations:
Is that all of the possible "proper case" initial double vowel situations? Let me know if I missed any.
I then had to decide if I really wanted to learn more about the Visual Basic for Applications commands necessary for Excel. I use Excel in all sorts of MBA-type activities, but I had no real interest in learning more VBA just to create this one freebie.
Instead I got some help from two excellent Excel macro mavens. Tushar Kapila wrote the first version in 2008, and Jin Fei in China helped write version 2.0 in 2012. Please contact me for technical support with this macro, but I do highly recommend those guys for VBA and other dev work.
In the first version, Tushar converted my Word macro's With statements to Selection.Replace. I took it from there, changed MatchCase to True, and went through duplicating all those statements for upper case letters. Lastly, I carefully added sort and replace statements for proper case double-vowel situations like "Ou".
A few years later, a helpful user named Shirish Pandit suggested some Excel macro code for adding the u-with-diaresis (umlaut) to support those with German keyboards. Other users can continue using the letters "V" and "v" of course. (See this page if you don't have a German keyboard and want to type the letter "ü".)
The current version was completely rewritten by Jin Fei, adding improvements based on user feedback and using the latest in efficient VBA coding.
Did we get it right? Please let me know how this new macro is working for you, via my contact page.
A PowerPoint VBA macro lands in my lap
I received only a few requests for a PowerPoint version of the macro over the first ten years after posting the original macro, and never got around to learning how to customize it for PowerPoint. But in 2015 a user who wishes to remain anonymous sent a draft macro unasked!
After a bit of back-and-forth hashing out user requirements and applying the lessons learned in developing the Word and Excel versions, we had a macro with three modules that will convert an entire document, selected slides only, or selected objects/textboxes only. This is great!
OpenOffice, LibreOffice, Python, and AppleScript
LibreOffice and OpenOffice Write and Calc are now supported, thanks to Carlos Fruitos who ported the VBA macros to StarBasic. Those files are now posted on the macro page. If you'd like to help create macros for Impress or other apps in this suite, please volunteer.
My macro has also been ported to python by a former colleague of mine for his MoinMoin parser. You can find Robert Yu's Pinyin python parser here.
Mac Office 2008 requires AppleScript and does not support VBA macros, nor do the iWorks applications Pages and Numbers. Mac Office 2011 supports VBA again, which is good because I really didn't want to learn AppleScript just for this.
If a Pinyin Macro won't run, please delete it and try again. That usually solves the problem. Here is what to watch out for:
- Be very careful to follow my instructions (in the macro comments) about the "Sub" and "End sub" lines that are automatically created when you first open a blank macro. If the macro doesn't work the first time, delete it and paste the code in again.
- Copy the macro directly from my text file, without new formatting. Any app that adds formatting may add line breaks and line spacing that confuses the program. This often happens to Mac users. Please be careful.
- If the above suggestions don't solve the problem and you are getting a "compile error", there may be something wrong with your copy of Word's "normal.dot" file. If so, please contact me and I'll try to help further.
Fonts: I list the fonts that include the necessary Unicode characters on the main Pinyin macro page, and have never received a question about that. But I can't stress enough: if you use Office 2007's default Calibri font, some of the letters will display in Arial and your words will look very strange.
Security: Enabling macros requires more and more bravery due to the increasingly strident warnings each new version of Word and Excel throw at you, as if Microsoft is driven by some kind of PTSD hyper-vigilance caused by past battles against the macros of evildoers. But even a quick glance at the contents of my macros seems to set everyone at ease. They just move and replace a few letters and numbers. Excel is particularly bad about giving scary warnings, but just follow the instructions in any dialogs that pop up or any Help instructions you need to read, and you'll be fine.
Excel / Calc formulas: Be very careful not to select cells with formulas in them, because the macro will also convert variables that look like Pinyin. For example, in "=a1+b1" the "a1" will get converted. This is why the current version only works on selected cells. The old versions that did entire sheets or whole workbooks caused a few disasters for some people.
Compatibility: I've tested these macros in the Windows versions of Office 365, 2013, 2010, 2007, 2003, XP/2002 and 97, and in Macintosh Office 98 through 2004, plus Mac Office 2011 / 365. These macros might even work in the mid-nineties Mac version 4.2, but Mac Office 2008 is not compatible because it does not support VBA macros. As far as I know it's working in all LibreOffice and OpenOffice versions too. Please send questions, comments or suggestions anytime via the contact page.