Unicode Characters converted to ASCII string
I was hacking together a report today and discovered that the text I received was actually in Unicode, not ASCII.
Basically, I have this: こんにちは
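By using AscW(Char) you can convert a Unicode character into an integer value. Wrap each value in the delimiters &# and ; and you have an HTML numeric entity reference. It isn't perfect: AscW returns a signed Integer, so characters above U+7FFF (32767) come back as negative numbers, which aren't valid in an entity reference. The workaround is easy and is explained in Microsoft KB article 272138 (http://support.microsoft.com/kb/272138): add 65536 to any negative value. It is used below.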
```vba
Public Function UnicodeToAscii(sText As String) As String
    Dim x As Long, sAscii As String, ascval As Long

    If Len(sText) = 0 Then
        Exit Function
    End If

    sAscii = ""
    For x = 1 To Len(sText)
        ascval = AscW(Mid(sText, x, 1))
        If (ascval < 0) Then
            ascval = 65536 + ascval ' http://support.microsoft.com/kb/272138
        End If
        sAscii = sAscii & "&#" & ascval & ";"
    Next

    UnicodeToAscii = sAscii
End Function
```
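The negative-number issue is easier to see in isolation. Below is a small sketch. AscWUnsigned is a hypothetical helper name and is not part of the original post. It masks the signed result of AscW with the Long constant &HFFFF&, which gives the same value as the "add 65536 when negative" fix used in the function above.

```vba
' Hypothetical helper (not from the original post): returns the UTF-16 code unit
' of the first character of ch as a value from 0 to 65535.
Public Function AscWUnsigned(ByVal ch As String) As Long
    ' AscW returns a signed Integer, so codes above &H7FFF come back negative.
    ' And-ing with the Long constant &HFFFF& keeps only the low 16 bits, which is
    ' equivalent to adding 65536 to a negative result.
    AscWUnsigned = AscW(ch) And &HFFFF&
End Function

Public Sub DemoAscWUnsigned()
    Dim ch As String
    ch = ChrW$(65281)                       ' U+FF01 FULLWIDTH EXCLAMATION MARK
    Debug.Print AscW(ch)                    ' -255 (65281 - 65536)
    Debug.Print AscWUnsigned(ch)            ' 65281
    Debug.Print AscWUnsigned(ChrW$(12371))  ' 12371 (U+3053, the first character of こんにちは)
End Sub
```

Characters such as こ (12371) are below 32768, so they never hit the negative case. Full-width punctuation and many CJK ideographs do.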
Now let's go the other way: ASCII string to Unicode.
And I want this: こんにちは
I remembered that ChrW(int) converts a character code to its associated character. I really wasn't in the mood to write the parsing logic and test it, but luckily I came across a class that does this. I ripped out the method I needed, and it worked great in all its simplicity. I have included this function below:
```vba
Public Function AsciiToUnicode(sText As String) As String
    Dim saText() As String, saFinal() As String, sChar As String
    Dim x As Long, lPos As Long, lCode As Long

    If Len(sText) = 0 Then
        Exit Function
    End If

    saText = Split(sText, ";") 'Unicode chars are semicolon separated
    If UBound(saText) = 0 And InStr(1, sText, "&#") = 0 Then
        AsciiToUnicode = sText
        Exit Function
    End If

    ReDim saFinal(UBound(saText))
    For x = 0 To UBound(saText)
        lCode = -1
        lPos = InStr(1, saText(x), "&#", vbTextCompare)
        ' Only a piece that was followed by ";" can hold a complete reference
        If lPos > 0 And x < UBound(saText) Then
            sChar = Mid$(saText(x), lPos + 2) 'text between "&#" and ";"
            If Len(sChar) > 0 And Len(sChar) <= 5 And Not (sChar Like "*[!0-9]*") Then
                lCode = CLng(sChar)
            End If
        End If

        If lCode >= 0 And lCode <= 65535 Then
            ' ChrW$ is the exact inverse of AscW for every code from 0 to 65535
            ' (Chr$ would depend on the system code page for 128 to 255)
            saFinal(x) = Left$(saText(x), lPos - 1) & ChrW$(lCode)
        ElseIf x < UBound(saText) Then
            saFinal(x) = saText(x) & ";" 'Not a decimal reference: put the semicolon back
        Else
            saFinal(x) = saText(x)
        End If
    Next

    AsciiToUnicode = Join(saFinal, "")
End Function
```
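At its core, the reverse direction is just Split plus ChrW$. Here is a minimal sketch that assumes the input contains nothing but decimal references such as &#12371;, with no surrounding text, hex references or named entities. DecodeDecimalEntities is a hypothetical name and is not taken from the class mentioned above.

```vba
' Hypothetical minimal decoder. Assumes sEntities holds only decimal references
' of the form &#NNNN; (each value 0 to 65535), one after another.
Public Function DecodeDecimalEntities(ByVal sEntities As String) As String
    Dim parts() As String, x As Long, sOut As String

    parts = Split(sEntities, ";")   ' the element after the final ";" is empty
    For x = 0 To UBound(parts) - 1
        ' Mid$(..., 3) drops the leading "&#"; ChrW$ maps the code to its character
        sOut = sOut & ChrW$(CLng(Mid$(parts(x), 3)))
    Next

    DecodeDecimalEntities = sOut
End Function
```

For example, DecodeDecimalEntities("&#12371;&#12435;&#12395;&#12385;&#12399;") should return こんにちは. When the references are mixed in with ordinary text, AsciiToUnicode is the function to use.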
I used to wonder why you wouldn't just work with the Unicode characters themselves. Well, it seems that not all applications treat Unicode the same way, and the characters may be changed along the way. If you store and pass around a plain ASCII text representation of the characters instead, there is far less room for them to be misinterpreted.
One of the neatest things about this is that I can put the encoded text straight into a web page, and the browser automatically renders it as the Unicode characters. This is why I needed to use an image above to show what the encoded text looks like: if I just put the string there, the browser converts it when it is displayed.
If you have been to this post in the past, you have probably noticed that it has changed a bit. That is because I had it all backwards! Yeah, well, it happens. I said I wanted to change Unicode characters to an ASCII string, but the code actually worked the other way around. I finally got around to fixing this and made sure the code worked before posting it. I hope this helps someone out there.
15 comments
Friday, February 10, 2012 at 7:07:27
radu
And how do you do it the other way around?
Wednesday, February 15, 2012 at 19:07:12
Brettski
That is a very good question. I will have to look that up. I know I have it around here somewhere.
Wednesday, February 15, 2012 at 19:07:22
Brettski
Dim strOut as String
strOut = StrConv("& #x3053;& #x3093;& #x306b;& #x3061;& #x306f;", vbUnicode)
I put a space between the & and # to keep the browser from converting the string back to Unicode characters in this comment. Yeah, this doesn't really work either. The post has been updated, and both methods have been tested and work.
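The references in the comment above are in hexadecimal form (&#x3053;). The post's functions produce and read the decimal form (&#12371;). Here is a sketch of decoding a single hex reference. It assumes the input is exactly one well-formed reference with one to four hex digits. HexEntityToChar is a hypothetical name.

```vba
' Hypothetical helper: decodes one hex reference such as "&#x3053;".
' Assumes well-formed input: "&#x", 1 to 4 hex digits, then ";".
Public Function HexEntityToChar(ByVal sEntity As String) As String
    Dim sHex As String
    sHex = Mid$(sEntity, 4, Len(sEntity) - 4)   ' text between "&#x" and ";"
    ' The "&H" prefix lets CLng read the digits as hex. ChrW$ accepts -32768 to 65535,
    ' so a four-digit value works whether it is read as signed or unsigned.
    HexEntityToChar = ChrW$(CLng("&H" & sHex))
End Function
```

For example, HexEntityToChar("&#x3053;") should return こ, the same character as the decimal reference &#12371;.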
Wednesday, March 28, 2012 at 5:05:27
Scafloc
??? the function doesn’t do anything?
Wednesday, March 28, 2012 at 21:09:59
Brettski
How's that? It will take a string of Unicode text and convert it into a string of Unicode characters.
Saturday, August 4, 2012 at 2:02:01
Brettski
You were right, it didn’t do anything as I had it all backwards. The function in the post was to convert Unicode strings to Unicode characters, though I presented it as it worked the other way around. Well, we all make mistakes. I have updated the post and now have both methods available.
Thursday, May 10, 2012 at 8:08:33
Yves
Thank you for sharing this code. I will try to use it while reading Belgian eID info to an Access DB. I noticed that some special characters are displayed wrong after importing eID info.
Monday, November 19, 2012 at 3:03:58
mr.Chinh
If I want to convert a string to proper case, how should I do it?
For example: “THIS IS MY OWN WORLD” -> “This Is My Own World”
Monday, November 19, 2012 at 21:09:20
Brettski
My suggestion is to write a title case function. A Google search turns up plenty of examples. It's a pretty simple function.
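VBA also has a built-in conversion that covers this exact example, so a custom function may not be needed. The sketch below wraps StrConv with the vbProperCase constant. TitleCase is a hypothetical wrapper name.

```vba
' Hypothetical wrapper around the built-in StrConv. vbProperCase upper-cases the
' first letter of every word and lower-cases the rest.
Public Function TitleCase(ByVal sText As String) As String
    TitleCase = StrConv(sText, vbProperCase)
End Function
```

TitleCase("THIS IS MY OWN WORLD") should return "This Is My Own World". Because every other letter is lower-cased, acronyms and names such as "McDonald" lose their inner capitals. A hand-written function is still needed if those matter.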
Wednesday, March 6, 2013 at 2:02:31
anshu
Yeeeeeeeeeeeh it works u r gr8 dude……..
Wednesday, October 2, 2013 at 11:11:46
Andrew MacKillop
Hi Brettski,
Thanks for your function – works great. I made a couple of changes to not convert acceptable characters to html entities (eg: a-z etc) as the output from your version can be very long which wasn’t great for storing in our database and thought I would share my changes with you and anyone else who might find it useful. The output is destined for a website, hence the html reserved characters section.
```vba
Public Function UnicodeToAscii(sText As String) As String
    Dim x As Long, sAscii As String, ascval As Long

    If Len(sText) = 0 Then
        Exit Function
    End If

    sAscii = ""
    For x = 1 To Len(sText)
        ascval = AscW(Mid(sText, x, 1))
        If (ascval < 0) Then
            ascval = 65536 + ascval ' http://support.microsoft.com/kb/272138
        End If

        If (ascval < 32) Then ' non-printing characters
            sAscii = sAscii & "&#" & ascval & ";"
        ElseIf (ascval = 34 Or ascval = 39 Or ascval = 38 Or ascval = 60 Or ascval = 62) Then ' reserved characters " ' & < >
            sAscii = sAscii & "&#" & ascval & ";"
        ElseIf (ascval > 127) Then ' unicode
            sAscii = sAscii & "&#" & ascval & ";"
        Else ' acceptable text characters
            sAscii = sAscii & Mid(sText, x, 1)
        End If
    Next

    UnicodeToAscii = sAscii
End Function
```
Thanks again,
Andy
Friday, October 18, 2013 at 17:05:30
Brettski
Yeah, that is a good idea not to convert standard/acceptable characters. It never really mattered to us, but I can surely see how it could. Thanks for the addition, much appreciated.
Thursday, July 17, 2014 at 6:06:56
Woj
Thanks a lot for posting these two functions – they made my work possible.
Thanks again.
Friday, February 27, 2015 at 7:07:06
Convert files to Unicode with VBA | In the Ever After
[…] https://blog.brettski.com/2009/12/04/vba-convert-unicode-to-ascii/ […]
Thursday, July 9, 2015 at 5:05:08
10basetom
Nice article. You can also convert Unicode to HTML entities by doing something like this:
```vba
Function Unicode2Html(Text As String) As String
    If Len(Text) = 0 Then
        Exit Function
    End If

    Dim aDBCS() As Byte, _
        sOut As String, _
        i As Long

    'Map the string to a Byte array (two bytes per character, low byte first)
    aDBCS = Text

    'Loop through each character
    For i = LBound(aDBCS) To UBound(aDBCS) Step 2
        'Combine the high and low bytes into the decimal character code
        sOut = sOut & "&#" & (CLng(aDBCS(i + 1)) * 256 + aDBCS(i)) & ";"
    Next

    Unicode2Html = sOut
End Function
```
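The byte-array approach depends on how a VBA string is laid out in memory. Each character becomes two bytes, low byte first. Building the value from concatenated Hex() strings drops the leading zero of a low byte below &H10. The arithmetic form avoids that. Below is a sketch that shows the difference. CodeUnitAt and DemoByteOrder are hypothetical names.

```vba
' Hypothetical helper: reads the 16-bit character code that starts at byte offset i
' of a Byte array made by assigning a VBA string (low byte first).
Public Function CodeUnitAt(b() As Byte, ByVal i As Long) As Long
    CodeUnitAt = CLng(b(i + 1)) * 256 + b(i)
End Function

Public Sub DemoByteOrder()
    Dim b() As Byte
    b = ChrW$(12293)                  ' U+3005 IDEOGRAPHIC ITERATION MARK
    ' b(0) = 5 (&H05, low byte), b(1) = 48 (&H30, high byte)

    ' Gluing the Hex() strings together loses the leading zero of the low byte:
    Debug.Print CLng("&H" & Hex(b(1)) & Hex(b(0)))   ' 773, i.e. &H305 (wrong)

    ' Arithmetic keeps each byte in its place:
    Debug.Print CodeUnitAt(b, 0)                      ' 12293, i.e. &H3005 (right)
End Sub
```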