-= Recording my bits and pieces with Java =-

Saving a well-formed XML file as UTF-8 using VB6

In my last project, I had to read text from WinWord .doc files and save it as a well-formed XML file for a Java application to read. Because these documents come from business departments around the world and contain English, French, German, Chinese and other languages, the XML file had to be saved as UTF-8.

I will not explain how I read the WinWord files here. I only want to talk about what happened when I tried to save the XML as UTF-8, which confused me for a long time.

First, the simple step: save the XML text with UTF-8 encoding. The code below shows how this works.

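Private Sub ToUtf8(ByVal s As String, ByVal FilePath As String)
    ' Writes s to FilePath as UTF-8 text.
    ' Needs a project reference to the Microsoft ActiveX Data Objects library
    ' (for the ADODB.Stream type and the adSaveCreateOverWrite constant).
    Dim stmStr As ADODB.Stream
    Set stmStr = CreateObject("ADODB.Stream")
    stmStr.Open
    stmStr.Charset = "utf-8"                            ' encoding used when the text is saved
    stmStr.WriteText s
    stmStr.SaveToFile FilePath, adSaveCreateOverWrite   ' replace any existing file
    stmStr.Close
    Set stmStr = Nothing
End Sub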

Now we can save the XML as UTF-8. When the file is opened in IE, it looks fine. Unfortunately, the Java parser throws an exception when we try to parse it, reporting that the XML file is not well formed.

What happened?

Let's see what happens when we call ToUtf8.

When ToUtf8 writes the encoded string to the XML file, it puts three extra bytes in front of the text. Their hex values are EF, BB and BF, which is the UTF-8 byte order mark (BOM). The Java parser we used does not recognize them, and that is why the error appears.
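
To look at these three bytes from the Java side, here is a minimal sketch. The names BomCheck and hasUtf8Bom are hypothetical and the sample XML is made up; this is not code from the project. It checks whether a byte array starts with EF BB BF, which is how UTF-8 encodes the character U+FEFF (the byte order mark), and shows that Java's UTF-8 decoder keeps that character at the front of the decoded string instead of dropping it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class BomCheck {
    // U+FEFF encoded as UTF-8: EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true when data starts with the three BOM bytes.
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulates what the VB6 routine wrote: BOM first, then the XML text.
        byte[] fileBytes = "\uFEFF<?xml version=\"1.0\"?><doc/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(fileBytes));            // prints true

        // Decoding does not remove the mark; it becomes the first character.
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        System.out.println(decoded.charAt(0) == '\uFEFF');    // prints true
    }
}
```

So a reader that expects the very first character to be "<" will find U+FEFF there instead.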

How can we solve it?

My solution is a crude one: cut those three bytes off.

Private Sub cut_utf8(file_name As String)
    ' Removes a leading UTF-8 byte order mark (EF BB BF) from file_name, if present.
    Dim fileNum As Integer
    Dim fileLen As Long
    Dim newLen As Long
    Dim i As Long                   ' Long, so files over 32767 bytes do not overflow the counter
    Dim LoadBytes() As Byte
    Dim OutBytes() As Byte

    ' Read the whole file into memory.
    fileNum = FreeFile
    Open file_name For Binary As #fileNum
    fileLen = LOF(fileNum)
    If fileLen < 3 Then             ' too short to start with the three bytes
        Close #fileNum
        Exit Sub
    End If
    ReDim LoadBytes(1 To fileLen) As Byte
    Get #fileNum, , LoadBytes
    Close #fileNum

    ' Nothing to do unless the file starts with EF BB BF.
    If LoadBytes(1) <> &HEF Or LoadBytes(2) <> &HBB Or LoadBytes(3) <> &HBF Then Exit Sub

    ' Delete the original first: Binary mode does not truncate an existing file.
    Kill file_name

    ' Write everything after the first three bytes back under the same name.
    newLen = fileLen - 3
    fileNum = FreeFile
    Open file_name For Binary As #fileNum
    If newLen > 0 Then
        ReDim OutBytes(1 To newLen) As Byte
        For i = 1 To newLen
            OutBytes(i) = LoadBytes(i + 3)
        Next
        Put #fileNum, , OutBytes
    End If
    Close #fileNum
End Sub
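
If the VB6 side cannot be changed, the same cut could in principle be made on the Java side before the bytes reach the parser. The following is only a sketch under that assumption: BomStripper and stripUtf8Bom are hypothetical names, the sample XML is made up, and it works on a byte array already read from the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: an in-memory Java counterpart of cut_utf8.
class BomStripper {
    // Returns data without a leading EF BB BF; any other input is returned unchanged.
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        // Simulated file content: BOM followed by the XML text.
        byte[] raw = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] clean = stripUtf8Bom(raw);

        System.out.println(raw.length - clean.length);                       // prints 3
        System.out.println(new String(clean, StandardCharsets.UTF_8));       // XML text, starting with <?xml
    }
}
```

The cleaned array can then be wrapped in a ByteArrayInputStream and handed to the parser in place of the original file stream.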

Author: Rick  Posted: 2005-10-27