Saturday, February 25, 2012

BCP character conversion

I used bcp in to load data into a database with SQL_Latin1_General_Cp1_CI_AS collation. The input data has an embedded character (character 174). I did not specify any code page with the -C parameter, and the data was converted to character 171. I ran bcp again with -C1252 and with -CRAW, and both preserved the correct character. -C437 and -COEM change the character.
Why did this happen? I thought the data would be converted correctly without any code page specification.

Different code pages map binary values to glyphs (the graphic symbols that humans know and love) differently. One binary value can map to many different glyphs, depending on the code page.

If BCP doesn't know which code page to use for translation, you get "pot luck", especially for characters that aren't well defined. Typically, you want the code page that created the data. Occasionally, you want the code page that was intended (or at least used) to view the data. Because of the potpourri of mappings supported by the different code pages, the business of getting data from point A to point B has grown yet another potentially "interesting" twist to amuse those of us who do the moving!
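PatP's point, that one byte value maps to different glyphs under different code pages, can be illustrated with Python's built-in codecs. This is a minimal sketch: Python's `cp1252` and `cp437` codecs stand in for the Windows code page tables, and the helper name `glyphs_for_byte` is hypothetical.

```python
def glyphs_for_byte(value, codepages):
    """Decode one byte value under each code page and return the resulting glyphs."""
    raw = bytes([value])
    return {cp: raw.decode(cp) for cp in codepages}


if __name__ == "__main__":
    # Byte 174 (0xAE), the value from the question, under two code pages.
    for cp, glyph in glyphs_for_byte(174, ["cp1252", "cp437"]).items():
        print(f"{cp}: {glyph!r} (U+{ord(glyph):04X})")
    # cp1252: '®' (U+00AE)
    # cp437: '«' (U+00AB)
```

Plain ASCII bytes (0-127) decode identically in both code pages; the differences appear in the extended range, which is where the character in the question lives.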

-PatP

Thanks for the quick reply.
BOL states:
"When bulk copying data using native or character format, bcp, by default, converts character data to:

OEM code page characters when exporting data from an instance of Microsoft SQL Server.

ANSI/Microsoft Windows code page characters when importing data into an instance of SQL Server. "

So wouldn't bcp in use code page 1252 by default? That should behave like -C1252.

Well, where did the data come from?

The bcp ran from my workstation with code page 437, if that's your question.

From the Books Online topic on bcp: "OEM Default code page used by the client. This is the default code page used by bcp if -C is not specified."
From that, I suppose SQL Server interpreted your file as being in OEM code page 437. mojza

I guess I still don't understand why the character was changed during the bcp. I can view it correctly from my workstation, which is 437, but if I run bcp with -C437 or without -C (which uses the default OEM code page), it gets converted. I think I'm missing something.

In which editor can you see that character correctly: an ANSI one (e.g. Notepad) or an OEM one (e.g. Edit)? mojza

Correctly in Notepad or TextPad, not correctly in Edit. So bcp, running in a command window, is using 437, which is what changes the character?

Then, in my opinion, your file was created in ANSI code page 1252 (Notepad displays it correctly) and bcp interprets your file as code page 437 (the default client OEM code page). That leads to the loss of some extended characters that are not compatible between these two code pages, unless you tell SQL Server to interpret the file as 1252 or to apply no translation at all (RAW). Check out this Microsoft article; it has a good explanation and excellent examples. mojza

http://support.microsoft.com/default.aspx?scid=kb;en-us;199819

Thanks for your help. That article definitely helped explain things. I also looked at the NLS files for 437 and 1252, and character 174 (offset x0178) is reflected in both the 1252 file and the 437 file.
Again, thanks.
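mojza's diagnosis (a file written in code page 1252 but read by bcp as OEM code page 437) can be reproduced roughly in Python. This is a sketch under stated assumptions, not how bcp is implemented: it assumes a char/varchar target column, that the SQL_Latin1_General_CP1 collation stores such data in code page 1252, and that the workstation's OEM code page is 437 as stated in the thread. Python's `cp437`/`cp1252` codecs stand in for the Windows conversion tables, and the names `simulate_import` and `differing_bytes` are hypothetical.

```python
from typing import List, Optional

# Assumption: the collation stores char/varchar data in Windows code page 1252;
# Python's "cp1252" codec stands in for that table here.
SERVER_CODEPAGE = "cp1252"


def simulate_import(data: bytes, client_codepage: Optional[str]) -> bytes:
    """Roughly model how a character-mode import translates a file's bytes.

    client_codepage=None models -C RAW: the bytes are stored unchanged.
    Otherwise the bytes are read as client_codepage (raising UnicodeDecodeError
    for bytes that code page leaves undefined) and re-encoded in SERVER_CODEPAGE.
    Characters the server code page lacks become '?' here; Windows may pick a
    different "best fit" character instead.
    """
    if client_codepage is None:
        return data
    text = data.decode(client_codepage)
    return text.encode(SERVER_CODEPAGE, errors="replace")


def differing_bytes(cp_a: str, cp_b: str) -> List[int]:
    """Byte values 128-255 that decode to different characters in two code pages."""
    diffs = []
    for value in range(128, 256):
        raw = bytes([value])
        # errors="replace" keeps going past bytes a code page leaves undefined
        if raw.decode(cp_a, errors="replace") != raw.decode(cp_b, errors="replace"):
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    # A file saved from Notepad (ANSI, code page 1252) holding byte 174, the registered sign.
    notepad_file = "\u00ae".encode("cp1252")  # b'\xae'

    # Assumption: the workstation's OEM code page is 437, as stated in the thread.
    for label, codepage in (
        ("no -C / -C OEM (437)", "cp437"),
        ("-C 1252", "cp1252"),
        ("-C RAW", None),
    ):
        stored = simulate_import(notepad_file, codepage)
        print(f"{label}: {list(stored)}")
    # no -C / -C OEM (437): [171]
    # -C 1252: [174]
    # -C RAW: [174]

    print(len(differing_bytes("cp437", "cp1252")), "of 128 extended byte values differ")
```

Under these assumptions, byte 174 read as code page 437 is «, which code page 1252 stores as byte 171, consistent with the result described in the question; -C1252 and -CRAW leave byte 174 alone. `differing_bytes` lists the extended byte values that the two code pages do not map to the same character.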
