Thursday, March 29, 2012

BCP/BULK INSERT

Hi everyone,
I have to load data from a .txt file into a database table. I've decided to
use the BULK INSERT command because of its speed. In the first phase,
all data from the text file is inserted into a temporary table, which has only
varchar(x) fields; the second phase will process the data.
I have problems with the first phase: some records of the text file are not
well-formed, and some fields are missing in several rows (this is by design,
unfortunately).
If the last field is missing, it is loaded as null, as I expected.
But if the last two (or more) fields are missing, the whole line seems to
shift, and BCP starts reading into the next row. And, of course, it
produces an error ("String or binary data would be truncated"). If I turn off
ANSI_WARNINGS, I get a lot of false data - because of the "shift".
I'm using format files; all the fields are SQLCHAR by default, and all of them
have the correct field length.
You can reproduce the error with the following test script:
if exists(select 1 from sysobjects where name='table1')
begin
drop table table1;
end;
create table table1(
field1 varchar(2),
field2 varchar(2),
field3 varchar(2)
);
go
truncate table table1;
bulk insert table1 from 'table1.txt' with (formatfile='table1.fmt');
select * from table1;
go
The format file:
8.0
3
1 SQLCHAR 0 2 "" 1 field1
Hungarian_CI_AS
2 SQLCHAR 0 2 "" 2 field2
Hungarian_CI_AS
3 SQLCHAR 0 2 "\r\n" 3 field3
Hungarian_CI_AS
The data file:
01AAaa
02BBbb
03CCcc
04DD
05EE
06
07
08HH
09IIii
Does anyone have a suggestion to resolve this problem? I don't want to
hardcode this process :).
Thanks,
Tamas Beri

Hi,
Modify it for your needs:
1)
select * from OpenRowset('MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=D:\myfolder;',
'select * from data1.txt')
--Text file structure
col1
01AAaa
02BBbb
03CCcc
04DD
05EE
06
07
08HH
09IIii
2)
CREATE TABLE [tt] (
[ModuleID] [int] IDENTITY (1, 1) NOT NULL,
[field1] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[field2] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO
BULK INSERT tt
FROM 'd:\dat1.txt'
WITH
(
FIRSTROW = 2,
FORMATFILE = 'd:\fmt1.fmt'
)
select * from tt
--Text file structure
field1,field2
01,AAaa
02,BBbb
03,CCcc
04,DD
05,EE
06,
07,
08,HH
09,IIii
--fmt file structure
8.0
2
1 SQLCHAR 0 100 "," 2 field1
SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 10 "\r\n" 3 field2
SQL_Latin1_General_CP1_CI_AS
"gfoyle" <gfoyle@.discussions.microsoft.com> wrote in message
news:2E818698-6ABB-4820-BB41-57BA89477D11@.microsoft.com...
Thanks,
finally I've decided to read the data in two steps: first into a temp
table which has only one column, and then the second phase is an INSERT INTO
... SELECT with massive use of SUBSTRING, CAST and CASE :).
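A minimal sketch of that two-step approach (the staging table name and the field positions are illustrative, based on the sample data above):

```sql
-- Phase 1: load each whole line of the file into a single varchar column.
-- The data contains no tabs, so the default field terminator never fires
-- and every line lands in the column intact, however short it is.
create table #staging (line varchar(8000));

bulk insert #staging from 'table1.txt'
with (rowterminator = '\r\n', codepage = 'raw');

-- Phase 2: slice the fixed-width line; short lines simply yield NULL fields
-- instead of shifting into the next row.
insert into table1 (field1, field2, field3)
select
    nullif(substring(line, 1, 2), ''),
    nullif(substring(line, 3, 2), ''),
    nullif(substring(line, 5, 2), '')
from #staging;
```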
Another strange thing: I've tried to create a procedure:
create procedure some_procedure(@filename varchar(256)) as
begin
bulk insert some_table from @filename with (codepage='raw');
end;
go
And the creation fails with "Incorrect syntax near '@filename'.".
?
Is it possible to pass the BULK INSERT command a variable?
Regards,
Tamas Beri
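(BULK INSERT accepts only a string literal for the file name, not a variable, which is why the procedure fails to compile. The common workaround is dynamic SQL; a sketch, reusing the procedure and table names from the post:)

```sql
create procedure some_procedure (@filename varchar(256)) as
begin
    declare @sql nvarchar(1000);
    -- Build the statement as a string, since BULK INSERT cannot take a
    -- variable in its FROM clause, then execute it dynamically.
    set @sql = N'bulk insert some_table from ''' + @filename
             + N''' with (codepage = ''raw'');';
    exec sp_executesql @sql;
end;
go
```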
"Uri Dimant" wrote:
