Monday, March 19, 2012

bcp or OLE DB/ADO?

I have very big tables, and I need to do some computation for each row. Which
one is the best/fastest way to implement it?
1. bcp the tables out to a text file. C++ code parses the CSV file row by row
and writes the results to text files. Then bulk insert the results back into
SQL Server.
2. C++ code uses OLE DB/ADO to get the rows, writes the results to a text
file, and bulk inserts them back.
3. C++ code uses OLE DB/ADO for both the reading and writing operations.|||What type of computation do you need to do?
Why not just do it within SQL using T-SQL?
create table #foo (col1 int, col2 int, col3 decimal(5,2))
insert into #foo (col1, col2, col3) values (1,2,3)
insert into #foo (col1, col2, col3) values (10,20,30)
insert into #foo (col1, col2, col3) values (5,6, NULL)
go
select * from #foo
-- compute col3 from the other columns in a single set-based statement
update #foo set col3 = col1 * col2
select * from #foo
update #foo set col3 = col3 / 4
select * from #foo
update #foo set col3 = (col3 / 2) / 1
select * from #foo
go
drop table #foo
If your tables are huge and you don't have enough disk space, you might fill
up the transaction log. If that is a concern, you could update the data in
batches (based on your primary key).
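For example, a minimal sketch of key-range batching (the table name bigtable
and its integer primary key id are hypothetical; plug in your own computation
and tune the batch size):

-- walk the key range in fixed-size chunks so each transaction stays small
declare @min int, @max int, @batch int
set @batch = 10000
select @min = min(id), @max = max(id) from bigtable
while @min <= @max
begin
    update bigtable
    set col3 = col1 * col2
    where id between @min and @min + @batch - 1
    set @min = @min + @batch
end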
Keith Kratochvil
"nick" <nick@.discussions.microsoft.com> wrote in message
news:89811B8B-D867-4163-A4F9-132EB4BC6D69@.microsoft.com...
> [...]
|||The computation is complex math: matrix multiplication, etc. It is very
difficult to write in T-SQL. Also, the code was provided by another group and
I cannot modify it. I would definitely rewrite it in T-SQL if that were
possible.
"Keith Kratochvil" wrote:

> [...]
|||You will have to define what "very big tables" means, but if they are over
10 million rows, then exporting them to a text file and running a well-written
C program against it will probably be faster. You should make sure you make
only one pass through the data while doing your computations, then use SQL
Server's BULK INSERT to get the finished product back into the database.
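As a rough sketch of that last step (the file path, target table, and
terminators below are hypothetical - they must match whatever your program
actually writes out):

-- load the computed results back in a single bulk operation
bulk insert dbo.Results
from 'c:\results.txt'
with (fieldterminator = ',', rowterminator = '\n', tablock)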
"nick" wrote:

> [...]
|||The big tables vary from several thousand rows to several tens of millions
of rows. bcp should be faster than a client-side cursor; however, it involves
more I/O in the whole process...
"Carl Henthorn" wrote:
> [...]
|||nick (nick@.discussions.microsoft.com) writes:
> [...]
Now, C++ programming is not my main business, but my gut feeling says
that 1 is not a good solution. It takes time to read a file as well.
So to get the data into the client program, I would fetch one huge rowset,
or possibly fetch batchwise. Note: not a server-side cursor, but a client-side
one. For writing data back, I would use either bulk copy or send down an
XML document that I unpack in SQL Server with OPENXML. I would probably
not insert into the target table - I would prefer to update it (unless
the computations also remove and add rows). For this, OPENXML is maybe a
little simpler, but you can easily bulk copy into a staging table and
update from it.
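A minimal sketch of that staging-table variant (the table and column names are
hypothetical, with id assumed to be the key you join on):

-- after bulk copying the computed rows into dbo.Staging:
update t
set t.col3 = s.col3
from dbo.TargetTable t
join dbo.Staging s on s.id = t.id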
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Yes, bcp seems to involve more I/O. But will it still be faster than a
client-side cursor? And I am trying to avoid client-side OLE DB programming;
ADO is said to be easier but slower. And I cannot use ADO.NET since it's not a
CLR program.
"Erland Sommarskog" wrote:

> [...]
|||nick (nick@.discussions.microsoft.com) writes:
> [...]
Admittedly, there is some overhead in a recordset/rowset. Maybe bulk
out to file, and then read the file into memory in one swoop?
The only way to find out is to benchmark - if you have the time.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
