Thursday, February 23, 2012

BCP and table space allocation quandary

Hey all, first time poster here.
Cheers!

Enterprise Edition SQL2k sp3a on Win2k3 Enterprise Edition server.

Using BCP to load ~4 million records.

Largest record width is ~350 bytes.

When finished, i have over 6 million pages reserved and only about 800 thousand actually used.

Which puts me 'round about 3 rows per page and WAY too much empty space.

YIKES!

Any thoughts as to why this is happening?

Or insight into how BCP handles page allocations that may be different than the typical extent based page allocations for table data?

Thanks!|||It's not bcp...it's how the database is set up with free space...

Sounds like the percent is high, which is allowing for a large number of inserts...

If it's low, then the inserts would occur on new pages...and more maint is required...

But it does sound way too high...|||Are there any indexes on the table you imported into? If there is a clustered index, you may have just seen a large amount of page-splitting. Also, you may want to run dbcc updateusage (0) on the database, to make sure you are getting the most up to date information from these numbers. They are not updated automatically very well.|||Hmmm...well there is a clustered index on the table with the default fill factor (0 = 100%, right?)

I'm truncating the table between runs, so i'm not sure i follow why it would be splitting pages - as there aren't any in existence to split prior to running the bulk load.

Do you have a moment to elaborate a bit for my edification and education?|||I'd be curious...

Go into EM and select all tables, then go to the option tab and select indexes...

Just those, and script it out.

The indexes will all fall to the bottom...take a look at what FILL_FACTOR and PAD_INDEX are set to...

I usually never touch them...and rely on weekly maintenance...

BUT if your transaction volume is high, I'd imagine you'd want to...

How big is the database overall?|||the database is ~7gb in total
this one table, when loaded (with this allocation issue), causes the database to jump to over 50gb.
this table should only be a couple gigs by itself.
it is completely empty between each of my test runs.

i take my earlier 'clustered' index comment back - it's actually a non-clustered unique with the default fill and pad values.

interestingly enough - i just used a logged operation to load the table (insert into...select) using a linked server to get the data off the source server and the allocation was normal. ~800k reserved and ~800k used.

this must have something to do with the way BCP is working.

i'm still very confused.

:(|||Well yes...confused...

INSERT INTO...SELECT is not bcp

It IS also a HIGHLY logged operation

Is the other linked server SQL Server?

I would bcp out the table, truncate the existing (if need be) then bcp the data in..

Oh, and your problem is your log if I'm not mistaken, not the datafile

I would dump the trans log before the operation, perform the bcps, then take a full backup, with regular tranny dumps afterwards|||ahhh...OK...i'll clarify a bit.

Using BCP to load this table SQL Server is allocating (sysindexes..reserved where indid = 0) 6 million pages to my table but only using (sysindexes..dpages where indid = 0) 800 thousand. This is actual table data, not a log file growing. This is unrecoverable page allocation.
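For reference, those numbers can be pulled with a query along these lines (the table name is a placeholder; DBCC UPDATEUSAGE is run first since the sysindexes counters are not kept current automatically):

```sql
-- Refresh the sysindexes counters for the current database first.
DBCC UPDATEUSAGE (0)

-- Reserved vs. used pages for the table data
-- (indid 0 = heap, indid 1 = clustered; 'MyBigTable' is a placeholder name).
SELECT o.name, i.indid,
       i.reserved,   -- pages reserved to the table
       i.dpages,     -- data pages actually in use
       i.rowcnt
FROM sysindexes i
JOIN sysobjects o ON o.id = i.id
WHERE o.name = 'MyBigTable'
  AND i.indid IN (0, 1)
```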

NOT supposed to happen.

The table is empty prior to using BCP to load and having the above allocations occur. The table has a unique non-clustered index with a 100% fillfactor - but there are no rows in the table.

When using a linked server and an INSERT INTO...SELECT statement instead of BCP the allocated and used pages are virtually identical. Both in the 800k range.
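A minimal sketch of that fully logged, linked-server load (the linked server name, database, and columns are all assumptions for illustration):

```sql
-- Fully logged load across a linked server (all names are placeholders).
INSERT INTO dbo.MyBigTable (col1, col2, col3)
SELECT col1, col2, col3
FROM REMOTESRV.SourceDb.dbo.MyBigTable
```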

It's only when using BCP that the table allocates an additional ~50gb of EMPTY pages and ends up with a rows per page density of ~3.

I have no idea what is going on. I can't use the INSERT INTO...SELECT in production because the BCP file is coming from a far remote site that i have no access to.

Any thoughts?

:(|||Have you tried dropping all indexes and PK's before the BCP then re-creating them after the BCP? Or ordering your data file to match the columns and sorting of the pk and indexes?|||You know what...istead of us shooting blanks...

How about you post the ddl of the table, include the bcp code, and a small sample of the data...

bcp will not preallocate all those pages...at least I haven't seen it do that or read where it does.

Not saying it can't, mind you, it just doesn't make sense from a logical perspective...

And did you know that bcp actually is logged...|||good idea, thought about it, but have not tried it.
the nonclustered PK is the only index on the table and it is what keeps the data...well...unique. if i drop that index i have 2 issues:

1. loading dupes - not cool.
2. would need to drop all the FKs related to it in the process, and re-build all of them too. the load window will not allow a full rebuild of all the PKs and related FK constraints - these tables hold tens upon tens of millions of rows each and building the constraints in testing has shown that to take many hours (6+) - with an hourly load window everything needs to take less than an hour before the new files show up. UGH.

the INSERT INTO...SELECT test i ran loaded ~4.5 million in just a few minutes - so the assumption is that BCP would be much faster...but it takes ~30 minutes due to all this empty page allocation/splitting and the ensuing database file growth.

i can't understand why it would over allocate using BCP and not with a fully logged operation.
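For context, the file-based load is the moral equivalent of a BULK INSERT along these lines (table name, file path, and batch size are placeholder assumptions; TABLOCK is the hint usually recommended for minimally logged bulk loads):

```sql
-- Minimally logged load from a native-format data file
-- (table name, file path, and batch size are made up for illustration).
BULK INSERT dbo.MyBigTable
FROM 'C:\loads\MyBigTable.bcp'
WITH (DATAFILETYPE = 'native', BATCHSIZE = 10000, TABLOCK)
```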

therein lies the crux of the issue.|||yeah...BCP is minimally logged...it logs disk allocations only, though, i believe.

i'll put that stuff together - thanks for asking!|||What, may I ask, happens to dups with a PK on the table?

The bcp should fail. No?

I know constraints are ignored unless a hint is specified.

And I know that a PK is a constraint.

But I've seen dup keys kick out and fail the bcp, where FKs are ignored...

Damn flashbacks...|||This is from BOL (CREATE INDEX):

User-specified FILLFACTOR values can be from 1 through 100. If no value is specified, the default is 0. When FILLFACTOR is set to 0, only the leaf pages are filled. You can change the default FILLFACTOR setting by executing sp_configure.

From this, the default fill factor may not be 100%. What is the server default fillfactor from sp_configure?|||default fillfactor is still stock meaning '0' or '100%'.

but i dont think that is a part of the equation, as using a fully logged insert avoids the over allocation issue entirely. (wish that were a possible solution)

fillfactors are only used when you create/rebuild an index.

from that point on SQL server endeavors to fill every page entirely.

in this instance, with a fully truncated table, even with a 100% fillfactor each index page is still completely empty and should (in theory) accept data until it's full.

regardless...the over allocation is to the data pages in this particular fiasco of an issue :) the non-clustered index page allocations are stored in separate rows in sysindexes and they are all normal and not splitting.

i'm tracing the process now, and watching perfmon's extents allocated per second counter and it is spinning through the freakin' roof!!!!

I'll yank out the BULK INSERT and the table definition statement once this trace finishes.

Boy, this is a whopper.|||I've had similiar problems to the one you're describing and it's a pain in the rear but I always drop all PK's, FK's, and indexes before the BCP then re-create afterwards. oh yeah I always re-create with WITH CHECK to catch any dups.

Brett, remember the drop FK and truncate script, well I use a variation of that for this process.|||fillfactors are only used when you create/rebuild an index.

Hrm. I knew that, too. Hoist on my own petard. ;-)|||Hoist on my own petard.

LOL - i'm new here...not trying to ruffle any feathers or insult intelligence.

My most recent effort was done without the non-clustered PK and, alas, i'm in the same boat.

I wonder if McDonalds is hiring.|||Brett, remember the drop FK and truncate script, well I use a variation of that for this process.

Do I ever...Do you like the new version?

Also, doesn't bcp ignore dups on load? And even though it said -h hint for constraints, it still violated them?|||not sure if bcp ignores dupes - will check though.
i use the ignore dupes setting on the unique index to avoid potential failures.
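A sketch of that setting, with hypothetical table, index, and column names:

```sql
-- With IGNORE_DUP_KEY, duplicate keys are discarded with a warning
-- instead of failing the whole load (names here are made up for illustration).
CREATE UNIQUE NONCLUSTERED INDEX ux_MyBigTable_key
    ON dbo.MyBigTable (key_col)
    WITH IGNORE_DUP_KEY
```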
:)|||Bah! No ruffled feathers, here. That part of my brain that was supposed to remember that bit of information was apparently on break at the time I wrote that. We are really a fairly good natured group around here, and no harm is usually meant.|||Bah! No ruffled feathers, here. That part of my brain that was supposed to remember that bit of information was apparently on break at the time I wrote that.

You got that loose wire too...gotta get my tools and fix that one of these years...

We are really a fairly good natured group around here, and no harm is usually meant.

Have a drink?

Yak Corral Bar and Grill (http://www.dbforums.com/showthread.php?p=3697043#post3697043)|||Yeah, I like the new version. Works great.

I'm not sure about the hint. From what I've read the hint works but that doesn't mean it really does. Also, I keep confusing myself and I've looked in BOL. But does ignoring constraints include ignoring unique indexes?|||I don't know, and I don't care *

There's a right way and a wrong way as far as I'm concerned.

Some of these extensions are just "whack"

* line in a Buffett song...and until I get a margarita, I won't remember...|||FYI -

reinstalling SQL Server + sp3a made this issue go away.

don't care how, just happy it's gone.

if i had an extra 3 days to sit on hold i'd bother to call M$ about it.

after trying the same process on another system with no page allocation issues i finally decided on the 3rd 'R' of Windows debugging:

1. Retry
2. Reboot
3. Reinstall|||Wow. When I call Microsoft, I generally only hold about 40 seconds or so. Do you guys normally have to wait a long time to reach a tech?

-PatP|||Not with Premier Support.|||Support?

Whats that??

I'm lucky they pay for licenses around here.

Honestly, i was just expressing my frustration with this issue overall.

A bit of hyperbole.
