SSIS Performance


Parallelism shows up in almost every field now that multi-core processors are common, and SSIS is no exception. SSIS lets us configure parallelism at two levels of granularity:

Package Level

By setting the MaxConcurrentExecutables property on the package, we tell the SSIS engine how many executables can run simultaneously. The default value is -1, which means the number of processors plus 2.
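To make the default rule concrete, here is a minimal Python sketch of how a setting of -1 resolves to a limit. The helper name `effective_max_concurrent` is hypothetical and not part of SSIS, and rejecting other non-positive values is an assumption made for illustration.

```python
import os


def effective_max_concurrent(setting: int, logical_processors: int) -> int:
    """Resolve a MaxConcurrentExecutables-style setting to a concrete limit.

    Hypothetical helper for illustration only; it is not an SSIS API.
    """
    if setting == -1:
        # Default described in the article: number of processors plus 2.
        return logical_processors + 2
    if setting < 1:
        # Assumption: any other non-positive value is treated as invalid here.
        raise ValueError("setting must be -1 or a positive integer")
    return setting


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(effective_max_concurrent(-1, cpus))  # depends on the machine
    print(effective_max_concurrent(2, cpus))   # an explicit value is used as-is
```

On a 4-processor machine, for example, the default of -1 would allow up to 6 executables to run at once.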

Now let's do a very simple exercise. I create three Data Flow Tasks in the package and set the MaxConcurrentExecutables property to 2, which means only 2 executables are allowed to run simultaneously. Then I set a breakpoint on each of them.

Then run the package. You will find that only two tasks are running; the third one has to wait until one of them finishes.

Next, set MaxConcurrentExecutables to 3 and execute the package again. This time all three tasks run simultaneously.
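The behaviour in this experiment can be mimicked with a small scheduling model. The Python sketch below is not SSIS code: `simulate_start_times` is a hypothetical function, and the task durations are made-up numbers chosen only to show when each task would start under a given concurrency limit.

```python
import heapq


def simulate_start_times(durations, max_concurrent):
    """Return each task's start time when at most max_concurrent run at once.

    Toy model: tasks are started in order, each in the first free slot.
    """
    running = []  # min-heap of finish times for the tasks currently running
    starts = []
    clock = 0
    for duration in durations:
        if len(running) == max_concurrent:
            # Every slot is busy: wait until the earliest running task finishes.
            clock = max(clock, heapq.heappop(running))
        starts.append(clock)
        heapq.heappush(running, clock + duration)
    return starts


# Three equally long tasks (durations are illustrative, not measured).
print(simulate_start_times([5, 5, 5], max_concurrent=2))  # [0, 0, 5]
print(simulate_start_times([5, 5, 5], max_concurrent=3))  # [0, 0, 0]
```

With a limit of 2 the third task starts only when one of the first two finishes; with a limit of 3 all of them start at time 0, which matches what the breakpoints show.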

Data Flow Level

Now we have 3 executables (Data Flow Tasks) in the package, and all of them run simultaneously once we set MaxConcurrentExecutables = 3. Next, let's look inside the Data Flow Task: its EngineThreads property indicates the number of threads the Data Flow Task can use during execution.

The definition is a little obscure at first glance, so let me give some background. In general, the Data Flow Task is the only place where SSIS itself does E-T-L (you may say we can do this with an Execute SQL Task, but in that case it is the SQL Server engine doing the ETL and SSIS just makes the call). In the simplest scenario, where the Data Flow just extracts data from a source and loads it into a destination, we need one buffer and two threads: one that extracts data from the source, called the Source Thread, and another that handles transformation and the destination, called the Worker Thread.
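The one-buffer, two-thread arrangement is essentially a producer/consumer pair. The following Python sketch is only an analogy, under the assumption that the buffer holds a single row at a time (real SSIS buffers hold many rows); `run_pipeline` and the upper-casing "transformation" are hypothetical stand-ins.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the source rows


def run_pipeline(rows):
    """Move rows from a 'source' to a 'destination' through one shared buffer."""
    buffer = queue.Queue(maxsize=1)  # the single shared buffer (toy size)
    loaded = []                      # stands in for the destination

    def source_thread():
        # Extract: push each row into the buffer, blocking while it is full.
        for row in rows:
            buffer.put(row)
        buffer.put(SENTINEL)

    def worker_thread():
        # Transform and load: pull rows until the source signals completion.
        while True:
            row = buffer.get()
            if row is SENTINEL:
                break
            loaded.append(row.upper())  # placeholder transformation

    threads = [
        threading.Thread(target=source_thread),
        threading.Thread(target=worker_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded


print(run_pipeline(["a", "b", "c"]))  # ['A', 'B', 'C']
```

The source thread never waits for the destination to finish the whole load; it only waits when the buffer is full, which is why the two threads can work on different rows at the same time.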

But that is only the simplest scenario. In most cases the Data Flow performs some transformations (such as Union All, Lookup, Derived Column, etc.) and so needs more threads. SSIS uses the concept of an Execution Tree for this: each Execution Tree means SSIS must create a buffer and needs a thread.

Now I create 4 Source -> Destination pairs in every Data Flow Task, which means there are 4 execution trees per Data Flow Task, and SSIS needs 4 worker threads if we want all of them to run simultaneously.

If we set EngineThreads = 2, we would expect only two of those Source -> Destination pairs to run simultaneously. (When I tried this on SQL Server 2012, I found that all 4 ran simultaneously. I am still wondering why, and will update this once I find the answer.)
