There are scenarios where the source file is generated by different systems with different row delimiters, which results in an error when the file is read within the package. Handling the row delimiter dynamically gives us a generic and permanent fix: if the row delimiter changes in the future, no package needs to be modified and re-tested. However, this can be managed only at initial design time. Once package development is done with any one row delimiter (CRLF, LF, CR), we need a different approach to provide a robust solution and avoid rework on each package. There are 3 approaches to manage this issue:
Approach ‘A’
Use a BULK INSERT Task instead of a Data Flow Task with a Flat File Source. In the BULK INSERT Task, you can set the RowTerminator and FieldTerminator properties through expressions, and this will work correctly.
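For illustration, assuming a variable User::varRowDelimiter holds the detected delimiter, the property expressions on the task could look like the sketch below (verify the exact property names on the task’s Expressions page in your SSIS version):

```
RowTerminator   : @[User::varRowDelimiter] == "LF" ? "{LF}" : "{CR}{LF}"
FieldTerminator : "\t"
```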
Approach ‘B’
Sometimes we do some pretty intense transformation of the data, so BULK INSERT is not an option and we need a different approach: first identify the delimiter, and then read the file within the respective Data Flow Task. Let’s see this with an example:
Example: A file is generated with a variable row delimiter, which is sometimes “LF” and sometimes “CR”. The column delimiter of this file is always fixed, i.e. “TAB”. So, to read this file successfully with both delimiters, we need to follow the steps below:
Step 1: Create a variable “varRowDelimiter” to store the delimiter value and a variable “varSourceFile” which contains the path of the file.
Step 2: Create a Script Task which reads the source file and identifies the row delimiter used in the file. The task will use both of the variables declared above: ‘varSourceFile’ as a ReadOnly variable and ‘varRowDelimiter’ as a ReadWrite variable.
Inside the editor we will write the code to read the source file, identify the delimiter and store its value in the variable. One thing to consider while writing this code is how large your source files are going to be. Reading a file consumes memory, so very large files will affect performance and may result in errors such as Out of Memory. I have often seen developers simply use “ReadToEnd” even when the purpose of reading the file does not require the complete data. ReadToEnd reads the whole file to the end and stores the entire contents in memory, resulting in high memory usage and somewhat slower performance, so it should be used only when there is a genuine need to read all the input to the end of the stream.
Here, I will use a simple read method in a loop, reading a few characters at a time, and exit the loop as soon as I find the delimiter. You can also read the file in fixed block sizes using the ReadBlock method and exit the loop once you have the delimiter value.
Step 3: Now, create two Flat File Connection Managers, one with the “LF” row delimiter and one with the “CR” row delimiter, and incorporate them into the respective Data Flow Tasks.
Step 4: Create two precedence constraints and connect them from the Script Task to the Data Flow Tasks, each with the respective expression.
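For example, with the variable from Step 1 and the evaluation operation set to “Expression”, the two constraint expressions could be:

```
To the "LF" Data Flow Task: @[User::varRowDelimiter] == "LF"
To the "CR" Data Flow Task: @[User::varRowDelimiter] == "CR"
```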
Now, just run the package for a source file with either of the delimiters and use the results as required.
The above example can be modified to check for other delimiters as well, or to cater for other scenarios which require delimiter identification at run time.
Approach ‘C’
This is an agile approach and very useful when we have already delivered a flat file reading solution with one delimiter and are in a situation where we can receive flat files with multiple row delimiters.
We need to create a new package that will be executed before the set of packages which load the flat file into the database (destination). This package will identify the row delimiter of the flat file and replace it with the row delimiter the packages use by default. For example, if the packages were developed with {CR}{LF} but the files received have {LF}, then we need to replace {LF} with {CR}{LF}.
Step 1: Create a variable “varReplaceRowDelimiter” to hold the replacement delimiter value, a variable “varFindRowDelimiter” to hold the existing delimiter value, and a variable “varSourceFile” which contains the path of the file.
Step 2: Create a Script Task which reads the source file and replaces the row delimiter used in the file. The task will use all of the variables declared above: ‘varSourceFile’, ‘varFindRowDelimiter’ and ‘varReplaceRowDelimiter’, all as ReadOnly variables.
Inside the editor we will write the code to read the source file, find the existing delimiter, and replace it with the new delimiter. See the code below:
'This method is called when this script task executes in the control flow.
'Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
'To open Help, press F1.
Public Sub Main()

    'Read the complete file into memory.
    Dim file As New System.IO.StreamReader("C:\Users\ABC\Downloads\test.txt")
    Dim data As String = file.ReadToEnd()
    file.Close()

    'Replace the existing row delimiter (LF) with the default one (CRLF).
    data = data.Replace(vbLf, vbCrLf)

    'Overwrite the source file with the converted contents.
    Dim writer As New System.IO.StreamWriter("C:\Users\ABC\Downloads\test.txt", False)
    writer.Write(data)
    writer.Flush()
    writer.Close()

    Dts.TaskResult = ScriptResults.Success
End Sub
To make the task generic, replace the hard-coded values with the variables as below:
"C:\Users\ABC\Downloads\test.txt" becomes Dts.Variables("User::varSourceFile").Value.ToString()
vbLf becomes Dts.Variables("User::varFindRowDelimiter").Value.ToString()
vbCrLf becomes Dts.Variables("User::varReplaceRowDelimiter").Value.ToString()
One thing to consider while writing this code is how large your source files are going to be. Since reading the file consumes memory, very large files will affect performance and may result in errors such as Out of Memory. For large files the replacement should be done in buffered chunks, with the buffer size managed through the configuration file.
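The buffered replacement can be sketched as below (shown in Python for illustration; in the Script Task you would do the same with StreamReader.ReadBlock and a StreamWriter, with the chunk size read from the configuration). Writing through a temporary file keeps memory usage flat no matter how large the source file is:

```python
import os
import tempfile

def replace_row_delimiter(path, find, replace, chunk_size=64 * 1024):
    """Stream the file through a temporary copy, replacing every
    occurrence of `find` with `replace`, without loading the whole
    file into memory (unlike ReadToEnd)."""

    def partial_suffix(data):
        # Length of the longest suffix of data that is a proper prefix
        # of `find` (it might be completed by the next chunk).
        for k in range(min(len(find) - 1, len(data)), 0, -1):
            if data.endswith(find[:k]):
                return k
        return 0

    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    carry = ""
    # newline="" disables Python's newline translation so CR/LF bytes
    # pass through unchanged.
    with open(path, "r", newline="") as src, \
            os.fdopen(fd, "w", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                dst.write(carry.replace(find, replace))
                break
            data = carry + chunk
            k = partial_suffix(data)
            if k:
                # Hold back a possible partial delimiter at the chunk edge.
                body, carry = data[:-k], data[-k:]
            else:
                body, carry = data, ""
            dst.write(body.replace(find, replace))
    os.replace(tmp_path, path)  # swap the converted file into place
```

The carry buffer handles a delimiter that straddles two chunks, so the result is identical to a whole-file Replace.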
Hi Krishna,
Thanks for your information.
I have a scenario: my source folder contains some 10-15 flat files. Some have “tab” as the column delimiter and some have “comma”. How do we handle the column delimiter in this scenario?
Please help me.
Thanks in advance.
Hi Krishna,
I am not very comfortable writing code inside a Script Task to identify the delimiters of a flat file. Could you please share some inputs on this?
Need your help!
Thanks in advance.