Scenario: A CSV text file generated from C# is to be imported into a SQL Server database using BULK INSERT. Some fields contain non-ASCII (Unicode) characters.
Problem: The special characters display correctly in the text file, but are not being saved correctly in the database.
Edit: Example of correct text is "Khālid Muḥammad ʻAlī al-Ḥājj" and incorrect text is "Kha¯lid Muh?ammad ?Ali¯ al-H?a¯jj".
I pieced together the answer to this from several sources, so here is the whole thing in one place.
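(1) The text file must be written as a Unicode file. In .NET, Encoding.Unicode means UTF-16 little-endian, and StreamWriter writes a byte order mark at the start of a new file, which is what marks it as Unicode.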
StreamWriter writer = new StreamWriter(fileName, append: false, encoding: Encoding.Unicode);
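As a minimal sketch of step (1), the program below writes a small pipe-delimited file and then checks that it starts with the UTF-16 little-endian byte order mark (FF FE). The class name, the WriteSample helper, the temp-folder path, and the "Id|Name" header row are hypothetical, chosen only for illustration. The header row is there because the BULK INSERT in step (3) uses FIRSTROW=2.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExportExample
{
    // Hypothetical helper: writes a header row and one data row as UTF-16 LE.
    public static void WriteSample(string fileName)
    {
        // append: false creates or truncates the file, so StreamWriter
        // emits the encoding's byte order mark before the first character.
        using (var writer = new StreamWriter(fileName, append: false, encoding: Encoding.Unicode))
        {
            writer.NewLine = "\r\n";          // explicit CRLF row endings
            writer.WriteLine("Id|Name");      // skipped on import by FIRSTROW=2
            writer.WriteLine("1|Khālid Muḥammad ʻAlī al-Ḥājj");
        }
    }

    public static void Main()
    {
        // Hypothetical output location, for illustration only.
        string fileName = Path.Combine(Path.GetTempPath(), "fileName.csv");
        WriteSample(fileName);

        // A UTF-16 little-endian file written this way begins with bytes FF FE.
        byte[] bytes = File.ReadAllBytes(fileName);
        bool hasBom = bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
        Console.WriteLine(hasBom ? "UTF-16 LE BOM present" : "BOM missing");
    }
}
```

The using block disposes the writer, which flushes the buffer and closes the file before SQL Server tries to read it.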
(2) The database columns must use Unicode data types, that is, the types whose names start with "N". For example, use NVARCHAR instead of VARCHAR, or NCHAR instead of CHAR.
Side note: you can mark string literals in T-SQL as Unicode strings by prefixing them with an N.
INSERT INTO myTable (Name) VALUES (N'special characters');
(3) Specify DATAFILETYPE='widechar' in the BULK INSERT command. (If the file is not marked Unicode, this will throw an error.)
BULK INSERT dbo.myTable FROM 'C:\path\fileName.csv'
WITH (DATAFILETYPE = 'widechar', FIRSTROW = 2, FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');
(4) Use database collation SQL_Latin1_General_CP1_CI_AS.