COPY INTO from Azure Data Lake gen2 to Azure Synapse does nothing

I am trying to copy from Azure Data Lake gen2 to a table in an Azure Synapse warehouse using local SSMS. The COPY INTO statement neither throws an error nor loads any data. I am writing the pandas DataFrame from a CentOS server to Azure Data Lake gen2 using sep=',', encoding='utf-8'. Here is the COPY statement I am using:

COPY INTO dbo.SALES_CUTOMER_D 
FROM 'https://acoount_name/test-file-system/SALES_CUSTOMER_D_0.csv'
WITH (
 FILE_TYPE = 'csv',
 CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET=''),
 FIELDQUOTE = '"',
 FIELDTERMINATOR=',',
 ROWTERMINATOR='\r\n',
 ENCODING = 'UTF8',
 FIRSTROW = 2
)
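For context, the write path described in the question matters here: on Linux, pandas to_csv emits LF ('\n') line endings by default, which becomes relevant in the answer below. A minimal sketch of that write/upload step, assuming the azure-storage-file-datalake SDK (the account URL, key, and sample data are placeholders, not taken from the question):

import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient

df = pd.DataFrame({'customer_id': [1, 2], 'name': ['a', 'b']})  # stand-in data

# On a CentOS/Linux server, to_csv defaults to LF ('\n') line endings.
csv_bytes = df.to_csv(index=False, sep=',', encoding='utf-8').encode('utf-8')

service = DataLakeServiceClient(
    account_url='https://<account_name>.dfs.core.windows.net',  # placeholder
    credential='<storage-account-key>',                         # placeholder
)
file_client = service.get_file_system_client('test-file-system') \
                     .get_file_client('SALES_CUSTOMER_D_0.csv')
file_client.upload_data(csv_bytes, overwrite=True)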

Check whether your file has Unix-style line endings (LF) instead of Windows-style (CRLF).

See "Difference between CR LF, LF and CR line break types?" if you're not clear on CRLF.

The easiest way I know of to check is to open the file in vi in binary mode with set list:

vi -b -c 'set list' <file>
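If vi isn't available, the same check can be done by inspecting the raw bytes of the first line, for example in Python (the local file name is a placeholder):

# Read one line in binary mode and look at how it ends.
with open('SALES_CUSTOMER_D_0.csv', 'rb') as f:  # placeholder local copy
    line = f.readline()
# CRLF files end each line with b'\r\n'; LF-only files end with just b'\n'.
print('CRLF' if line.endswith(b'\r\n') else 'LF')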

To verify whether this is the problem, you can do one of the following:

  1. Tell COPY what line endings are in your file:

     COPY INTO dbo.SALES_CUTOMER_D
     FROM 'https://acoount_name/test-file-system/SALES_CUSTOMER_D_0.csv'
     WITH (
       FILE_TYPE = 'csv',
       CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET=''),
       ROWTERMINATOR='0x0A',
       FIRSTROW = 2
     )
  2. Confirm that it's actually reading the file by making it parse the header: remove FIRSTROW = 2.

  3. Change the line endings (or regenerate the file with CRLF endings at the source, as sketched after this list):

  • unix2dos <csv file>
  • upload to the data lake and try COPY again, without ROWTERMINATOR='\r\n' (that's the default value).
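If the file comes out of pandas as in the question, another option is to ask to_csv for CRLF endings directly when writing, so no conversion step is needed. A minimal sketch (note the keyword is spelled line_terminator in pandas older than 1.5; file name and data are placeholders):

import pandas as pd

df = pd.DataFrame({'customer_id': [1, 2], 'name': ['a', 'b']})  # stand-in data
# Force Windows-style CRLF endings regardless of the OS default.
df.to_csv('SALES_CUSTOMER_D_0.csv', index=False, sep=',',
          encoding='utf-8', lineterminator='\r\n')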

A little gotcha:

COPY treats '\n' as '\r\n' internally. For more information, see the ROWTERMINATOR section.

In other words:

  • If we don't specify the ROWTERMINATOR option, or we specify ROWTERMINATOR='\n' or ROWTERMINATOR='0x0D0A', the engine uses \r\n as the terminator (Windows style).
  • If we specify ROWTERMINATOR='0x0A', the engine uses \n as the terminator (Unix style).
