Loading JSON files with an on-premises SQL Server is pretty simple: you point to the file's local path and load it. With Azure SQL and Azure Data Lake, it's more involved. Let's run through the steps to load an Azure Data Lake file directly into an Azure SQL Database.
Create A Database Master Key
We will need a strong, complex password to create the master key. If you don't feel all that creative, you can use a website (https://passwordsgenerator.net/) that generates random strong passwords.
Connect to your Azure SQL Database using SSMS or Azure Data Studio and run the following command after substituting in your complex password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Complex Password>';
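A database can hold only one master key, so the statement will fail if one already exists. One way to guard against that, sketched here using the standard sys.symmetric_keys catalog view:

-- Create the master key only if the database doesn't already have one
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys
               WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Complex Password>';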
Create Database Scoped Credential
We will need to create a database scoped credential that allows SQL Server to authenticate with Azure Data Lake. For that, we first need to obtain a Shared Access Signature (SAS) token from the storage account.
Navigate to https://portal.azure.com, select your storage account, and then the container you wish to access. Under 'Settings', select 'Shared access tokens', and at the bottom of the screen, click 'Generate SAS token and URL'. IMPORTANT NOTE: be sure to set the expiry date well into the future so the token does not expire.
Next, copy the Blob SAS token using the copy icon on the right and paste it into the script below, removing the leading '?' if one is present. Give your credential a recognizable name, then execute the script.
CREATE DATABASE SCOPED CREDENTIAL <AzureDataLakeCredentialName>
WITH
IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<Blob SAS token>';
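To confirm the credential was created (the secret itself can't be read back), you can query the catalog view:

-- List database scoped credentials and their identities
SELECT name, credential_identity
FROM sys.database_scoped_credentials;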
Create External Data Source
Next, we need to create an external data source connection to the Azure Data Lake container using the credential we just created.
Select a recognizable name for the data source, fill in the URL to your Azure Data Lake storage account and container, set the credential name, and then execute the script.
CREATE EXTERNAL DATA SOURCE <data source name>
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://<ADL>.blob.core.windows.net/<container>',
CREDENTIAL = <AzureDataLakeCredentialName>);
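A quick sanity check that the data source exists and points where you expect:

-- Confirm the external data source, its type, and its location
SELECT name, type_desc, location
FROM sys.external_data_sources;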
Selecting From a JSON File
Now that the external data source is created, we can select from a file on Azure Data Lake. Replace the file name with the file you wish to open and adjust the data source name to the one created above.
SELECT
*
FROM
OPENROWSET(BULK '<filename>', SINGLE_CLOB, DATA_SOURCE = '<data source name>') AS j;
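SINGLE_CLOB returns the whole file as a single column named BulkColumn. To shred the JSON into rows and columns, pair it with OPENJSON; the property names and types below are hypothetical and should be adjusted to match your file:

SELECT
    p.Id, p.Name
FROM
    OPENROWSET(BULK '<filename>', SINGLE_CLOB, DATA_SOURCE = '<data source name>') AS j
CROSS APPLY
    OPENJSON(j.BulkColumn)
    WITH (
        Id INT '$.id',
        Name VARCHAR(100) '$.name'
    ) AS p;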
Dynamic Loading
OPENROWSET's BULK argument does not accept a variable for the file name, so to pass a file name dynamically we must build the statement as a string and execute it.
Below is an example of a stored procedure that uses a passed-in file name to load the JSON file into a table.
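The procedure assumes a staging table such as stg.JSONFiles already exists. A minimal sketch of that table (the column types are assumptions):

-- Hypothetical staging table the procedure loads into
CREATE TABLE stg.JSONFiles
(
    FileName VARCHAR(200) NOT NULL,
    JSONData NVARCHAR(MAX) NOT NULL
);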
CREATE PROCEDURE dbo.LoadJSONFile
(
    @FileName VARCHAR(200)
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @SQLString NVARCHAR(MAX);

    -- OPENROWSET(BULK ...) requires a literal file name, so the whole
    -- statement is assembled as a string. Replace 'datasourcename' with
    -- the external data source created above.
    SET @SQLString = N'
    INSERT INTO stg.JSONFiles (FileName, JSONData)
    SELECT
        ''' + @FileName + N''', j.BulkColumn
    FROM
        OPENROWSET(BULK ''' + @FileName + N''',
            SINGLE_CLOB, DATA_SOURCE = ''datasourcename'') AS j
    WHERE
        -- Skip files that have already been loaded
        NOT EXISTS (SELECT 1
                    FROM stg.JSONFiles
                    WHERE FileName = ''' + @FileName + N''');';

    EXECUTE sys.sp_executesql @SQLString;
END;
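With everything in place, loading a file is a single call; the file name here is just an example:

-- Load one file from the data lake into the staging table
EXECUTE dbo.LoadJSONFile @FileName = 'myfile.json';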