Sorry to disappoint you. In this post, we won't test Microsoft Fabric directly. We will compare Azure Data Factory (ADF) to Microsoft Fabric.
To quickly summarize where we are: we started testing Microsoft Fabric with a 10GB tar file, which took over 20 hours to extract, and we have since opened a ticket with Microsoft.
Then we tested extracting the second level of files, 535,678 .gz files, which took 1 hour and 32 minutes.
I've been able to turn up the performance in Azure Data Factory before, so I wanted to run the same process we are testing in Microsoft Fabric in Azure Data Factory and compare the two.
I created a new integration runtime and turned up both the runtime and the copy activity. For the runtime, I selected a custom compute size: Memory optimized, 256 cores (+ 16 Driver cores). For the Copy Activity Settings, I selected 32 for Maximum data integration units and 32 for Degree of copy parallelism.
Here are some screenshots of the settings:
Copy Activity Source:
Copy Activity Sink:
Copy Activity Settings:
Linked Service:
Integration runtime:
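For readers who cannot see the screenshots, the settings described above can be written down roughly as follows. This is only a sketch: the values come from the post, but the property names (`computeType`, `coreCount`, `dataIntegrationUnits`, `parallelCopies`) are my assumption about how ADF's JSON definitions express these options, and the runtime name is hypothetical. Source, sink, and linked service details are left out because they only appear in the screenshots.

```python
import json

# Integration runtime compute settings as described in the post.
# Property names are an assumption about ADF's JSON definition format.
integration_runtime = {
    "name": "MemoryOptimizedIR",  # hypothetical name, not from the post
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "dataFlowProperties": {
                    "computeType": "MemoryOptimized",
                    "coreCount": 256,  # the UI shows this as 256 (+ 16 Driver cores)
                }
            }
        },
    },
}

# Copy activity performance settings as described in the post.
copy_activity_settings = {
    "dataIntegrationUnits": 32,  # "Maximum data integration units"
    "parallelCopies": 32,        # "Degree of copy parallelism"
}

print(json.dumps(integration_runtime, indent=2))
print(json.dumps(copy_activity_settings, indent=2))
```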
First, I tested using 'Memory Optimized', and the result was 15 hours, 49 minutes, and 46 seconds.
Just for comparison's sake, I tested using 'General purpose', and the result was 17 hours, 58 minutes, and 14 seconds. That is about 2 hours and 8 minutes slower than 'Memory Optimized'.
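The gap between the two runs can be checked with a few lines of Python. This is just the arithmetic on the two durations reported above; the variable names are mine.

```python
from datetime import timedelta

# Durations reported for the first-level (10GB tar) extract in ADF.
memory_optimized = timedelta(hours=15, minutes=49, seconds=46)
general_purpose = timedelta(hours=17, minutes=58, seconds=14)

# How much slower 'General purpose' was than 'Memory Optimized'.
difference = general_purpose - memory_optimized
print(difference)  # 2:08:28
```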
Second Layer Files
Now, to extract the second layer of files, I am using 'Memory Optimized'. Here are the screens for this configuration. I've maximized the data integration units and parallel copies.
Source:
Sink:
Settings:
Results:
2 hours, 7 minutes and 24 seconds.
Testing Summary
'Memory Optimized' at the maximum settings on ADF is faster than Microsoft Fabric for the first 10GB file by almost 5 hours, while Microsoft Fabric edges out ADF on the second extract by about 35 minutes. My laptop is still in a commanding lead at 38 minutes for the first 10GB file.
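Here is a small sketch that lines up the timings from this series side by side. One assumption to flag: the Fabric time for the first file is only known as "over 20 hours", so the code uses 20 hours as a lower bound and the first-level gap it prints is a minimum, not an exact figure.

```python
from datetime import timedelta

# First-level extract (10GB tar file).
adf_first = timedelta(hours=15, minutes=49, seconds=46)
fabric_first_at_least = timedelta(hours=20)  # lower bound: "over 20 hours"
laptop_first = timedelta(minutes=38)

# Second-level extract (535,678 .gz files).
adf_second = timedelta(hours=2, minutes=7, seconds=24)
fabric_second = timedelta(hours=1, minutes=32)

# ADF's lead on the first file is at least this much.
first_gap_min = fabric_first_at_least - adf_first
# Fabric's lead on the second extract.
second_gap = adf_second - fabric_second
# How many times longer ADF took than the laptop on the first file.
adf_vs_laptop = adf_first / laptop_first

print("ADF ahead on first file by at least:", first_gap_min)
print("Fabric ahead on second extract by:", second_gap)
print(f"ADF took about {adf_vs_laptop:.0f}x as long as the laptop")
```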