Shortly after Microsoft Fabric came out a few months ago, I created an account and started playing around. I love to push technology to its limits, so I decided to do the same with Microsoft Fabric. I searched the internet and then asked my co-workers for a 'large' dataset.
I was able to find a 10.1GB tar file. I created a Data Lake in Microsoft Fabric and uploaded the file. Then, I created a Data Factory Pipeline in Microsoft Fabric and added a single Copy Activity.
Here are a few screenshots of the configuration:
General:
Source:
Destination:
Settings:
Notice that the Intelligent throughput optimization and Degree of copy parallelism are set to 'Auto'.
Now to execute...it sat, and sat, and hardly extracted anything. I thought that was odd, so I stopped it and decided to change some of the settings.
First, I changed Intelligent throughput optimization to Maximum.
Then, I changed the Degree of copy parallelism to 32, the max available. I realize this may not be used since I only have one source file, but we will max it out anyway to see if it helps the downstream multiple file copy.
Now off we go...12 hours later, failure!! What?! It timed out and stopped at the 12-hour mark.
So, I went back in and changed the timeout to 24 hours.
Now we execute and wait ...
...Play some elevator music...
...Build a few Lego sets...
How long did it take? Not 12 hours and 1 minute...but 20 hours and 44 minutes!
I don't know about you, but I was expecting better performance.
We ended up with a total of 535,678 files inside 36 folders, totaling 10.8GB. Each of those files was a .gz compressed file containing a JSON file.
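To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It is only a rough sketch: it uses the figures from this run (535,678 files, 10.8GB, 20 hours and 44 minutes) and assumes decimal units (1GB = 1,000MB), which may not match how the sizes were actually reported. The function name `copy_throughput` is just something I made up for illustration.

```python
# Figures reported for this run.
FILE_COUNT = 535_678
TOTAL_GB = 10.8
ELAPSED_HOURS = 20
ELAPSED_MINUTES = 44


def copy_throughput(file_count, total_gb, hours, minutes):
    """Return (elapsed_seconds, files_per_second, mb_per_second, avg_kb_per_file)."""
    elapsed_seconds = hours * 3600 + minutes * 60
    # Assumption: decimal units, 1 GB = 1,000 MB and 1 MB = 1,000 KB.
    total_mb = total_gb * 1000
    files_per_second = file_count / elapsed_seconds
    mb_per_second = total_mb / elapsed_seconds
    avg_kb_per_file = total_mb * 1000 / file_count
    return elapsed_seconds, files_per_second, mb_per_second, avg_kb_per_file


elapsed, files_per_s, mb_per_s, avg_kb = copy_throughput(
    FILE_COUNT, TOTAL_GB, ELAPSED_HOURS, ELAPSED_MINUTES
)
print(f"Elapsed seconds:   {elapsed}")
print(f"Files per second:  {files_per_s:.2f}")
print(f"MB per second:     {mb_per_s:.3f}")
print(f"Average file size: {avg_kb:.1f} KB")
```

That works out to roughly 7 files per second and well under 1MB per second, with an average file size of about 20KB. In other words, this workload is a very large number of very small files rather than a lot of raw data.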
I looked through the runtime information and discovered that the pipeline did not keep the maximum values I had set for Intelligent throughput optimization and Degree of copy parallelism. At runtime, it reverted both to the lowest values: Standard and 1.
This wasn't what I expected, so I created a ticket with Microsoft to see why it took so long and why my settings were reverted during runtime.