@simon. Great question! That is a workflow I tried and might revisit. The issue is that instead of aligning 111 photos, you'd have 1776. Even if you only matched the relevant photos against each other, the SfM would still take a very long time, and some photos might not track at all.
Masking the blurry areas is easier said than done. As far as I know, there is no good algorithm that gives you the area in focus, and doing this with edge-detect filters is not good enough. Stacking software uses quite sophisticated algorithms to blend the images together. Your best bet would be to extract the masks from stacking software, but because it also aligns the images, that is not very straightforward either.
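
For reference, here is roughly what I mean by the edge-detect approach. This is only a minimal sketch, not what any stacking software does: it assumes a grayscale frame as a 2D NumPy float array, and the names `focus_mask`, `radius` and `rel_threshold` are made up for illustration. It takes a Laplacian (an edge filter), averages its squared response over a small window, and keeps the pixels whose local response is above a fraction of the maximum.

```python
import numpy as np


def laplacian(gray):
    """4-neighbour Laplacian; borders are handled by repeating edge pixels."""
    p = np.pad(gray, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])


def box_mean(a, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image."""
    k = 2 * radius + 1
    p = np.pad(a, radius, mode="reflect")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # leading zero row/column
    h, w = a.shape
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)


def focus_mask(gray, radius=4, rel_threshold=0.1):
    """Naive 'in focus' mask: local Laplacian energy above a relative threshold.

    radius and rel_threshold are arbitrary illustrative values, not tuned.
    """
    energy = box_mean(laplacian(gray.astype(float)) ** 2, radius)
    return energy > rel_threshold * energy.max()
```

The weakness is easy to see from the code: a smooth surface that is perfectly in focus has almost no Laplacian response, so it gets masked out just like a blurred area, and the threshold has to be retuned for every subject.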