The 3:1 rule has nothing to do with phase relationships as such; the rule doesn't even know which frequencies are involved ...
The 3:1 rule can even become a 2:1 rule when using highly directional mics or, as said above, when the levels are not equal.
It is all about the comb filtering that occurs when both signals are mixed at equal levels, without time aligning.
The 3:1 rule just says that the result will then not be too dramatic, that's all.
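To put rough numbers on "not too dramatic", here is a minimal sketch of the comb filter ripple at a few distance ratios. It assumes a point source in a free field (level falls off as 1/distance), both mics mixed at equal gain, and no room reflections. The function name `comb_ripple_db` and the `extra_rejection_db` parameter (standing in for a directional mic's off-axis rejection) are made up for this illustration.

```python
import math

def comb_ripple_db(distance_ratio, extra_rejection_db=0.0):
    """Return (peak_db, dip_db) of the comb filter, relative to the close
    mic alone, when the far mic's signal is summed in at equal gain.

    Assumption: free-field inverse-distance law, so the leakage amplitude
    is 1/distance_ratio. extra_rejection_db is a hypothetical extra
    attenuation of the leakage, e.g. from a directional pickup pattern.
    """
    leak = (1.0 / distance_ratio) * 10 ** (-extra_rejection_db / 20)
    peak_db = 20 * math.log10(1 + leak)   # frequencies where the signals add
    # Where the signals cancel; equal amplitudes give a complete null.
    dip_db = 20 * math.log10(1 - leak) if leak < 1 else float("-inf")
    return peak_db, dip_db

for ratio in (1, 2, 3):
    peak_db, dip_db = comb_ripple_db(ratio)
    print(f"{ratio}:1  peak {peak_db:+.1f} dB, dip {dip_db:+.1f} dB")
```

Under these assumptions, 3:1 leaves a ripple of roughly +2.5 dB / -3.5 dB, against full nulls at 1:1. Calling `comb_ripple_db(2, extra_rejection_db=3.5)` gives about the same ripple as 3:1, which is one way to see how 2:1 can be enough with directional mics.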
Of course, this rule does not cover more complicated matters, e.g. the phase relationship between the opening of a piano and above its lid, or the phase relationship between the top and bottom soundboards of a violin, etc.
In short, time aligning in this instance will only help if you hold the subject's head in a vise!