Aside from that, 1 attack is a pretty bad sample size. You need like hundreds of attacks on different armoured mobs...
I agree that a larger sample is needed, but you could certainly investigate at a much smaller scale than you suggest. Just a single target: 10 hits with the factor applied, 10 hits without. A difference should be visible at that scale (especially for a 50% modifier).
Double that amount and you could support statistical testing, but that's likely unnecessary for individual use.
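If anyone does want to run a test on their own numbers, here is a rough sketch of one way to do it. It uses a permutation test on the difference in mean damage, which is just my pick for a small-sample test and not the only option. The damage values are simulated: the 80-120 base range, the seed, and the function names are all made up for illustration, and only the 50% modifier and the 20 hits per group come from the discussion above. Swap in your own recorded hits in place of the simulated lists.

```python
import random
import statistics


def permutation_test(with_mod, without_mod, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean damage.

    Returns (observed difference in means, approximate p-value).
    """
    rng = random.Random(seed)
    observed = statistics.mean(with_mod) - statistics.mean(without_mod)
    pooled = list(with_mod) + list(without_mod)
    n = len(with_mod)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n]) - statistics.mean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


def simulate_hits(n, base_low, base_high, modifier, rng):
    """Hypothetical damage rolls: uniform base damage times a modifier."""
    return [rng.randint(base_low, base_high) * modifier for _ in range(n)]


# Assumed numbers for illustration only: base damage 80-120, 50% modifier.
rng = random.Random(42)
without_mod = simulate_hits(20, 80, 120, 1.0, rng)
with_mod = simulate_hits(20, 80, 120, 1.5, rng)

observed, p_value = permutation_test(with_mod, without_mod)
print(f"mean without modifier: {statistics.mean(without_mod):.1f}")
print(f"mean with modifier:    {statistics.mean(with_mod):.1f}")
print(f"difference in means:   {observed:.1f}")
print(f"approximate p-value:   {p_value:.4f}")
```

With a modifier this large the two groups barely overlap, so the p-value comes out tiny and you could see the difference by eye anyway. The test earns its keep when the modifier is small relative to the normal damage spread.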
