Nice. I'll leave any autocorrelation tests up to you!
Originally Posted by Triddle
You can see the intuition behind this by imagining a coin flip - you'll get a much better fit to a uniform distribution by first flipping a coin with 2 heads on it, then flipping a coin with 2 tails on it, and so on. In fact, on any even number of flips your empirical distribution is going to be exactly the distribution you 'expect', and on odd numbers of flips it won't be far off. If you were to perform Pearson's chi-squared test, the test statistic would be 0 (exactly, for an even number of flips), but you'd be super skeptical about it. You might argue that this sequence isn't even a random variable, but I can flip an actual fair coin to decide which of my two rigged coins to start with, and now it's a genuine random variable. The marginal distribution of heads/tails outcomes is genuinely uniform, but the conditional distribution of the nth flip given the outcome of the (n-1)th is either [1,0] or [0,1], and never [0.5,0.5]. This example is pretty extreme, so you get literal 'suspiciously good' values of the chi-squared statistic, but in more realistic scenarios you'll never realize that the system is rigged in this way from a chi-squared test. You can see from my graph that my rigged dice simulation does better on chi-squared tests than the IID uniform one, but not so much so that you'd notice.
I wouldn't say any of my statistics were "too good," although I don't know exactly what value would count as too good. The lowest ones were still in the teens. Also, isn't some of what you're describing with the coin covered by degrees of freedom? A coin only has a single DoF.
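
For anyone who wants to poke at the coin example from the quote, here is a minimal sketch of it. The function names, the seed, and the 100-flip sample size are my own choices for illustration. It compares the alternating "rigged" coins with an ordinary fair coin on two numbers: Pearson's chi-squared statistic against a uniform distribution, and the fraction of flips that repeat the previous flip.

```python
import random
from collections import Counter

def chi_squared_uniform(outcomes, categories):
    """Pearson's chi-squared statistic against a uniform distribution."""
    counts = Counter(outcomes)
    expected = len(outcomes) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)

def repeat_fraction(outcomes):
    """Fraction of flips that match the flip immediately before them."""
    repeats = sum(a == b for a, b in zip(outcomes, outcomes[1:]))
    return repeats / (len(outcomes) - 1)

rng = random.Random(0)  # arbitrary seed, for repeatability only
n_flips = 100           # even, so the rigged counts balance exactly

# Rigged pair: a fair flip picks the starting coin, then the coins alternate.
first = rng.choice("HT")
second = "T" if first == "H" else "H"
rigged = [first if i % 2 == 0 else second for i in range(n_flips)]

# Honest IID fair coin for comparison.
fair = [rng.choice("HT") for _ in range(n_flips)]

for name, flips in (("rigged", rigged), ("fair", fair)):
    print(name, chi_squared_uniform(flips, "HT"), repeat_fraction(flips))
```

The rigged sequence gets a chi-squared statistic of exactly 0 and never repeats a flip, while the fair coin should repeat roughly half the time. The chi-squared statistic only looks at the marginal counts, so the dependence between consecutive flips only shows up in the second number.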

Originally Posted by Triddle
Also, be careful with multiple testing. I see in one of those threads you posted "BUT! If we combine your unweighted chart with @Saberem's unweighted data, we get a Chi^2 of 32.26 which is GREATER than the 95% confidence value of 31.14. Thus, we can conclude that Larian's unweighted dice rng is NOT random." - you need to correct for multiple comparisons. If you test enough times, you'll eventually get a test statistic greater than your critical value by chance (unless your sampling method actually IS biased). Applying Bonferroni correction and assuming you performed only two tests, the critical value would be 34.17, and since you got a Chi^2 of 32.26 you would fail to reject the hypothesis that the data followed a uniform distribution.
To be clear, I actually pooled the raw data for that analysis, which should be equivalent to testing a single dataset with a larger number of rolls, no?
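
Here is a rough sketch of how the Bonferroni comparison from the quote could be checked without a stats package. Treat the inputs as assumptions: `df = 19` assumes a d20 (20 faces minus 1), and `n_tests = 2` is the count used in the quote. The 34.17 quoted above matches the 20-df table value, so swap in `df = 20` if that is what was intended. `chi2_sf` is a helper I wrote for this post, not a library function.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    on the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

alpha = 0.05
n_tests = 2                       # assumption: two separate tests, as in the quote
bonferroni_alpha = alpha / n_tests

statistic = 32.26                 # pooled Chi^2 value quoted above
df = 19                           # assumption: d20, so 20 faces - 1

p_value = chi2_sf(statistic, df)
print(f"p = {p_value:.4f}")
print("reject at alpha:", p_value < alpha)
print("reject at Bonferroni alpha:", p_value < bonferroni_alpha)
```

Set `n_tests` to whatever number of tests you think was actually run. With `n_tests = 1` the two thresholds are the same.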

Originally Posted by Triddle
The non-karmic dice data looks good. If you wanted to scrutinize it I'd be looking for autocorrelation - you can see from my graph that for samples of size 300, autocorrelations stronger than about 0.2 in magnitude would be suspicious. I'd be very surprised if there was anything wrong with the basic dice rolling implementation though. Good random sequences are precisely the ones that look bad to the layman - if the players didn't think it was a bad implementation, that would be a strong indicator that it actually was. That is exactly why Larian have tried to implement the karmic dice, they want to make a version of dice rolling that the players think is fair, rather than one that actually is fair, and those two things are mutually exclusive. It's a shame they seem to have done such a bad job of it...
Actually, the original base RNG system *was* bad. If you look at the 1st plot in the "Niara Data" tab, low rolls were clearly being preferentially followed by other low rolls, forming a sinusoidal pattern. No one actually ran a statistical test on the level of correlation though. And originally, their Karmic Dice system alternated between low and high values, as shown by the 2nd plot.
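
If someone does want to test the correlation, a lag-k sample autocorrelation is easy to compute by hand. This is only a sketch: the rolls below are simulated with Python's own RNG, not game data, and the ±1.96/sqrt(n) band is the usual large-sample rule of thumb for independent samples, not the threshold from Triddle's graph. For 300 rolls it comes out to roughly ±0.11, which is tighter than the ~0.2 he mentions.

```python
import math
import random

def autocorrelation(values, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance_sum = sum(d * d for d in deviations)
    covariance_sum = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
    return covariance_sum / variance_sum

def white_noise_band(n, z=1.96):
    """Rough 95% band for the autocorrelation of n independent samples."""
    return z / math.sqrt(n)

rng = random.Random(1)  # arbitrary seed, for repeatability only
rolls = [rng.randint(1, 20) for _ in range(300)]  # simulated d20 rolls, not game data

band = white_noise_band(len(rolls))
for lag in (1, 2, 3):
    r = autocorrelation(rolls, lag)
    print(f"lag {lag}: r = {r:+.3f}, outside band = {abs(r) > band}")
```

Replace `rolls` with a recorded sequence, in the order it was rolled, to check real data. A run where lows follow lows should give a clearly positive lag-1 value, and a system that flips between high and low should give a clearly negative one.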

Again, this has since been changed and their base RNG system seems fine, whereas the Karmic Dice system seems to have been modified to give higher chances to hit, rather than flipping between high and low rolls.