One problem with believable body shapes that differ from the default, such as obese or elderly characters, is that they need their own animation sets. We might not think about it, but it quickly becomes obvious when people don't move the way their bodies suggest they would.
Size variation causes problems too, because animations need to fit together. Think of a handshake: both hands need to be in the right position for them to connect. Not every engine has the tech to handle these blends between heights.
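To make the handshake problem concrete, here is a minimal sketch of one way to pick a shared contact height for two characters of different heights. Everything in it is an assumption for illustration: the `Character` class, the `hand_height_ratio` value and the `bias` parameter are hypothetical, and a real engine would typically feed a target like this into its IK or blending system rather than use it directly.

```python
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    height: float                    # body height in metres (hypothetical values)
    hand_height_ratio: float = 0.6   # assumed: hand height in the authored handshake,
                                     # as a fraction of body height


def natural_hand_height(c: Character) -> float:
    """Height at which this character's hand ends up if the animation plays unmodified."""
    return c.height * c.hand_height_ratio


def handshake_target_height(a: Character, b: Character, bias: float = 0.5) -> float:
    """Shared height where both hands should meet.

    bias=0.0 keeps a's natural hand height, bias=1.0 keeps b's,
    and 0.5 splits the difference evenly.
    """
    ha = natural_hand_height(a)
    hb = natural_hand_height(b)
    return ha + (hb - ha) * bias


def hand_offsets(a: Character, b: Character, bias: float = 0.5) -> tuple[float, float]:
    """Vertical correction each character's hand needs to reach the shared target."""
    target = handshake_target_height(a, b, bias)
    return (target - natural_hand_height(a), target - natural_hand_height(b))


if __name__ == "__main__":
    tall = Character("tall", 2.0)
    short = Character("short", 1.5)
    print(handshake_target_height(tall, short))
    print(hand_offsets(tall, short))
```

With these made-up numbers the taller character has to lower their hand and the shorter one has to raise theirs by the same amount. The larger the height gap, the larger the correction, and the more likely the adjusted pose looks wrong without extra animation work.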
Another issue is optimization: all those differences need to be handled at runtime. You need to load all those body shapes (and the skeletons controlling them) into memory while you play, and the same goes for the different animations. So on the one hand the whole creation pipeline gets more and more convoluted, and on the other hand the variations tax the hardware while the game is played. As hardware improves and development tools automate some of the necessary tasks, we see more and more of this in games. But sadly it sounds far easier than it actually is, and that's why we often see a very uniform set of bodies.
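As a rough back-of-the-envelope illustration of the runtime cost, the sketch below adds up the memory for resident body shapes, skeletons and animation sets. The function name and all sizes are placeholder assumptions, not measurements from any real game, and it ignores sharing, streaming and compression, which real productions rely on heavily.

```python
def crowd_memory_mb(body_shapes: int, mesh_mb: int, skeleton_mb: int,
                    anim_sets: int, anim_set_mb: int) -> int:
    """Very rough estimate: every shape brings a mesh and a skeleton,
    and every animation set is kept in memory alongside them."""
    return body_shapes * (mesh_mb + skeleton_mb) + anim_sets * anim_set_mb


# Placeholder numbers, chosen only to show how the cost scales.
uniform = crowd_memory_mb(body_shapes=1, mesh_mb=20, skeleton_mb=1,
                          anim_sets=1, anim_set_mb=50)
varied = crowd_memory_mb(body_shapes=6, mesh_mb=20, skeleton_mb=1,
                         anim_sets=4, anim_set_mb=50)

print(uniform)  # 71
print(varied)   # 326
```

Even in this toy model, going from one body type to six shapes with four animation sets multiplies the footprint several times over, before counting any of the extra authoring work.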
I've oversimplified here because it makes no sense to go into more detail, since each production and game handles this differently, but it's one of the big topics and problems for games with crowds.