Mostly because it's hard to do sound-related detection with the game's systems.
Sight is easy: you just use a vision cone, and things can obscure that cone.
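A rough sketch of what a sight check like that could look like. This is purely illustrative and not how the game actually does it: the function names, the flat 2D top-down setup, and treating obscuring effects like Fog Cloud or Darkness as circles are all my own assumptions.

```python
import math

Point = tuple[float, float]

def segment_hits_circle(a: Point, b: Point, center: Point, radius: float) -> bool:
    """True if the line segment a-b passes within `radius` of `center`."""
    ax, ay = a
    bx, by = b
    cx, cy = center
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    # Parameter t of the point on the segment closest to the circle centre, clamped to [0, 1].
    t = 0.0 if length_sq == 0 else ((cx - ax) * abx + (cy - ay) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    px, py = ax + t * abx, ay + t * aby
    return math.hypot(cx - px, cy - py) <= radius

def can_see(observer: Point, facing_deg: float, target: Point,
            fov_deg: float, max_range: float,
            obscurers: list[tuple[Point, float]]) -> bool:
    """Vision-cone check: in range, inside the cone's angle, and no obscuring circle in the way."""
    dx, dy = target[0] - observer[0], target[1] - observer[1]
    dist = math.hypot(dx, dy)
    if dist > max_range:
        return False
    if dist > 0:
        angle_to_target = math.degrees(math.atan2(dy, dx))
        # Signed angle difference wrapped into [-180, 180).
        diff = (angle_to_target - facing_deg + 180) % 360 - 180
        if abs(diff) > fov_deg / 2:
            return False
    # Hypothetical obscurers: (centre, radius) circles such as a Fog Cloud.
    return not any(segment_hits_circle(observer, target, c, r) for c, r in obscurers)

guard = (0.0, 0.0)
print(can_see(guard, 0.0, (5.0, 1.0), 90.0, 10.0, []))                   # in front, in range
print(can_see(guard, 0.0, (-5.0, 0.0), 90.0, 10.0, []))                  # behind the guard
print(can_see(guard, 0.0, (5.0, 1.0), 90.0, 10.0, [((2.5, 0.5), 1.0)]))  # fog in the way
```

That's basically the whole thing: three cheap yes/no tests, which is why sight is the easy part (and also why a single effect that kills the cone kills the whole check).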
But sound is trickier, because you also need a way to work out how much noise something makes and how that will be perceived (with things like background noise, other noises occurring at the same time, etc.).
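A rough sketch of why sound is fiddlier than a cone check. These are my own simplifying assumptions, not the game's model: loudness falls off with simple inverse-square falloff, unrelated noises are combined as an energy sum, other noises are given as levels at the listener's position, and anything less than an arbitrary 3 dB above the combined noise floor goes unnoticed.

```python
import math

def level_at_distance(source_db: float, distance: float, ref_distance: float = 1.0) -> float:
    """Free-field inverse-square falloff: about -6 dB per doubling of distance."""
    d = max(distance, ref_distance)
    return source_db - 20 * math.log10(d / ref_distance)

def combine_levels(levels_db: list[float]) -> float:
    """Sum unrelated noises by energy, not by adding the dB numbers together."""
    if not levels_db:
        return -math.inf
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

def can_hear(source_db: float, distance: float, background_db: float,
             other_sounds_db: tuple[float, ...] = (), margin_db: float = 3.0) -> bool:
    """Heard only if the sound arrives at least `margin_db` above everything else going on."""
    received = level_at_distance(source_db, distance)
    # Hypothetical inputs: background and other noises as levels at the listener.
    noise_floor = combine_levels([background_db, *other_sounds_db])
    return received >= noise_floor + margin_db

print(can_hear(40.0, 4.0, 20.0))           # footsteps in a quiet room
print(can_hear(40.0, 4.0, 20.0, (30.0,)))  # same footsteps, masked by nearby chatter
```

Even this toy version needs a loudness value for every action and a running picture of everything else making noise nearby, which is a lot more bookkeeping than a cone.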
The combination of sound and sight is what typically waters down the effectiveness of stealth in tabletop D&D. With just sight, it's super easy to cheese everything with a single effect. It also makes the "Stealth" skill entirely pointless, since it only does anything if you're visible AND within a sight cone: if you're behind someone, invisible, or you've neutered their vision cone, you never roll a Stealth check at all.
As far as reacting to stuff goes, it's probably too much of a bother trying to make NPCs react to spells like Fog Cloud and Darkness whilst not reacting to others like Longstrider or Aid (it took long enough just to get NPCs to stop constantly reacting to rangers' animal companions and mages' summons...).
Meanwhile, stealing is entirely pointless because gold is not a rare resource, so who cares if someone steals something instead of spending some of the 50,000 unused gold coins that are doing nothing but weighing them down...
This is "how".
A better question is "why": why did they design it this way and allow it? It's not fun playing a thief when you can just take stuff.
Just like it's not fun playing combat when you can just decide to win.