Can't give you all the answers, but I can tell you that homophones are mostly dealt with very well, based on context. I can also say that it is far superior to the speech-to-text in normal Google Assistant operation, which I think is handled remotely. You should have heard me talking to the little b*ast*rd last night:
"Ok Google, play me some Barney Kessel"
"Sure, playing Barney Castle....."
{Repeat ad nauseam}
Background noise is dealt with well, too.