Yeah, you're advocating Phase 3. It probably *is* the best for the vast majority of people, though as you note, some people can't physically do it.
And if you choose that option and speak a lot in the initial learning phases, you're going to end up with a lot of ossified errors, which may or may not matter.
Re: Pimsleur, I specifically advocate it for Phase 0, just the "phonetic architecture": learning to recognize and reproduce the most basic sounds, while also picking up survival-level phrases.