I'm using a local relay as well, and I just changed the app so that I can run it with an "only local" setting and test it in production through a browser bot (Playwright).
Make sure you have a deep-to-thin setup: the further you get from the code (unit -> integration -> feature/e2e/smoke), the thinner the tests should be, so that you don't get steamrolled maintaining high-level test suites.
Are you running a linter or a similar static check before starting the dynamic tests? Static checks actually catch the most bugs, and they let you break out earlier.
Make sure that you split the run into steps, break out on every failed step, and print a success message at the end.
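
Here's a rough sketch of what I mean by stepping through and breaking out, in Python. The commands in `STEPS` (`ruff`, `pytest`, the `tests/...` paths) and the names `run_steps` and `run_command` are placeholders I made up for illustration, so swap in whatever your project actually runs. The only point is the order (static check first, then the cheaper tests, then e2e) and stopping at the first failure.

```python
import subprocess
from typing import Callable, Sequence

Step = tuple[str, Sequence[str]]

# Hypothetical commands, ordered from cheapest/static to most expensive.
STEPS: list[Step] = [
    ("lint", ["ruff", "check", "."]),
    ("unit", ["pytest", "tests/unit"]),
    ("e2e", ["pytest", "tests/e2e"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        return 127


def run_steps(
    steps: Sequence[Step],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Run the steps in order and stop at the first one that fails."""
    for name, cmd in steps:
        print(f"[step] {name}: {' '.join(cmd)}")
        if runner(cmd) != 0:
            # Break out immediately so later, slower steps never start.
            print(f"[fail] {name} failed, stopping here")
            return False
    # Only reached when every step returned exit code 0.
    print("[ok] all steps passed")
    return True
```

To use it as a script, call `run_steps(STEPS)` and exit non-zero when it returns `False`, so that CI notices the failure. The `runner` parameter is only there so you can try the flow with a fake runner before wiring in real tools.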