Open-ended scientific tasks lack rigorous, domain-expert benchmarks | SaaSBrowser.ai