https://openai.com/research/gpt-4
The section “Internal factual eval by category” is the part I care about most. It measures how accurate the generated content is, in other words how often the model makes things up. In my experience, ChatGPT and Bing AI are still far from usable as tools: whenever you correct them, there is a good chance they start fabricating answers. On this eval, accuracy improved from roughly 40-60% for ChatGPT to 60-80% for GPT-4. That is still not nearly enough. To become a usable tool, the model needs much higher accuracy; otherwise every output still demands a lot of expensive human review.
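A quick back-of-the-envelope sketch of why 60-80% still means heavy review. It assumes (hypothetically) that the eval score can be read as the probability that a single factual claim is correct, and that claims in one answer are independent. Neither assumption comes from the GPT-4 page; the accuracy values are just the rough ranges quoted above.

```python
# Hypothetical illustration: accuracy values are the rough ranges quoted above,
# and treating claims as independent is an assumption, not something the eval measures.

def error_rate(accuracy: float) -> float:
    """Fraction of claims that are wrong and would need a human to catch."""
    return 1.0 - accuracy

def all_correct_prob(accuracy: float, n_claims: int) -> float:
    """Chance that n independent claims in one answer are all correct."""
    return accuracy ** n_claims

for label, acc in [("ChatGPT (low)", 0.40), ("ChatGPT (high)", 0.60),
                   ("GPT-4 (low)", 0.60), ("GPT-4 (high)", 0.80)]:
    print(f"{label}: error rate {error_rate(acc):.0%}, "
          f"all 5 claims correct {all_correct_prob(acc, 5):.1%}")
```

Under these assumptions, even at 80% an answer with 5 factual claims is fully correct only about a third of the time (0.8^5 ≈ 32.8%), so a human still has to check most outputs.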