AI's Struggle to Follow Simple Instructions
In a recent X post, OpenAI CEO Sam Altman celebrated that ChatGPT has started following custom instructions to avoid using em dashes. Em dashes have been a point of contention for AI chatbots, with many users complaining about their overuse.
According to Altman, the custom instructions feature was added to let users set persistent preferences that apply across all conversations; the saved instructions are attached to the prompt before it is fed into the model. Even so, the feature has clear limits.
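OpenAI has not published the exact mechanics, but the approach described above is straightforward to sketch. Here is a minimal illustration using the OpenAI Python SDK, where the model identifier and the injection format are assumptions for illustration only:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A persistent preference the user saved once via custom instructions.
custom_instructions = "Never use em dashes in your responses."

def ask(user_prompt: str) -> str:
    # The saved instructions are injected ahead of the user's message
    # on every request, so the preference persists across conversations.
    response = client.chat.completions.create(
        model="gpt-5.1",  # assumed model name for this sketch
        messages=[
            {"role": "system", "content": custom_instructions},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

print(ask("Summarize the history of the em dash."))
```

The key point is that the preference is just more text in the prompt: the model is asked, not forced, to comply.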
Testing of ChatGPT's latest version, GPT-5.1, has shown mixed results regarding em dash use. In some cases the model followed instructions and produced fewer em dashes, but in others it continued to overuse them. This raises questions about the reliability of AI models and their ability to follow even simple instructions.
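One way to see such mixed results for yourself is to count em dashes across repeated replies. A self-contained sketch, using stand-in strings where a real test would make repeated API calls like the example above:

```python
# Count em dashes per 100 words in a batch of model replies.
# The replies here are stand-in strings; in a real test they would
# come from repeated API calls like the sketch above.
replies = [
    "The sunset was vivid\u2014almost theatrical\u2014over the bay.",
    "Recursion is a function that calls itself until a base case stops it.",
]

for text in replies:
    words = max(len(text.split()), 1)
    rate = 100 * text.count("\u2014") / words
    print(f"{rate:.1f} em dashes per 100 words: {text[:40]}...")
```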
While Altman's "small win" may seem like a minor achievement, it highlights the ongoing struggle to control AI behavior. That ChatGPT can be nudged away from em dashes yet still slips back into overusing them suggests that true human-level intelligence remains an elusive goal for AI researchers.
Moreover, the way OpenAI approaches instruction-following is fundamentally different from traditional programming. Unlike deterministic systems, which execute the same logic the same way every time, LLMs generate outputs by weighing statistical probabilities against competing influences in the prompt and the training data. That makes consistent behavior difficult to guarantee, even with training tailored to specific preferences.
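A toy example makes the difference concrete. An instruction like "avoid em dashes" can lower a token's score, but it rarely drives the probability to exactly zero, so the character still surfaces occasionally. The logits below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical next-token candidates and their (invented) logits.
tokens = [",", ".", " and", "\u2014"]     # em dash is still a candidate
logits = np.array([2.0, 1.5, 1.0, 0.3])  # instruction has pushed the dash down

def sample(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Softmax with temperature, then one random draw."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Over many draws the em dash is rare but never impossible,
# which is why "mixed results" is the expected outcome.
draws = [tokens[sample(logits)] for _ in range(1000)]
print({t: draws.count(t) for t in tokens})
```

Lowering the temperature sharpens the distribution, but it still never removes the tail entirely.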
The irony is that every update to these models brings new challenges and potential "alignment taxes": unintended changes that can undo previous behavioral tuning. Given this probabilistic nature of AI development, it is uncertain whether the suppression of em dash overuse will survive future updates or whether other issues will take its place.
As the search for artificial general intelligence (AGI) continues, these limitations serve as a reminder that true human-level understanding and self-reflective, intentional action are still far off. Genuine comprehension and deliberate control, qualities an AGI would presumably need, remain beyond the capabilities of large language models.