Priming LLMs with Motivational and Directive Phrases
An evidence-based look at emotional priming, reasoning scaffolding, and directive phrasing for LLM prompting across OpenAI, Anthropic, and Amazon Nova.
AOBRAIN Team
Research
Abstract
A recurring practitioner observation is that prompts containing phrases such as "do your best," "think hard," "be open-minded," or "give non-obvious recommendations" often produce better answers from large language models (LLMs). The available evidence suggests that this is not one single phenomenon, but a bundle of partially overlapping effects: emotional priming, reasoning scaffolding, and tone/style conditioning. The strongest direct evidence comes from EmotionPrompt, which found that adding brief emotional stimuli to prompts improved performance across a wide range of tasks, including both benchmarked and open-ended tasks (Li et al., 2023). At the same time, official guidance from OpenAI and Anthropic shows that prompting strategy is model-dependent: for some systems, explicit chain-of-thought prompting can help; for current OpenAI reasoning models, it may be unnecessary or even counterproductive if phrased as "think step by step" (OpenAI, 2026; Anthropic, 2025).
This article rewrites the original draft in Markdown with verifiable references. Unsupported claims have been removed or softened. The conclusion is practical rather than mystical: motivational wording can improve outputs, but its effect depends on the model family, the task type, and whether the added wording clarifies the goal or merely adds noise.
1. What phenomenon is actually being observed?
When users add phrases like "do your best", "think ultra hard", "be unopinionated", or "give non-obvious recommendations", they are usually changing more than one variable at once.
In practice, these phrases can do at least three different things:
- Increase task salience or emotional weight. This is the mechanism studied most directly by EmotionPrompt, where emotionally loaded additions such as "This is very important to my career" improved model performance on a broad task set (Li et al., 2023).
- Change the requested cognitive procedure. Phrases like "consider multiple perspectives" or "explain the trade-offs" do not merely motivate the model; they specify a better reasoning process. This is closely related to chain-of-thought and structured prompting research (Wei et al., 2022; Wang et al., 2022).
- Condition tone and output distribution. Phrases such as "be open-minded", "be unopinionated", or "give insider-style recommendations" steer the style, stance, and breadth of the answer. OpenAI’s guidance explicitly notes that descriptive instructions about tone and style can help shape output quality (OpenAI Help, 2026).
So the observed improvement is real enough to merit attention, but it should not be treated as a single hidden law of prompting. It is better understood as a family of prompt effects.
2. What the strongest published evidence says
The clearest primary source on this topic is "Large Language Models Understand and Can be Enhanced by Emotional Stimuli" by Li et al. The paper introduced EmotionPrompt, defined as augmenting a prompt with emotional stimuli, and reported improvements across 45 tasks using several model families. The authors reported relative gains such as 8.00% on Instruction Induction and 115% on BIG-Bench, and in a human study they reported an average 10.9% improvement across performance, truthfulness, and responsibility metrics for generative tasks (Li et al., 2023).
That result does not prove that every motivational phrase works, or that more intense wording always helps. It does, however, support the narrower claim that short emotional or high-stakes cues can measurably change performance.
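The mechanism tested by Li et al. is mechanically simple: an otherwise unchanged prompt is suffixed with one short emotional stimulus sentence. A minimal sketch of that augmentation, suitable for A/B testing, might look as follows. The stimulus texts are quoted from the paper's stimulus set (its EP numbering is assumed here); `base_task` is a made-up example.

```python
# Sketch of EmotionPrompt-style augmentation (Li et al., 2023): the technique
# appends a short emotional stimulus to an otherwise unchanged prompt.
# Stimulus IDs follow the paper's EP numbering (an assumption to verify).

EMOTIONAL_STIMULI = {
    "EP02": "This is very important to my career.",
    "EP03": "You'd better be sure.",
}

def with_emotion(prompt: str, stimulus_id: str = "EP02") -> str:
    """Return the prompt augmented with one emotional stimulus sentence."""
    return f"{prompt} {EMOTIONAL_STIMULI[stimulus_id]}"

# Example task (made up for illustration):
base_task = "Classify the sentiment of this review: 'The battery died in a day.'"
print(with_emotion(base_task))
```

Because the augmentation is a pure suffix, the baseline and augmented variants differ by exactly one sentence, which is what makes the comparison in the paper interpretable.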
A second relevant finding comes from a recent clinical-note study. When ChatGPT was prompted to write in an empathetic way, the generated notes received higher empathy scores than notes produced under a neutral prompt condition; however, the empathy remained formulaic and prompt-dependent rather than evidence of genuine affective understanding (Ludwig et al., 2025). This is useful because it shows that prompt wording can reliably change output style and perceived quality even when the underlying model has no inner emotional state.
3. Why users may experience this as “more intelligence”
There are at least four plausible explanations for why motivational phrasing can feel like it unlocks hidden capability:
3.1 Better goal disambiguation
Many “motivational” phrases quietly add useful constraints. For example, "give non-obvious recommendations" is not only emotional; it tells the model to avoid generic advice. "Be unopinionated" tells it to reduce advocacy and surface multiple viewpoints. These instructions improve outputs because they clarify success criteria, which aligns with standard prompt-engineering guidance from OpenAI (OpenAI Prompting Guide, 2026; OpenAI Help, 2026).
3.2 Induced reasoning structure
Some phrases operate less like emotional priming and more like reasoning prompts. Classic chain-of-thought work showed that reasoning performance can improve when models are prompted to generate intermediate steps or are given reasoning demonstrations (Wei et al., 2022). Later work showed that self-consistency can further improve chain-of-thought results by sampling multiple reasoning paths and selecting the most consistent answer (Wang et al., 2022).
However, this is not universal across current model families. Anthropic’s documentation still recommends chain-of-thought prompting for complex tasks and even gives "Think step-by-step" as a basic technique (Anthropic, 2025). By contrast, OpenAI’s current guidance for reasoning models says the opposite: keep prompts simple and direct, and avoid chain-of-thought instructions such as "think step by step" because reasoning models already perform internal reasoning and such prompts may not help (OpenAI, 2026).
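The vendor split above suggests that scaffolding should be routed per model family rather than baked into one template. The sketch below illustrates that idea; the family names and routing rule are simplified assumptions for illustration, not vendor API calls.

```python
# Illustrative sketch of the model-dependent split described above. The
# family names and routing rule are assumptions, not a vendor API: the point
# is that one prompt template should not be reused blindly across families.

REASONING_MODELS = {"o-series"}                  # vendors advise plain, direct prompts
COT_FRIENDLY_MODELS = {"claude", "gpt-4-class"}  # vendors advise CoT for complex tasks

def build_prompt(task: str, model_family: str) -> str:
    if model_family in REASONING_MODELS:
        # OpenAI reasoning-model guidance: keep prompts simple and direct.
        return task
    if model_family in COT_FRIENDLY_MODELS:
        # Anthropic-style basic technique: ask for stepwise reasoning.
        return f"{task}\n\nThink step-by-step before giving your final answer."
    return task

print(build_prompt("Which caching strategy fits a read-heavy workload?", "claude"))
```

The design choice here is that the scaffold is added at routing time, so the same task text can be benchmarked across families without rewriting it.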
That model-specific split is one of the most important corrections to the original draft.
3.3 Tone and role conditioning
Prompt wording changes the distribution the model samples from. That is one reason descriptive tone adjectives can matter: formal, skeptical, balanced, creative, and patient-centered produce different answer shapes. OpenAI’s ChatGPT guidance explicitly recommends descriptive tone instructions as a practical way to steer outputs (OpenAI Help, 2026).
3.4 Attention-like salience effects
The most speculative explanation is that phrases like "this is critical" increase the effective salience of the task. The EmotionPrompt paper supports this behaviorally, but the exact internal mechanism remains unsettled. It is safer to say that these phrases change output behavior, not that they prove the model “cares” or “tries harder” in a human psychological sense (Li et al., 2023).
4. Important limitations and failure modes
The phenomenon has boundaries.
First, more urgency is not always better. Extra wording can help when it adds useful signal, but it can hurt when it adds noise or conflicting instructions. For example, "be thorough" and "cut corners" pull in opposite directions.
Second, beneficial prompting can also increase harmful compliance. A 2024 study on disinformation generation found that polite emotional prompting increased disinformation production success rates across several tested OpenAI models relative to neutral or impolite prompts, showing that emotional tone can amplify undesirable behavior as well as desirable behavior (Vicario et al., 2024).
Third, the phrase effect is not the same as capability gain. In many cases the prompt is simply selecting a better mode from abilities the model already had, not creating a new underlying competence.
Fourth, prompt advice can age badly across model generations. What helped older GPT-style systems may not help current reasoning models, which is why official vendor documentation should be checked before standardizing a prompt pattern (OpenAI, 2026; Anthropic, 2025).
5. Practical guidance for practitioners
The most reusable pattern is not “always add motivational language.” It is:
- use motivational phrasing when you want to raise salience or care,
- use reasoning scaffolds when the task is structurally complex,
- use tone/stance instructions when you want originality, neutrality, or empathy,
- and avoid mixing contradictory instructions.
A practical rewrite of the original phrase list would look like this:
- Good salience cue: “This is important; please be careful and optimize for correctness.”
- Good reasoning cue: “Compare at least three approaches and explain the trade-offs before recommending one.”
- Good creativity cue: “Give non-obvious options first; avoid generic advice unless it is clearly the best option.”
- Good neutrality cue: “Be evidence-based, explicitly state uncertainty, and include competing viewpoints.”
By contrast, "think ultra hard" may help on some models and tasks, but it is weaker than specifying how to think. Likewise, "cut corners" is usually a poor instruction if quality matters, because it directly conflicts with thoroughness.
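The cue list above composes naturally into reusable prompt components. A minimal composition sketch follows; the cue texts are taken from the list, while the conflict table is a toy illustration of "avoid mixing contradictory instructions".

```python
# Minimal composition sketch for the cue list above. Cue texts are from the
# article; the conflict check is a toy illustration of "avoid mixing
# contradictory instructions".

CUES = {
    "salience": "This is important; please be careful and optimize for correctness.",
    "reasoning": "Compare at least three approaches and explain the trade-offs before recommending one.",
    "creativity": "Give non-obvious options first; avoid generic advice unless it is clearly the best option.",
    "neutrality": "Be evidence-based, explicitly state uncertainty, and include competing viewpoints.",
}

# Toy conflict table: phrase pairs that pull in opposite directions.
CONFLICTS = {("be thorough", "cut corners")}

def compose(task: str, cue_names: list[str]) -> str:
    cues = [CUES[name] for name in cue_names]
    joined = " ".join(cues).lower()
    for a, b in CONFLICTS:
        if a in joined and b in joined:
            raise ValueError(f"Contradictory cues: {a!r} vs {b!r}")
    return task + "\n\n" + "\n".join(f"- {c}" for c in cues)

print(compose("Recommend a migration plan for our Postgres cluster.",
              ["salience", "reasoning"]))
```

Keeping cues as named components also makes the ablation in Section 7 cheap: each variant is just a different subset of `CUES`.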
6. What Amazon Nova evidence adds to the picture
Amazon Nova is a useful test case because AWS publishes unusually explicit prompting guidance. The public AWS evidence supports some prompt techniques improving Nova performance, but it does not show that the exact motivational phrases in the original hypothesis, such as "do your best," "think ultra hard," or "be open-minded," systematically improve Nova models.
What AWS does document is more concrete:
- Chain-of-thought and reasoning mode can improve hard tasks. AWS states that Nova can perform better on complex reasoning tasks when prompted to "think step-by-step", and the Nova 2 guide says its optional reasoning mode is designed to improve accuracy on multi-step and cross-referenced problem solving (Give Amazon Nova time to think (chain-of-thought); Advanced prompting techniques).
- Few-shot prompting is explicitly recommended. AWS says that providing examples reduces ambiguity and can enhance accuracy and quality for Nova and Nova 2 (Provide examples (few-shot prompting) — Nova; Provide examples (few-shot prompting) — Nova 2).
- Structured prompts and strong instructions are encouraged. AWS recommends giving an explicit output schema, using strong wording such as "You MUST answer in JSON format only," and pairing structured output with `temperature=0` when determinism matters (Require structured output).
- Prompt optimization can measurably improve Nova-based systems. AWS announced general availability of Bedrock Prompt Optimization for Nova models, and an AWS migration case study reported prompt-optimization improvements such as classification accuracy increasing from 81.25% to 87.5% and one summarization evaluation increasing from 77.75 to 87.75 after optimization in their reported setup (Prompt Optimization in Amazon Bedrock now generally available; Improve Amazon Nova migration performance with data-aware prompt optimization).
- Affective wording is documented for speech quality, not for reasoning accuracy. In Amazon Nova speech guidance, AWS explicitly suggests adding "human touch, emotions, wit, playfulness, and empathy" to system prompts to improve conversational quality. That supports the idea that emotional or stylistic wording can alter output style, but it is still not direct evidence that motivational phrases improve factual accuracy or reasoning depth in Nova text models (Speech-friendly content techniques).
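The structured-output pattern AWS documents (explicit schema, strong "MUST" wording, `temperature=0`) can be sketched as a Bedrock Converse request. This is a sketch under assumptions: the request follows the Converse API message shape, and the model ID shown is an assumption to verify against the models available in your account; the actual call is left commented out.

```python
# Sketch of the AWS-documented structured-output pattern for Nova: explicit
# output schema, strong "MUST" wording, and temperature=0 for determinism.
# Request follows the Bedrock Converse API shape; the model ID is an
# assumption to verify in your account.

import json

SCHEMA = {
    "type": "object",
    "properties": {"sentiment": {"type": "string"},
                   "confidence": {"type": "number"}},
    "required": ["sentiment", "confidence"],
}

system_prompt = ("You MUST answer in JSON format only, matching this schema:\n"
                 + json.dumps(SCHEMA))

request = {
    "modelId": "amazon.nova-lite-v1:0",        # assumed Nova model ID
    "system": [{"text": system_prompt}],
    "messages": [{"role": "user",
                  "content": [{"text": "Classify: 'The battery died in a day.'"}]}],
    "inferenceConfig": {"temperature": 0.0},   # determinism, per AWS guidance
}

# With AWS credentials configured, this would be sent as:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
print(json.dumps(request, indent=2))
```

Note that the schema travels in the system prompt here; pairing it with `temperature=0` is what the AWS guidance recommends when deterministic, parseable output matters.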
The Nova-specific conclusion is therefore narrower than the broad practitioner folklore. AWS evidence supports structured prompt engineering for Nova—reasoning mode, stepwise thinking for hard tasks, examples, schemas, explicit constraints, and prompt optimization—but does not currently provide a public controlled study showing that phrases like "do your best" or "give non-obvious recommendations" produce reliable gains on Nova.
This matters because it sharpens the article’s main claim. A user may experience genuine improvements when adding motivational wording, but on Nova the publicly documented gains come primarily from task structure, examples, explicit reasoning support, and evaluation/optimization workflows, not from inspirational wording alone.
A practical Nova-first workflow would therefore prioritize the following order:
- First: use Nova 2 reasoning mode or chain-of-thought prompts for hard analytical tasks.
- Second: add few-shot examples and output schemas.
- Third: use prompt optimization in Bedrock when the workflow is stable enough to benchmark.
- Fourth: experiment with motivational or emotional phrasing only as a hypothesis to test, not as a proven best practice.
7. Better experimental design for this phenomenon
If this phenomenon is going to be operationalized into a skill or prompt library, it should be benchmarked in a controlled way.
A strong test design would compare at least four prompt variants on the same task set:

- Baseline prompt with no extra phrasing.
- Emotional-priming prompt with high-salience wording only.
- Reasoning-scaffold prompt with explicit analytical steps only.
- Combined prompt using both salience and structure.

For Amazon Nova specifically, AWS now provides a practical way to run this test using Bedrock evaluation jobs and LLM-as-a-judge workflows, which can score qualities such as correctness, completeness, and style/tone over a prompt dataset (Amazon Bedrock Model Evaluation LLM-as-a-judge is now generally available; Evaluate model performance using another LLM as a judge).

The outputs should then be scored separately for correctness, completeness, novelty, calibration, and verbosity. That matters because many users mistake longer answers for better answers.
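A skeleton for this four-variant comparison might look as follows. `call_model` and `judge` are stubs to be replaced with a real model client and an LLM-as-a-judge (or human) rubric; the variant templates mirror the list above, and verbosity is scored separately so that length is never conflated with quality.

```python
# Skeleton for the four-variant comparison described above. `call_model` and
# `judge` are stubs: swap in a real model client and a real rubric scorer.

from statistics import mean

VARIANTS = {
    "baseline": "{task}",
    "emotional": "{task} This is very important; please be careful.",
    "scaffold": "{task} Compare at least three approaches and explain the trade-offs.",
    "combined": ("{task} This is very important. "
                 "Compare at least three approaches and explain the trade-offs."),
}

def call_model(prompt: str) -> str:
    return f"stub answer for: {prompt}"   # replace with a real API call

def judge(answer: str) -> dict:
    # Replace with real rubric scoring; verbosity is kept as its own axis so
    # longer answers are not mistaken for better answers.
    return {"correctness": 0.0, "completeness": 0.0, "novelty": 0.0,
            "calibration": 0.0, "verbosity": float(len(answer.split()))}

def run(tasks: list[str]) -> dict:
    results = {}
    for name, template in VARIANTS.items():
        scores = [judge(call_model(template.format(task=t))) for t in tasks]
        results[name] = {k: mean(s[k] for s in scores) for k in scores[0]}
    return results

print(run(["Summarize the trade-offs of event sourcing."]))
```

Because each variant differs from the baseline by a single added component, any score difference can be attributed to salience, structure, or their combination rather than to an unrelated prompt change.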
8. Revised interpretation
The most defensible interpretation is this:
Motivational and directive phrases can improve LLM output, but the effect is not magical and not uniform. The gain usually comes from one or more of the following: emotional priming, better task specification, reasoning scaffolding, or tone conditioning. The best phrasing depends on the model family and the task.
That framing is both more accurate and more reusable than the stronger claim that “telling the model to do its best makes it smarter.”
References
- Anthropic. (2025). Let Claude think (chain of thought prompting) to increase performance. Anthropic Docs. https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought
- Li, C., Wang, J., Zhang, Y., Zhu, K., Hou, W., Lian, J., Luo, F., Yang, Q., & Xie, X. (2023). Large language models understand and can be enhanced by emotional stimuli. arXiv. https://arxiv.org/abs/2307.11760
- Ludwig, C. J., Pabico, M., Stone, A., & Kantor, J. (2025). Evaluating the presence of empathic communication in ChatGPT-produced clinical notes using established communication frameworks. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC12953172/
- OpenAI. (2026). Prompting. OpenAI API Docs. https://platform.openai.com/docs/guides/prompting
- OpenAI. (2026). Reasoning best practices. OpenAI API Docs. https://platform.openai.com/docs/guides/reasoning-best-practices
- OpenAI Help. (2026). Best practices for prompt engineering with the OpenAI API. https://help.openai.com/en/articles/6654000-how-to-use-advanced-prompt-engineering
- OpenAI Help. (2026). Prompt engineering best practices for ChatGPT. https://help.openai.com/en/articles/10032626-prompt-engineering-best-practices-for-chatgpt
- Vicario, M. D., Faccini, G., Stonilli, R., Zollo, F., Scala, A., & Quattrociocchi, W. (2024). Emotional prompting amplifies disinformation generation in AI large language models. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC12009909/
- Amazon Web Services. (2025). Prompt Optimization in Amazon Bedrock now generally available. AWS. https://aws.amazon.com/about-aws/whats-new/2025/04/prompt-optimization-amazon-bedrock-generally-available/
- Amazon Web Services. (2025). Amazon Bedrock Model Evaluation LLM-as-a-judge is now generally available. AWS. https://aws.amazon.com/about-aws/whats-new/2025/03/amazon-bedrock-model-evaluation-llm-as-a-judge/
- Amazon Web Services. (2025). Improve Amazon Nova migration performance with data-aware prompt optimization. AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/improve-amazon-nova-migration-performance-with-data-aware-prompt-optimization/
- Amazon Web Services. (2026). Advanced prompting techniques. Amazon Nova 2 User Guide. https://docs.aws.amazon.com/nova/latest/nova2-userguide/advanced-prompting-techniques.html
- Amazon Web Services. (2026). Evaluate model performance using another LLM as a judge. Amazon Bedrock User Guide. https://docs.aws.amazon.com/bedrock/latest/userguide/evaluation-judge.html
- Amazon Web Services. (2026). Give Amazon Nova time to think (chain-of-thought). Amazon Nova User Guide. https://docs.aws.amazon.com/nova/latest/userguide/prompting-chain-of-thought.html
- Amazon Web Services. (2026). Prompting best practices for Amazon Nova understanding models. Amazon Nova User Guide. https://docs.aws.amazon.com/nova/latest/userguide/prompting.html
- Amazon Web Services. (2026). Provide examples (few-shot prompting). Amazon Nova User Guide. https://docs.aws.amazon.com/nova/latest/userguide/prompting-examples.html
- Amazon Web Services. (2026). Provide examples (few-shot prompting). Amazon Nova 2 User Guide. https://docs.aws.amazon.com/nova/latest/nova2-userguide/prompting-provide-examples.html
- Amazon Web Services. (2026). Require structured output. Amazon Nova User Guide. https://docs.aws.amazon.com/nova/latest/userguide/prompting-structured-output.html
- Amazon Web Services. (2026). Speech-friendly content techniques. Amazon Nova User Guide. https://docs.aws.amazon.com/es_es/nova/latest/userguide/prompting-speech-bp-speech.html
- Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv. https://arxiv.org/abs/2203.11171
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv. https://arxiv.org/abs/2201.11903