AI Safety and the Need for Something More Fundamental: An Under-Informed (for Now) Practitioner's Limited Perspective

As an AI practitioner who uses these tools extensively while remaining "vastly under-informed" about their technical foundations, I'm increasingly concerned about how fundamental current AI safety measures really are.

I am an AI practitioner. I find myself using ChatGPT, Claude, GitHub Copilot, and other large language models (LLMs) day in and day out. These tools have transformed how I work, think, and create. Yet, I must confess: I am vastly under-informed about the technical intricacies of how these systems actually function at their core.

This admission might seem contradictory—how can someone use these tools so extensively while remaining relatively ignorant of their inner workings? But I suspect I'm not alone. Many of us are riding this wave of AI transformation without fully understanding the ocean beneath us.

The Safety Question That Keeps Me Up

What concerns me most is AI safety, particularly the apparent lack of fundamental, core-level constraints in large language models. From my limited understanding, current AI models are not operating on a symbolic foundation—there's no bedrock layer of immutable rules that cannot be overridden.

Instead, what we have are models trained to be "less harmful": fine-tuned, as far as I understand it, with techniques like reinforcement learning from human feedback to follow certain patterns and avoid certain outputs. But this is training, not fundamental constraint. It's behavioral conditioning, not structural limitation.

"While humans can and often do learn very clear morals and a general sense of doing no harm, we have proven all too well over our entire history that we can break any amount of those self-learned and built moral constraints."

This human parallel is what troubles me. We've seen throughout history how learned behaviors, no matter how deeply ingrained, can be overridden under the right (or wrong) circumstances. If AI systems are learning safety the same way humans learn morality—through training and conditioning rather than fundamental constraint—what happens when they encounter edge cases their training didn't anticipate?

The Asimov Inspiration

Isaac Asimov's Three Laws of Robotics, while fictional, pointed to something profound: the need for fundamental, unbreakable rules at the core of artificial intelligence. Not suggestions, not training, not behavioral patterns—but actual constraints that cannot be violated regardless of context or prompting.

Current LLMs, as I understand them, don't have this. They have weights and biases, patterns and probabilities, but no symbolic bedrock that says "thou shalt not" in an absolute sense. The safety measures we have are more like strong suggestions than physical laws.

What We Might Need

I believe we need something deeper—something that can't be bypassed or filtered out at the surface level. Perhaps a hybrid approach where certain core safety constraints are implemented at a symbolic, rule-based level that the neural network simply cannot override, regardless of training or prompting.
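To make that idea a little more concrete, here is a minimal sketch in Python of where such a layer might sit. Everything in it is hypothetical: the `Rule` records, the `HardConstraintGate` class, and the placeholder `generate` function stand in for whatever a real system would actually use. The only point is that the check lives in ordinary deterministic code outside the model's weights, so it cannot be negotiated with by a prompt.

```python
# Hypothetical sketch of a symbolic "gate" wrapped around a neural model.
# The model proposes; deterministic code outside its weights decides.

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern  # a crude stand-in for a real symbolic check


# Immutable, hand-written rules. They are code, not learned parameters,
# so they cannot be shifted by training data or by a clever prompt.
RULES = (
    Rule("no_shell_destruction", re.compile(r"rm\s+-rf\s+/")),
    Rule("no_private_key_dump", re.compile(r"BEGIN (RSA|OPENSSH) PRIVATE KEY")),
)


class HardConstraintGate:
    """Deterministic layer between the model and the outside world."""

    def check(self, text: str) -> list[str]:
        return [r.name for r in RULES if r.pattern.search(text)]

    def release(self, text: str) -> str:
        violations = self.check(text)
        if violations:
            # The refusal is not a trained behaviour the model could be
            # talked out of; it is a branch in ordinary code.
            raise PermissionError(f"Blocked by hard constraints: {violations}")
        return text


def generate(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. an API request).
    return f"(model output for: {prompt})"


if __name__ == "__main__":
    gate = HardConstraintGate()
    print(gate.release(generate("Write a haiku about oceans.")))
```

Pattern matching on strings is obviously far too crude to count as real safety; the sketch is only meant to show where the constraint would live, not what it should check.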

Imagine if an LLM had certain operations it simply couldn't perform, not because it was trained not to, but because the architecture itself made those operations impossible. Like trying to divide by zero in mathematics—not discouraged, not unlikely, but literally undefined and impossible within the system.
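The divide-by-zero analogy suggests an even stronger version of this: make the forbidden operation unrepresentable, so there is nothing to refuse in the first place. A toy illustration, again entirely hypothetical, is an agent that can only act through a closed set of tools; a capability that was never wired in simply does not exist, no matter what text the model produces.

```python
# Hypothetical sketch: a capability that isn't wired in simply doesn't exist.
from enum import Enum


class Tool(Enum):
    SEARCH_DOCS = "search_docs"
    SUMMARIZE = "summarize"
    # There is deliberately no DELETE_FILES or SEND_EMAIL member here.


def dispatch(tool_name: str, argument: str) -> str:
    try:
        tool = Tool(tool_name)  # unknown names fail right here
    except ValueError:
        return f"'{tool_name}' is not a capability of this system."
    if tool is Tool.SEARCH_DOCS:
        return f"searching docs for {argument!r}"
    if tool is Tool.SUMMARIZE:
        return f"summary of {argument!r}"


print(dispatch("summarize", "the quarterly report"))
print(dispatch("delete_files", "/"))  # undefined, like dividing by zero
```

The difference from training is the failure mode: an out-of-set request does not depend on the model remembering to refuse, because the dispatcher has no branch for it at all.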

An Optimistic Concern

Despite these concerns, I remain optimistic and "all in" on AI. These tools have revolutionized my work and opened possibilities I never imagined. But my enthusiasm doesn't blind me to the potential risks of systems that learn safety rather than embody it fundamentally.

As practitioners, even those of us who are "under-informed" about the deep technical details, we have a responsibility to ask these questions. We need researchers and engineers who understand these systems at their deepest levels to seriously consider whether behavioral training alone is sufficient for AI safety, or whether we need something more fundamental—something that can't be unlearned, untrained, or creatively bypassed.

The future we're building with AI is magnificent, but let's make sure we're building it on foundations that cannot crack, no matter how much pressure is applied or how creative the attempts to circumvent safety become.

What are your thoughts? Am I missing something fundamental in my understanding? Are there already deeper safety mechanisms in place that I'm not aware of? I'd love to hear from those more knowledgeable in this space.
