Description

**Job Family:** Research & Predevelopment

**Req ID:** 510552

**Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship**

Here at Siemens, we take pride in enabling sustainable
progress through technology. We do this by empowering customers and
combining the real and digital worlds, improving how we live, work, and move
today and for the next generation. We know that the only way a business thrives
is if our people are thriving. That’s why we always put our people first. Our
global, diverse team would be happy to support you and challenge you to grow in
new ways.

Siemens Research & Predevelopment (RPD) is the
central R&D department of Siemens and thus plays a key role in shaping the
future of our products. RPD acts as a strategic partner to support the
executive units of Siemens. Consequently, the main research focus is on future
technologies for industry, infrastructure, mobility, and healthcare. In this
context, we are looking for an intern to support our Software Systems and
Processes team in Princeton, NJ, by researching and developing scalable intelligent
systems using LLMs and semantic technologies.

**Transform the everyday with us!**

Are you passionate about ensuring the reliability and
robustness of cutting-edge AI systems? We’re looking for an innovative PhD
intern to join our team and contribute to groundbreaking research focused on
implementing a Verification and Validation (V&V) framework for multi-agent
systems.

Modern software is rapidly moving from static
applications to agentic AI systems that plan, reason, call tools, coordinate
across agents, and adapt over multiple steps. As these LLM-powered systems
enter industrial workflows, the critical challenge is no longer only building
capable agents—it is evaluating, verifying, and validating that they behave
reliably, safely, and transparently in complex, uncertain environments. In this
internship, you will research and prototype next-generation methods for LLM and
multi-agent system evaluation, including benchmarks, guardrails, failure-mode
analysis, runtime monitoring, formal methods, and testing technologies. You
will help advance trustworthy AI for real-world industrial software systems
where robustness, explainability, and dependable performance matter.

The internship provides a unique opportunity to contribute
to innovative industrial applications while being mentored by experienced
professionals in an international setting.

On-site work in Princeton, NJ, is preferred
for a hands-on and collaborative experience; however, remote candidates will be
considered. The position is a full-time
role for at least 3 months, with the possibility of extension.

**Key Responsibilities**

+ Research, design, and prototype V&V methods for multi-agent and agentic AI systems, with emphasis on reliability, safety, repeatability, explainability, and robustness under uncertain operating conditions.

+ Develop evaluation harnesses, benchmarks, and test scenarios for LLM-based agents, including tool use, multi-step reasoning, orchestration, failure-mode analysis, and adversarial or edge-case behavior.

+ Implement proof-of-concept prototypes in Python using modern AI and agent frameworks, formal methods, testing technologies, and retrieval-augmented or knowledge-grounded architectures where appropriate.

+ Investigate verification strategies such as model checking, property-based testing, fuzz testing, static or dynamic analysis, runtime monitoring, guardrails, and trace-based observability for complex intelligent systems.

+ Collaborate with researchers and engineers to define milestones, run experiments, analyze results, and translate research insights into scalable industrial software concepts.

+ Document findings, contribute to scientific publications or technical reports, and present results clearly to internal and external technical audiences.

**Basic Qualifications**

+ Currently enrolled in a PhD program in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Formal Methods, or a closely related technical field.

+ 3+ years of research or hands-on experience in AI, machine learning, generative AI, software engineering, formal methods, autonomous systems, or intelligent agent systems.

+ Strong programming skills in Python and practical experience with modern ML or LLM tooling such as PyTorch, Hugging Face Transformers, LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI, or comparable frameworks.

+ Hands-on experience building, evaluating, or testing LLM-powered applications, agentic workflows, multi-agent systems, or AI-enabled software engineering tools.

+ Strong understanding of software architecture, software engineering principles, testing methodologies, experimentation, and empirical evaluation of complex systems.

+ Demonstrated ability to conduct independent research, read and synthesize technical literature, analyze complex problems, prototype solutions, and communicate findings clearly.

+ Proficient in English, both written and verbal.

+ The position requires the person to be in the United States of America and hold a valid work permit in the US for the duration of the internship.

**Preferred Skills**

+ Research experience in formal verification, model checking, theorem proving, runtime verification, AI safety, robust AI, explainable AI (XAI), or trustworthy machine learning.

+ Experience with evaluation of LLMs or agents, including hallucination analysis, benchmark design, tool-use evaluation, prompt-injection testing, red teaming, or reliability metrics.

+ Familiarity with RAG architectures, vector databases, knowledge graphs, semantic technologies, ontologies, or graph-based reasoning.

+ Understanding of reinforcement learning, planning, reward modeling, preference optimization, or post-training approaches for LLMs and autonomous agents.

+ Experience with cloud-native or distributed systems concepts, microservice architectures, APIs, CI/CD, Git, Docker, Kubernetes, Azure, AWS, or comparable platforms.

+ Experience with testing frameworks for complex software systems, including property-based testing, fuzz testing, simulation-based testing, static analysis, or execution-based evaluation.

+ Track record of research publications, open-source contributions, academic projects, or demonstrable prototypes related to AI, software engineering, formal methods, or agentic systems.

+ Excellent problem-solving skills, attention to detail, and ability to quickly learn and apply new technologies, tools, and research methods.

+ Strong written and verbal communication skills, with the ability to articulate complex technical concepts to research and engineering audiences.

**About Siemens:**

We are a global technology company focused on industry, infrastructure, transport, and
healthcare. From more resource-efficient factories, resilient supply chains, and
smarter buildings and grids, to sustainable transportation as well as advanced
healthcare, we create technology with purpose, adding real value for customers.
Learn more about Siemens here (https://www.siemens.com/global/en/company.html).

**Our Commitment to Equity and Inclusion in our Diverse Global Workforce:**

We value
your unique identity and perspective. We are fully committed to providing
equitable opportunities and building a workplace that reflects the diversity of
society, while ensuring that we attract the best talent based on
qualifications, skills, and experiences. We welcome you to bring your authentic
self and transform the everyday with us.

#LI-JS

#LI-Remote

#ArtificialIntelligence, #MachineLearning, #GenerativeAI

$32-$47

**Organization:** Foundational Technologies

**Job Type:** Full-time

**Category:** Internal Services
