Description
**Job Family:** Research & Predevelopment
**Req ID:** 510554
**LLM-Based Knowledge Extraction and Failure Analysis Internship**
Here at Siemens, we take pride in enabling sustainable
progress through technology. We do this by empowering customers to
combine the real and digital worlds, improving how we live, work, and move
today and for the next generation. We know that the only way a business
thrives is if its people are thriving. That’s why we always put our people
first. Our global, diverse team would be happy to support you and challenge you
to grow in new ways.
Siemens Research & Predevelopment (RPD) is the central
R&D department of Siemens and thus plays a key role in shaping the future of
our products. RPD acts as a strategic partner to support the executive units of
Siemens. Consequently, the main research focus is on future technologies for
industry, infrastructure, mobility, and healthcare. In this context, we are
looking for an intern to support our Software Systems and Processes team in
Princeton, NJ by researching and developing scalable intelligent systems using
LLMs and semantic technologies.
**Transform the everyday with us!**
Are you passionate about pushing the boundaries of AI and
data science? We’re looking for an innovative PhD intern to join our team and
contribute to groundbreaking research focused on developing and improving
knowledge graphs for advanced intelligent systems.
Modern industrial software systems generate large volumes of
complex engineering signals, logs, test results, and failure information that
are difficult to interpret consistently with traditional automation alone. In
this internship, you will work on LLM-based knowledge extraction and failure
classification workflows that transform technical inputs into structured,
explainable JSON-based outputs. The focus is on prompt engineering, context
engineering, model-output debugging, and iterative quality improvement—understanding
why a model selected a particular failure class, which evidence influenced the
result, where context was missing or misleading, and how to make the pipeline
more accurate, transparent, and reliable for industrial use cases.
The internship provides a unique opportunity to contribute to
innovative industrial applications while being mentored by experienced professionals
in an international setting.
**This role is preferred to be on-site in Princeton, NJ, for a hands-on and collaborative experience; however, remote candidates will be considered. The position is a full-time role for at least 3 months with the possibility of extension.**
**Key Responsibilities**
+ Design, test, and refine prompts and context-selection strategies that help models classify failures, use relevant evidence, and produce consistent structured JSON outputs.
+ Analyze LLM output quality to understand why models choose incorrect failure classes, overlook important evidence, rely on misleading context, or generate inconsistent explanations.
+ Create evaluation examples, test cases, scoring rubrics, and error-analysis summaries to measure classification accuracy, evidence quality, explanation quality, and robustness.
+ Improve JSON schemas, validation checks, metadata fields, and intermediate representations used by downstream analysis and reporting workflows.
+ Prototype improvements to data preparation, retrieval or context assembly, prompt templates, output formatting, post-processing, and evaluation logic in Python-based AI pipelines.
+ Collaborate with software engineers, AI researchers, and domain experts to understand failure categories, edge cases, expected model behavior, and quality requirements.
+ Document experiments, observed failure modes, design decisions, evaluation results, and recommendations through internal demos, technical reports, and potential scientific publications.
**Basic Qualifications**
+ Currently enrolled in a Master’s or PhD program in Computer Science, Artificial Intelligence, Data Science, Knowledge Engineering, Information Science, or a closely related technical field.
+ 3+ years of foundational knowledge and research or project experience in Artificial Intelligence, Machine Learning, Generative AI, NLP, Data Engineering, or knowledge-based intelligent systems.
+ 3+ years of hands-on programming experience in Python, including experience with AI/ML libraries or frameworks such as PyTorch, TensorFlow, Hugging Face Transformers, scikit-learn, LangChain, LlamaIndex, or similar tools.
+ Hands-on experience with prompt engineering, context engineering, structured LLM outputs, or LLM-based information extraction and classification workflows.
+ Strong understanding of data modeling, structured outputs, metadata design, schema quality, validation concepts, and data quality principles.
+ Experience designing, implementing, or evaluating AI workflows that combine LLMs with structured context, retrieval, information extraction, classification, or rule-based validation.
+ Demonstrated ability to conduct independent research, critically analyze complex problems, work through ambiguity, and deliver structured technical outputs on defined timelines.
+ Strong written and verbal communication skills in English, with the ability to explain technical concepts clearly to both technical and domain-expert audiences.
+ The position requires the person to be in the United States of America and hold a valid work permit in the US for the duration of the internship.
**Preferred Skills**
+ Knowledge of transformer-based models, attention mechanisms, NLP/NLU methods, named entity recognition, relation extraction, question answering, or text classification.
+ Experience building reproducible data or AI pipelines, including data ingestion, validation, testing, documentation, and workflow orchestration with tools such as Apache Airflow, Prefect, Git, Docker, or similar technologies.
+ Ability to work with domain experts to translate engineering failure categories, business requirements, and quality expectations into clear prompts, evaluation criteria, and structured output formats.
+ Excellent analytical skills, attention to detail, and ability to reason about model behavior, evidence quality, data ambiguity, reproducibility, and maintainability of AI pipeline outputs.
+ Capacity to work independently, prioritize effectively, communicate progress clearly, and collaborate in an interdisciplinary research environment.
+ Interest in applying LLMs, knowledge extraction, and quality-focused AI engineering to industrial software systems, intelligent automation, or enterprise-scale engineering use cases.
About Siemens:
We are a global technology company focused on industry,
infrastructure, transport, and healthcare. From more resource-efficient
factories, resilient supply chains, and smarter buildings and grids, to
sustainable transportation as well as advanced healthcare, we create technology
with purpose, adding real value for customers.
Our Commitment to Equity and Inclusion in our Diverse Global
Workforce:
We value your unique identity and perspective. We are fully
committed to providing equitable opportunities and building a workplace that
reflects the diversity of society, while ensuring that we attract the best
talent based on qualifications, skills, and experiences. We welcome you to
bring your authentic self and transform the everyday with us.
$32-$47
**Organization:** Foundational Technologies
**Job Type:** Full-time
**Category:** Internal Services





