Description:
Role: Solution Architect - Agentic AI Data Engineer
Location options: New Jersey; Atlanta, GA; Dallas, TX; Chicago, IL; and Virginia. Candidates should be willing to travel to client sites.
The Agentic AI Data Engineer is a hands-on, client-facing role within TCS’s AI & Data group (Americas), responsible for building and maintaining the data pipelines and infrastructure that power AI agent systems. This role transforms architectural blueprints into production-ready pipelines, ensuring AI agents have continuous access to clean, timely, and relevant data. You will work across industries—BFSI, Manufacturing, Life Sciences, Telecom, Retail, and more—handling diverse data types and ensuring they are ingested, transformed, and delivered to AI systems efficiently and securely.
Key Responsibilities:
•Data Ingestion: Build pipelines to extract data from databases, APIs, files, and streaming sources using tools like Python, Kafka, and ETL frameworks.
•Data Transformation: Clean, normalize, and enrich raw data using Spark, SQL, or cloud-native tools to make it AI-ready.
•Data Loading: Load processed data into target systems such as vector databases, Elasticsearch, graph databases, or cloud data warehouses.
•Real-Time Feeds: Implement streaming pipelines using Kafka, Flink, or cloud services for real-time AI applications.
•Automation & Scheduling: Use Airflow, cloud triggers, or Lambda functions to automate and orchestrate data workflows.
•API Integration: Develop connectors or services to fetch data from external APIs or systems on demand.
•RAG & Knowledge Base Updates: Collaborate with AI Data Architects to ingest and embed documents for retrieval-augmented generation (RAG).
•Testing & Validation: Implement unit, integration, and data validation tests to ensure pipeline reliability and data quality.
•Performance Optimization: Tune SQL queries, Spark jobs, and infrastructure to meet SLAs and minimize latency.
•Documentation & Handover: Create runbooks, pipeline documentation, and train client teams for post-deployment support.
•Industry-Specific Handling: Adapt pipelines for domain-specific needs (e.g., HIPAA compliance in Healthcare, SOX in Finance).
•Agile Collaboration: Work in agile teams, participate in sprint planning, and coordinate with client stakeholders.
•Pipeline Maintenance: Monitor and evolve pipelines post-deployment, handling schema changes, scaling, and troubleshooting.
•Continuous Learning: Stay current with evolving tools, frameworks, and best practices in data engineering and AI integration.
Qualifications:
•5–8+ years of experience in data engineering, with exposure to AI/ML data workflows.
•Strong programming skills in Python and SQL; familiarity with Java/Scala is a plus.
•Experience with ETL and orchestration tools (Airflow, AWS Glue, Azure Data Factory) and big data frameworks (Spark, Hadoop).
•Proficiency in streaming technologies (Kafka, Flink, Event Hubs).
•Hands-on experience with cloud platforms (AWS, Azure, GCP) and cloud-native data services.
•Familiarity with vector databases, Elasticsearch, and graph databases.
•Strong understanding of data formats (JSON, Parquet, Avro) and parsing techniques.
•Experience with API integration, RESTful services, and authentication protocols.
•DevOps familiarity: Git, CI/CD, Docker, and basic Linux scripting.
•Strong debugging and problem-solving skills for data pipeline issues.
•Attention to data quality, validation, and anomaly detection.
•Excellent communication and documentation skills for client and team collaboration.
•Agile mindset with the ability to adapt to changing requirements and priorities.
•Domain awareness and ability to understand industry-specific data structures and compliance needs.
•Familiarity with data serialization, file formats, and cloud storage (S3, ADLS, GCS).
•Experience with monitoring tools (CloudWatch, ELK, Prometheus) and testing frameworks (pytest, Great Expectations).
•Bonus: Understanding of AI/ML concepts such as feature engineering, training vs inference data, and data leakage prevention.
Salary Range: $127,500 - $172,500 a year
#LI-AD1