Role: AI Data Engineer Architect
Location options: San Francisco Bay Area, New York/New Jersey, Atlanta, Chicago, and Dallas.
Preface
The AI Data Engineer Architect in TCS’s Americas region designs and implements robust data infrastructures for enterprise AI and analytics solutions. This client-facing hybrid role ensures that organizational data assets are architected to support AI initiatives across industries including BFSI, Manufacturing, Life Sciences, Telecom, Retail, Travel, and Consumer Goods.
What You Would Be Doing
• Data Architecture Design: Design data flows from source systems to AI/analytics models, creating conceptual and logical architecture diagrams.
• Enterprise Data Assessment: Evaluate client data ecosystems, identify gaps, and recommend modernization strategies for AI readiness.
• Pipeline & ETL Strategy: Define ingestion and processing strategies (batch/real-time), select tools, and outline data transformation workflows.
• Data Storage & Modeling: Architect storage solutions (data lakes, warehouses, vector/graph databases) and design efficient schemas.
• Integration of Heterogeneous Data: Plan integration of structured and unstructured data, ensuring consistency and alignment across systems.
• Quality, Governance, and Security: Implement data quality checks, lineage, compliance, and security measures throughout the architecture.
• Scalability & Performance: Design scalable, high-performance solutions using distributed computing, cloud auto-scaling, and performance tuning.
• Technology Selection & Blueprint: Recommend platforms/tools and create comprehensive reference architecture blueprints.
• AI/ML Team Collaboration: Align data structures with model requirements and coordinate on feature engineering and retrieval pipelines.
• Prototyping and Validation: Build proofs of concept for pipelines or retrieval solutions to validate approaches and refine the architecture.
• Industry-Specific Data Solutions: Tailor architectures for domain-specific data types, volumes, and compliance needs.
• Client Engagement & Thought Leadership: Lead workshops, produce documentation, and advise on data strategy and governance.
• Implementation Oversight: Govern solution realization, ensure alignment with designs, and resolve technical challenges during execution.
What Skills Are Expected
• Data Architecture Expertise: Experience designing complex data architectures with strong modeling and abstraction abilities.
• Big Data and ETL/ELT Mastery: Proficient in Hadoop, Spark, and distributed data processing, with expertise in both batch and streaming paradigms.
• Cloud Data Services: Skilled in AWS, Azure, or GCP data services for cloud-native data pipelines and cost/performance optimization.
• Databases and Storage Systems: Proficient in relational, NoSQL, and analytical databases; able to recommend storage by use case.
• Data Integration & APIs: Experienced with ETL, enterprise integration, and API-based data service design.
• Data Governance & Quality: Knowledge of data cataloging, lineage, and quality monitoring frameworks and tools.
• Security & Compliance: Understanding of encryption, access controls, and compliance standards (GDPR, HIPAA, etc.).
• Collaboration & Leadership: Strong communication, client engagement, and team leadership abilities.
• Analytical and Problem-Solving Skills: Aptitude for troubleshooting, identifying bottlenecks, and designing mitigations.
• Domain Knowledge: General awareness of data environments in finance, manufacturing, retail, and other key industries.
• Project Management: Ability to coordinate architecture deliverables, align tasks, and manage technical workstreams.
• Emerging Tech & Trends: Up to date on the latest data technologies, paradigms, and trends relevant to data architecture.
• Certifications (nice to have): AWS, Azure, GCP, or Cloudera Data Engineering/Architecture certifications are preferred.
Key Technology Capabilities
• Big Data Frameworks: Advanced use of Apache Spark and Hadoop for batch/stream processing and optimization.
• Data Pipeline Orchestration: Skilled in Apache Airflow, AWS Step Functions, or similar tools for workflow management.
• Relational Databases & SQL: Strong SQL skills; experience with Snowflake, Redshift, BigQuery, Synapse, and columnar storage.
• NoSQL and Specialized Stores: Experienced with MongoDB, Cassandra, Redis, and graph databases such as Neo4j.
• Streaming & Messaging: Proficient in Kafka, RabbitMQ, AWS Kinesis, or Google Pub/Sub for real-time/event ingestion.
• Search and Indexing: Familiarity with Elasticsearch/OpenSearch and vector databases for text/semantic search.
• Cloud Data Ecosystems: Proficient with key data components on AWS, Azure, or GCP, including storage, ETL, and analytics services.
• DevOps & Infrastructure as Code: Experience with Terraform, CloudFormation, Docker, Kubernetes, and CI/CD for data engineering.
• Data Modeling Tools: Competency with ERwin, UML, Jupyter, or SQL IDEs for data modeling and prototyping.
• Metadata & Catalog: Skilled in Apache Atlas, AWS Glue Data Catalog, Azure Purview, and lineage/metadata management.
• Monitoring & Logging: Use of Splunk/ELK, cloud-native monitoring services, and APM tools for pipeline performance and alerting.
• Machine Learning Integration: Understanding of feature stores, ML pipelines, and their integration with data architectures.
• Testing & Validation: Experience with Great Expectations and scripting for automated data integrity testing.
• Workflow Management & Agile: Proficient with Confluence, JIRA, and Git for documentation and collaboration.
• High-Level Languages: Ability to write and review Python or Scala for data pipeline development and reference implementations.
• Enterprise Systems Knowledge: Experience integrating with ERP, CRM, and mainframe data using standard methods.
Salary Range: $131,750 - $178,250 a year