Job Description:
PySpark Lead with 7+ years of experience to drive the
architecture, design, development, and optimization of big data processing
solutions in the Asset and Wealth Management industry segment. The
ideal candidate will have deep expertise in Apache Spark, Python, and
distributed data processing while leading a project team of data
analysts, business analysts/testers, domain consultants, and engineers to
deliver scalable data solutions.
•Lead the end-to-end lifecycle of PySpark-based big data solutions, covering architecture, design, development, and deployment.
•Collaborate with and manage client project leadership and key workstream leads to drive solutions, strategize and navigate delivery, and manage delivery outcomes.
•Architect and optimize ETL pipelines for structured and unstructured data.
•Collaborate with clients, data engineers, data scientists, and business teams to understand requirements and provide scalable solutions.
•Optimize Spark performance through partitioning, caching, and tuning.
•Implement best practices in data engineering (CI/CD, version control, unit testing).
•Work with cloud platforms such as AWS and Azure.
•Ensure data security, governance, and compliance.
•Mentor junior developers and review code for best practices and efficiency.
Qualifications:
• 7+ years of experience in big data and distributed computing.
• Strong hands-on experience with PySpark, Apache Spark, and Python.
• Experience with SQL and NoSQL databases (DB2, PostgreSQL, Snowflake, etc.).
• Proficiency in data modeling and ETL workflows.
• Familiarity with workflow schedulers such as Airflow.
• Experience with cloud-based data platforms (AWS, Azure, GCP).
• Knowledge of DevOps, CI/CD pipelines, and containerization (Docker, Kubernetes) is a plus.
• Strong problem-solving skills and ability to lead a team.
• Industry experience in Wealth and Asset Management.
#LI-KR2