We are looking for a Lead Data Engineer to join our growing team of data experts. The hire will be responsible for designing, implementing and optimizing data transformation pipeline flows serving machine learning solutions designed by data scientists within ThetaRay's system.
The ideal candidate is well versed in building data pipelines and complex data transformations, and enjoys optimizing data systems and building them from the ground up.
The Lead Data Engineer will also provide daily guidance to the data engineering team, develop best practices and methodologies for data engineering within ThetaRay's system, and develop automation tools that improve efficiency.
They must be self-directed and comfortable supporting multiple production implementations for various use cases, some of which will be carried out on-premises at customer locations.
Implement, optimize and maintain data pipeline flows in production within the ThetaRay system, based on the data scientists' solution designs.
Design and implement solution-based data flows for specific use cases, so that implementations can be applied within the ThetaRay product.
Develop and maintain data engineering best practices and methodologies, and guide the data engineering team on a daily basis.
Create data tools that help analytics and data science team members build and optimize our product into an innovative industry leader.
Work with product, R&D, data and analytics experts to strive for greater functionality in our systems.
Train customer data scientists and engineers to maintain and amend data pipelines within the product.
Hands-on PySpark and Python knowledge, and experience working with and optimizing big data pipelines, architectures and data sets.
Strong analytical skills related to working with structured and semi-structured datasets.
Build processes supporting data transformation, data structures, metadata, dependency and workload management.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Business oriented and able to work with external customers and cross-functional teams.
3+ years of experience in a Data Engineer role, with a technical degree in Computer Science, Statistics, Informatics, Information Systems, Engineering or another quantitative field.
Experience with Linux.
Experience implementing machine learning pipelines and a basic understanding of machine learning flows.