Key Activities
- Design, build, and maintain scalable data infrastructure including data lakes, pipelines, and metadata repositories to ensure reliable and timely data delivery to stakeholders.
- Collaborate closely with data scientists to support data modeling, integrate diverse data sources, and enable machine learning workflows and experimentation environments.
- Develop and optimize large-scale batch and real-time data processing systems to improve operational performance and support business objectives.
- Utilize Python, Apache Airflow, and AWS services to automate data pipelines, ensuring efficient scheduling, orchestration, and monitoring of workflows.
- Manage cloud-based data storage and compute resources using AWS services such as S3, Glue, EC2, and Lambda to ensure scalability, performance, and cost efficiency.
- Implement robust testing, validation, and monitoring processes to maintain the reliability, accuracy, and security of data pipelines.
- Stay up to date with emerging technologies and industry best practices in data engineering and data science, continuously identifying opportunities for optimization and innovation.
Required Skills
- Strong proficiency in Python for data processing and scripting (Pandas, PySpark) and workflow orchestration with Apache Airflow.
- Hands-on experience with AWS services including Glue, S3, EC2, and Lambda for cloud-based data processing.
- Experience managing containerized environments using Docker and Kubernetes.
- Practical experience with big data query engines and columnar data stores such as Athena, Redshift, Vertica, Hive, or Hadoop, along with version control systems such as Git.
- Familiarity with cloud-based data platforms and infrastructure management on AWS.
- Experience with CI/CD pipelines using tools such as Jenkins, CircleCI, or AWS CodePipeline.
- Strong focus on data engineering, including designing and managing large-scale data architectures and pipelines.
- Ability to support data science teams through data preparation, feature engineering, and enabling experimentation environments.
Nice to Have
- Experience with LangChain or similar frameworks for building data applications driven by NLP or conversational AI.
- Familiarity with machine learning platforms such as AWS SageMaker or Databricks.
- Experience with both relational databases (MySQL, PostgreSQL) and NoSQL databases (DynamoDB, Redis).
- Exposure to enterprise BI and analytics tools such as Tableau, Looker, or Power BI.
- Knowledge of distributed messaging and event streaming platforms like Kafka or RabbitMQ.
- Experience with monitoring and logging platforms such as the ELK stack or Datadog.
- Understanding of data privacy, governance, and security best practices for large-scale data platforms.
How to Apply: Please send your CV to the consultant in charge:
E-mail: nhuhoa.nguyen@ev-search.com
All applications will be considered without regard to race, color, religion, sex (including pregnancy and gender identity), national origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, parental status, military service, or any other non-merit factor.

