Job description:
The Company works in the Alzheimer’s space. We are building AI models that seek to uncover clues for early detection and to predict the course of the disease based on patient signatures.
We are seeking a Data Engineer to work closely with the AI team to harmonize extensive longitudinal study databases for use in AI models, ensuring that each database is error-free and consistent with the other databases used in the models.
Responsibilities:
- Extract variables of interest from multiple databases.
- Harmonize variables across databases.
- Ensure the correctness of data ranges based on data dictionaries.
- Process and encode databases for AI algorithms.
- Generate story-like prompts from harmonized datasets for Large Language Models (LLMs).
Requirements:
- Knowledge of data handling with Python and the libraries commonly used in data science.
- Fluency in pandas is required, along with knowledge of other libraries and/or techniques for exploratory analysis and data profiling.
- Statistical foundation to understand and report on the variables within a dataset.
- Skilled at generating visualizations that effectively communicate findings from a project's data flow.
- Experience working on projects with high-dimensional and/or clinical data is a plus.
- Proficiency in English is required.
Qualifications:
- Pursuing a BA/BS/MS/MA/MD in any Science, Technology, Engineering, and Mathematics (STEM) field, Systems Engineering, Social Sciences, or another field.
Languages:
- B2 English level or higher.
Soft Skills:
- Independence
- Teamwork
- Proactivity
- Attention to detail
Job type: Internship
Contract length: 4 months