// Freelancer

Kyra Lux

Data engineer with a bioinformatics background.

Remote · Project-based · Europe-wide

Available

// skills

What I've worked with

Engineering

Worked across multiple languages depending on the context: Python for data and research work, Kotlin and Java for backend services and APIs.

Data pipelines & visualization

Extracting data from various sources including APIs, databases, web scraping, and parsing unstructured text files. Transforming, cleaning, and combining it into new datasets or specific formats. Visualizing results through matplotlib, Jupyter notebooks, or dashboards in Grafana and Datadog.
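A toy example of this extract-and-reshape pattern, using only the standard library. The input format and field names here are hypothetical, purely for illustration:

```python
import csv
import io
import re

def parse_records(raw_text):
    """Pull (name, value, unit) records out of loosely formatted text.

    Hypothetical input format: lines like 'growth_rate: 0.42 1/h',
    with comment lines and non-matching lines skipped.
    """
    pattern = re.compile(r"^\s*(\w+)\s*:\s*([\d.]+)\s*(\S+)\s*$")
    records = []
    for line in raw_text.splitlines():
        if line.strip().startswith("#"):  # skip comment lines
            continue
        match = pattern.match(line)
        if match:
            name, value, unit = match.groups()
            records.append({"name": name, "value": float(value), "unit": unit})
    return records

def to_csv(records):
    """Serialize cleaned records to CSV for downstream tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "value", "unit"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

The same shape scales up: swap the regex for an API client or a database query, and the CSV writer for whatever format the next stage needs.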

Machine learning

Training various models and classifiers with scikit-learn and TensorFlow while making sure to follow proper procedures for validation and train/test splits. Running experiments on feature engineering and selection, and comparing multiple approaches against each other.
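As a sketch of that comparison workflow, assuming scikit-learn and a synthetic dataset standing in for real features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for real feature data (illustration only).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Compare candidate models under the same 5-fold cross-validation,
# so no model ever scores on data it was trained on.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
```

Holding the folds and the metric fixed across models is what makes the comparison fair; feature-engineering experiments slot in the same way, as one more axis of the grid.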

DevOps

Setting up automated pipelines for testing and deployment with GitHub Actions and Jenkins. Experience covers containerized deployments with Docker and Kubernetes, and monitoring running systems including setting up alerts and responding to incidents.

Bioinformatics & research

Worked on research projects involving genetic sequence data, enzyme kinetics, and microbial phenotypes. Experience includes building phylogenetic trees, applying machine learning to biological data, and extracting structured biological data from messy sources.

APIs

Built and consumed REST APIs in different contexts: microservices backends in Kotlin/Java and data integrations in Python. Familiar with REST principles, OpenAPI/Swagger documentation, and common authentication patterns such as bearer tokens.
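A minimal illustration of the bearer-token pattern, using only the standard library; the retries, timeout tuning, and error handling a real integration needs are omitted:

```python
import json
import urllib.request

def auth_headers(token):
    """Standard bearer-token auth headers for a JSON REST API."""
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}

def get_json(url, token):
    """Authenticated GET against a JSON endpoint (illustrative sketch)."""
    req = urllib.request.Request(url, headers=auth_headers(token))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```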

Also familiar with: Git · Docker · SQL · Linux · Kubernetes · Microservices · Kotlin · Java · Google Cloud Platform (GCP) · Flutter · AI-assisted development

// experience

What I've worked on

Research
Bacterial phenotype data pipeline · Research Assistant · Helmholtz Institute for Infection Research
Python · bacterial phenotype data · data engineering · data pipeline · data scraping · data cleaning

Built a pipeline to extract structured bacterial phenotype data from Bergey's Manual of Systematic Bacteriology. The source material was messy and inconsistently formatted across volumes. The resulting dataset extended the validation of Traitar, an open-source ML tool for predicting microbial traits from genome sequences, by providing phenotype annotations for 296 additional sequenced bacterial species. Published as second author (papers appear under my maiden name).

Also contributed to two additional published studies during this period, primarily visualization work and analysis tasks as a research assistant.

Substrate specificity prediction · Master's thesis · University of Düsseldorf
Python · machine learning · model comparison · scikit-learn · TensorFlow

Built a Python program to train and systematically compare multiple ML algorithms for predicting substrate specificity from protein sequence data. Used scikit-learn and TensorFlow to implement random forests, SVMs, recurrent neural networks, Bayesian classifiers and linear regression, and analyzed the results across all approaches.

Enzyme kinetic constant prediction · PhD candidate · University of Düsseldorf
Python · machine learning · feature engineering · feature selection

Adapted the ML pipeline I built for my thesis to predict kinetic constants from protein data, using random forests as the classifier. Focused on feature selection and feature engineering to improve predictions, supplementing the dataset with additional features from protein databases. This was during a PhD candidacy; I left after 1.5 years of independent research when the topic turned out not to be the right fit.

Industry
Customer master data platform · Full-Stack Engineer · Metro Digital · 4.5 years
data mapping · data sync · data migration · views · microservices · Kotlin/Java · GCP · CI/CD

Built a new customer master data system with a redesigned data model, onboarding countries one by one while keeping the old and new systems in sync at all times. The work included data mapping for country-specific edge cases, disaster recovery planning, and building pre-computed views. Spearheaded the GCP migration by evaluating technologies, setting up test environments, and reworking the deployment pipelines. Set up monitoring and alerting in Datadog.

// solo projects

What I built independently

data pipeline & web tool

Kennzeichenkarte.de

Python · HTML/CSS/JS · open source data · geodata

An interactive map of all German license plate districts. It is powered by a data pipeline written in Python that takes raw license plate data from Wikipedia, enriches it with geolocation data from OpenStreetMap, and simplifies the boundary files to keep them small enough to serve to visitors. Live, with daily users.

Messy source data

Sourcing the license plate data from Wikipedia meant dealing with plenty of inconsistencies: different formatting for entries covering multiple areas, mixed use of area prefixes, and footnotes or remarks scattered throughout that had to be cleaned out.
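Cleaning like this boils down to a handful of normalization rules. A simplified sketch; the patterns shown are illustrative, not the real rule set:

```python
import re

def clean_entry(raw):
    """Normalize one raw district entry into a list of area names.

    Illustrative examples of the kinds of inconsistencies involved:
    footnote markers, parenthetical remarks, and multiple areas
    joined in one cell (often with German 'und').
    """
    text = re.sub(r"\[\d+\]", "", raw)              # strip footnote markers like [2]
    text = re.sub(r"\(.*?\)", "", text)              # drop parenthetical remarks
    areas = re.split(r"\s*(?:,|\bund\b)\s*", text)   # split multi-area cells
    return [a.strip() for a in areas if a.strip()]
```

The word boundaries around `und` matter: without them, a name like "Dortmund" would be split in half.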

Resolving ambiguous lookups during geodata fetching

To find the right geographic area for each code I queried the Nominatim API. The challenge was that the same name could return multiple hits (a city and its surrounding district share a name), and the fields used to tell them apart were filled inconsistently across entries. Some codes also don't map to a place at all, like codes for nationwide services, which needed to be handled separately.
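The disambiguation can be expressed as a ranking over candidate hits. A sketch assuming results roughly shaped like Nominatim's JSON (`class`, `type`, and `importance` are real Nominatim result fields, but the preference rule here is illustrative):

```python
def pick_best_hit(hits, prefer="administrative"):
    """Choose one result from several candidate geocoding hits.

    `hits` is assumed to be a list of dicts roughly shaped like
    Nominatim search results (keys: 'class', 'type', 'importance').
    Prefer administrative boundaries over same-named cities, and
    break ties with the importance score Nominatim assigns.
    """
    if not hits:
        return None  # e.g. codes for nationwide services with no place

    def rank(hit):
        is_admin = hit.get("class") == "boundary" and hit.get("type") == prefer
        return (is_admin, hit.get("importance", 0.0))

    return max(hits, key=rank)
```

Keeping the rule in one pure function also makes the inconsistently filled fields easy to test against without hitting the API.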

File size vs. accuracy

The raw boundary files were too large to serve to visitors. I had to simplify the GeoJSON enough to keep download sizes reasonable without making the district shapes look wrong.
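One standard way to make that trade-off is the Ramer-Douglas-Peucker algorithm, which drops points that barely change a shape while keeping the ones that define it. A self-contained sketch; the live site may well use a different tool, this just illustrates the idea:

```python
def perpendicular_distance(pt, start, end):
    """Planar distance from pt to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == dy == 0:
        return ((x - x1) ** 2 + (y - y1) ** 2) ** 0.5
    # parallelogram area divided by base length gives the height
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / (dx * dx + dy * dy) ** 0.5

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep endpoints, recurse on the point
    farthest from the chord, drop everything within `tolerance`."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared point
```

Raising the tolerance shrinks the file; the art is in picking a value where district borders still look right at typical zoom levels.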

Mobile app

reDream

App Development · Flutter · local-only architecture

A minimal dream journal for Android. The app stores everything locally on the phone and does not track or transfer any data. Published on the Play Store.

Debugging without any telemetry

Without telemetry there is no visibility into what goes wrong for users, so I needed a way for non-technical users to report issues without friction. The solution: internal logging in the app, an export-logs feature, and a button that lets users send an email with the logs attached.
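The logging side can be sketched as a bounded ring buffer with an export step. Shown in Python for illustration only; the app itself is Flutter/Dart, and all names here are hypothetical:

```python
from collections import deque
from datetime import datetime, timezone

class AppLog:
    """In-memory ring buffer of recent log lines (illustrative sketch)."""

    def __init__(self, capacity=500):
        # Bounded: old entries drop off automatically, so the log
        # never grows without limit on the user's phone.
        self.lines = deque(maxlen=capacity)

    def log(self, message):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.lines.append(f"{stamp} {message}")

    def export(self):
        """All buffered lines as one text blob, ready to be attached
        to a support email the user sends themselves."""
        return "\n".join(self.lines)
```

Because the user triggers the export and sends the email, nothing ever leaves the device without their action, which keeps the local-only promise intact.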

Learning app development and UI/UX from scratch

Picking up Flutter, learning UI/UX basics, writing data privacy documentation, navigating the Google Play release process: none of this was familiar territory before this project.

// contact

Get in touch

Looking for someone to take on a data or bioinformatics project? I'd love to hear about it.

hello@kyralux.de
© 2026 Kyra Lux
Imprint