We are looking for a Senior Software Engineer (Data Acquisition) to lead the development of reliable, scalable systems for sourcing and ingesting the high-quality data that powers our core products and research. This role is foundational to how we train, evaluate, and evolve intelligent systems, supplying the precise data behind every insight.
What You’ll Do
- Architect and implement systems to collect, clean, transform, and store large-scale datasets from diverse sources.
- Design resilient pipelines for real-time and batch data acquisition using modern data engineering tools.
- Integrate data from APIs, web scraping frameworks, databases, third-party platforms, and internal systems.
- Ensure quality, completeness, and compliance across the entire data lifecycle.
- Collaborate with ML, product, and infrastructure teams to align on data needs and priorities.
- Optimize for performance, reliability, and cost efficiency in distributed environments.
- Monitor data flows and build automated alerting and remediation systems.
Skills We’re Looking For
- Advanced proficiency in Python, with experience in data orchestration and processing frameworks such as Airflow, Spark, or similar.
- Deep knowledge of data structures, systems design, and distributed computing.
- Experience with APIs, web scraping, and unstructured data.
- Proficiency in SQL and experience with cloud-based storage and data platforms (e.g., AWS S3, BigQuery, Snowflake).
- Familiarity with containerization (Docker, Kubernetes) and CI/CD workflows.
- Strong debugging, optimization, and logging practices for production-grade pipelines.
- Ability to work cross-functionally in a high-performance, fast-paced environment.
We’d Love It If You’ve Had
- Experience with real-time streaming data systems (Kafka, Kinesis, or Pub/Sub).