Data Infrastructure Engineer
January 8, 2025
Open
Location
Vietnam
Occupation
Full-time
Experience level
Senior

About the Job

As part of our next phase of innovation, we are building a decentralized AI dataset pool to empower users to query, visualize, and use these ready-made datasets for fine-tuning and training their AI models—all within our platform. This pool will also allow external contributors to share datasets and earn revenue based on data quality, quantity, and usage.

  • Job type: Full-time and remote. Flexible working hours from Monday to Saturday, totaling at least 44 hours per week. We need A players who can work like hell; if you prefer a 9-5 job, this might not be the right company for you.
  • Location: Work from anywhere.
  • Reporting line: CTO

Your Responsibilities

We are looking for a skilled Senior Data Infrastructure Engineer to lead the development of our decentralized AI dataset pool. This individual will play a critical role in organizing and structuring our current unstructured dataset stored on S3, designing user-facing tools for data interaction, and implementing a revenue-sharing system for contributors. You will work closely with the blockchain team on both storage and the revenue-sharing mechanism for external contributors. The ideal candidate has expertise in building scalable data systems, integrating APIs with platforms, and working with AI/ML workflows. Experience with decentralized technologies such as IPFS is a plus.

Key responsibilities include:

  • Data Structuring & Management: Transform unstructured datasets stored on S3 into a structured, queryable, and accessible format. Develop efficient data pipelines and systems for data ingestion, transformation, and management.
  • Platform Integration: Build tools for users to query datasets, visualize data insights, and test data quality. Enable users to fine-tune and train their models on selected datasets without downloading the data, so that usage stays within the current AIxBlock platform.
  • Contributions & Revenue Sharing: Implement mechanisms for external contributors to submit datasets. Develop a system to calculate and distribute revenue shares based on dataset quality, quantity, and usage. This work involves close collaboration with the blockchain team.
  • Data Security & Compliance: Ensure robust data security, privacy, and compliance with applicable regulations. Implement access controls and audit trails for data usage.
  • Scalability & Decentralization: Design and implement decentralized storage solutions (e.g., IPFS, Arweave) to align with AIxBlock’s vision. Ensure scalability to handle large-scale datasets and user interactions.

Requirements

Technical Skills:

  • Proven experience in data engineering, particularly with unstructured data.
  • Strong expertise in AWS S3, databases, and data querying tools.
  • Proficiency in building and integrating APIs for data interaction.
  • Hands-on experience with data visualization tools.
  • Familiarity with machine learning workflows and tools.
  • Knowledge of decentralized storage solutions and blockchain technologies (preferred).

Compensation & Benefits

Compensation: Base salary negotiated depending on experience, plus a token bonus based on performance.

Benefits:

  • Salary review based on performance
  • Birthday gift
  • Holiday gift
  • Year-End Performance Bonus (Cash)
  • Year-end party

... (more benefits listed in the original text)

Application Process

Resume & Portfolio screening

Interview with the TA

Interview with the CTO

Offer discussion and contract signing

Please note: We're all about remote work and have collaborators based all around the world, with English as our primary language, so an English CV is required. The application process may be shortened or extended when necessary.

AIxBlock
AIxBlock: A comprehensive platform for quickly developing AI models with decentralized compute resources, ensuring full privacy control and cost efficiency. AIxBlock lets you productize AI models using globally underutilized computing resources while maintaining full privacy control. Self-host this AI platform in minutes, enabling seamless development, fine-tuning, and deployment of your AI on your own infrastructure.

1. Self-hosting

Unlike other platforms that come with hefty upfront costs and long-term commitments, AIxBlock removes those barriers, making self-hosting truly accessible. With AIxBlock, you get:

  • 100% control over your privacy
  • No commitments or upfront costs
  • No setup fees
  • Up and running in just minutes, with zero manual setup

2. Customized workflows

AIxBlock adapts to how you work. Here's what you can do:

  • Train & Deploy: Train your model from raw code using your data, and deploy when ready.
  • Fine-Tune & Deploy: Fine-tune pre-trained models and deploy on your terms.
  • Deploy Only: Deploy your models or set up API inference endpoints with ease.
  • Label & Validate: Simplify data labeling and validation with our API-ready solutions.

Check out our demo video here to see it in action: https://www.youtube.com/watch?v=kWfoIjEEDRU
Company type
Start-up
Domain
Computer Software