Airflow 2.0 is a big release that implements many new features. Some of them, like the highly available scheduler and the overall improvements in scheduling performance, are real game-changers. But beyond the deep, core-related features, Airflow 2.0 also brings new ways of defining Airflow DAGs. Let’s take a look at what’s been improved!
The first significant change is the introduction of the TaskFlow API, which consists of three features:
- the @task decorator, which makes using PythonOperator smooth and easy,