Rust's journey at iFood

What was our experience adopting Rust in our teams and what were the results?

Vanessa Gomes

💻 Logistics Delivery | Software Engineer at iFood

In this post we share our experience adopting Rust in our teams: why we chose the language, what the adoption process was like, and the results we have seen.

iFood today is divided into several large areas, including Consumers, Restaurants and Logistics. Among the responsibilities of the Logistics tribe are integrating with orders that were placed and then accepted by restaurants, creating routes, offering them to delivery partners and then managing the life cycle of these deliveries.

Last semester we faced some challenges here. Some Logistics integration applications that talk to other tribes were tightly coupled to the iFood ordering domain. You can think of the ordering domain as the “life cycle of that snack you ordered”: a restaurant confirms that it can prepare your snack, then come the preparation time and the moment the snack is ready, and so on until it reaches the end consumer.

In short, one of the flows handled by the Logistics tribe connects “a snack is ready” with a route for a delivery partner who will take that snack to the person who placed the order. We realized that our Logistics integration was technically more coupled to the “order” than necessary, and that is how we understood that we would need to redesign our integration architecture.

During this analysis, we also realized that horizontal scaling alone was not making our services perform better. Vertical scaling, that is, giving each instance more resources such as memory and CPU, was also becoming necessary.

It was then that we understood that in addition to redesigning the architecture, we would need to write new services and begin the retirement of legacy services.

It was at that moment that we evaluated the performance of one of our integration applications. This application is responsible for several flows in addition to the iFood order and delivery lifecycle. Because other business flows live inside the same service, it does not follow the single responsibility principle.

This is a service written in Java that communicates with other applications mostly asynchronously, via queues and topics. What we call “response time” is the processing time for an event: the time to consume it and finish everything that event should trigger, whether saving to some type of storage system or publishing to another communication channel.

Some numbers for this application during the highest usage peaks:

  • Response time for asynchronous events around 100ms.
  • Flow of around 80 thousand events per minute.
  • 25 running instances, each using 1.5 GB of RAM and around 1 CPU unit (to keep the numbers simple, assume that all production instances have the same size and the same processing power).

With a simple calculation (80,000 events / 25 instances = 3,200), we see that one production instance of this service, using 1 CPU unit and 1.5 GB of RAM, processes on average around 3,000 events per minute.

Choosing a new language

Choosing the tech stack for a service can be a complicated task, subject to many biases, and it can still lead to many possible results. There are rarely obvious reasons that the whole technology community agrees on. Our choice needed to be a set of trade-offs that would eventually balance each other out.

In general, the choice of a language greatly influences the design of the code as well as its level of abstraction. The language can also have a huge impact on the complexity of a code base, depending on the team's familiarity with it.

Given the mission of building resilient applications that make efficient use of their resources, we defined the following objectives for our choice of tech stack:

  • Resilience: minimize the number of incidents that occur in production environments. Make problems easy to detect so that the team can focus on developing and improving the platform. The choice of technology can be crucial depending on how the language offers fault tolerance and monitoring features.

  • Team productivity: speed in the short term can often come at the expense of maintenance in the long term. This objective concerns the development team's ability to deliver new features quickly and to sustain that pace in the long term, considering the amount of work generally spent on understanding and changing old code.

  • Learning curve: every complex environment requires a learning curve and choosing a language would be no different. This affects learning both the syntax and the surrounding ecosystem. Predicting a technology's learning curve is extremely complex but crucial when choosing a new language.

  • Efficiency in the use of resources: an environment that makes it easy to use only the memory and processing power that is necessary, thus making horizontal scaling easier.

JVM vs Non-JVM

The JVM is one of the richest ecosystems in the industry. It supports practically any integration you can imagine with services and resources on the market. It has a multitude of dedicated tools and the languages in this ecosystem have a strong presence in the software community.

In general, “JVM languages” can easily integrate with any other tool or library also based on the JVM. In other words, even if a language does not have a certain integration, library or feature, it is likely that some language in the JVM family does.

This was a very important factor in the team's choice. In addition, iFood already has a rich JVM ecosystem built, tested and in production.

Java

Java maintains its position of relevance in the JVM ecosystem as the mother of all the other JVM languages. It has a very rich set of libraries and development tools. In its favor, Java is one of the most used languages at iFood, with many experts and customized libraries.

On the other hand, Java is generally considered a verbose language. Because it is such a vast language, it tends to increase the complexity of maintaining code in production.

One more controversial consideration: Java is an object-oriented (OO) language. We do not consider OO a problem, but we understand that many concepts and best practices of this paradigm are debated in the software community. There are best practices, but there is no consensus on them. In other words, getting the best out of the ecosystem depends more on the discipline of the development team and less on the safety that the language offers.

Some examples are silent and unexpected runtime errors, references to null values, and concurrency problems that are only detected at runtime.

With this understanding, the biggest reason for not writing our new services in Java was that we wanted a language that offers more safety when dealing with errors, uses memory more efficiently and favors the maintenance of the code in the long term.

Kotlin

Kotlin appears as an alternative within the JVM ecosystem. Its code is, in our view, cleaner than Java's, but it keeps the object-oriented concepts, so the same conclusions we reached for Java apply.

On the other hand, the Kotlin ecosystem presents some interesting alternatives for immutability and functional programming, such as the Arrow library.

Scala

Scala is a JVM language aimed at functional programming. It is not widely used at iFood and, as there are other alternatives that offer the same set of features, we discarded this option.

Clojure

Clojure is also based on the JVM and is a dialect of LISP. Today, it represents a specific niche of highly expressive languages — where developers take advantage of rapid development without the need to write a lot of code.

It has immutability by default, little boilerplate code and is virtually syntax-free. However, we considered the language “exotic”. We assessed that the learning curve would be steeper and that it would represent a huge paradigm shift for the development team.

We concluded that it would not be a good choice because this would be its first use at iFood, and we didn't know if we could invest so much time building an ecosystem around the language.

Go & Rust

Initially, we analyzed these two languages side by side because they both share similar characteristics and functionalities:

  • Both are native languages, that is, they do not run in a virtual machine like the JVM.
  • They are not object oriented.
  • These are languages considered to be high performance.

Go 

Go is a highly opinionated language: it defines very well how programs should be written and does not give much room to deviate from the intended design. Furthermore, one of its most important values is simplicity of syntax. As a result, Go offers a smoother learning curve, with little standing in the way of a development team quickly learning the main features and starting to build applications.

On the other hand, Go doesn't have a very large set of features, which can make programs less expressive. Some aspects that our team considered important and that are left out of Go are:

  • Standard support for immutability.
  • Generics and iterators.
  • Support for declarative programming.
  • Extensive error handling.
  • The computational model depends heavily on basic constructs (such as for loops and ifs) for more complex iteration operations, such as maps and filters.

Despite this, Go has a well-developed ecosystem, had already been adopted in some cases within iFood and has an active public community.

Rust

Rust brings in several concepts from high-level functional programming languages while keeping a syntax familiar to the world of C. The result is a very expressive language with a very complete type system, which can help reduce the number of errors in production.

Among the disadvantages of Rust, there are some things that we could miss:

  • The ecosystem is not very mature. Although the “basics” for building backend applications are available, not all of the most popular tools have SDKs that are available or user-friendly.
  • The learning curve appeared to be steeper. We got this impression mainly because of the language's “standard” set of multi-paradigm features.

Why Rust?

Among the positive aspects of the language, Rust comes with a very extensive set of built-in features, which we see as its advantages:

  • Good semantics for error handling.
  • Strong and complete type system, as well as a strong type inference system — which ended up bringing it closer to other functional languages.
  • Native functionalities to deal with iterators with a more functional approach.
  • Safe and efficient memory usage. There is no need for a garbage collector. In Rust, when a scope ends, the data it owns is no longer in use, so it is deallocated and that space is freed up.
  • Smart and user-friendly compiler. The error messages are very descriptive and the compiler already checks aspects that in other languages are only checked at run time: data races, access to data that is no longer available in memory, type errors, or errors that would otherwise go unhandled.
  • Safe concurrency. The language does not allow multiple scopes to mutate the same data in memory unless this is made explicit. To make this possible, Rust offers a system of references and smart pointers that prevents access to invalid data. Code that breaks these rules does not compile, forcing the developer to deal with the possibility of the error explicitly.
  • Immutability by default. The program must explicitly declare mutable variables and manage the lifecycle of that data.
  • Extensive and user-friendly documentation. Rust has some of the best documentation among the languages we analyzed.
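
To make a few of these points concrete, here is a minimal sketch, not code from our services. The `DeliveryEvent` type, the `"order_id,prep_minutes"` line format and both function names are assumptions invented for this illustration. It shows errors carried in the return type, an iterator pipeline in place of explicit loops, and bindings that are immutable unless declared otherwise.

```rust
use std::num::ParseIntError;

/// Hypothetical event type, used only for this illustration.
#[derive(Debug, PartialEq)]
struct DeliveryEvent {
    order_id: u32,
    prep_minutes: u32,
}

/// Parses a line in the assumed format "order_id,prep_minutes".
/// Failure is part of the return type, so the caller has to handle it.
fn parse_event(line: &str) -> Result<DeliveryEvent, String> {
    let (id, prep) = line
        .split_once(',')
        .ok_or_else(|| format!("missing comma in {line:?}"))?;
    let order_id = id
        .trim()
        .parse::<u32>()
        .map_err(|e: ParseIntError| e.to_string())?;
    let prep_minutes = prep
        .trim()
        .parse::<u32>()
        .map_err(|e: ParseIntError| e.to_string())?;
    Ok(DeliveryEvent { order_id, prep_minutes })
}

/// Iterator pipeline: drop invalid lines, keep slow orders, sum their prep time.
fn total_prep_for_slow_orders(lines: &[&str], threshold: u32) -> u32 {
    lines
        .iter()
        .filter_map(|line| parse_event(line).ok())
        .filter(|event| event.prep_minutes > threshold)
        .map(|event| event.prep_minutes)
        .sum()
}

fn main() {
    let lines = ["1,10", "2,25", "not an event", "3,40"];

    // Bindings are immutable unless declared with `mut`.
    let total = total_prep_for_slow_orders(&lines, 20);
    // total = 5; // would not compile: `total` is not declared `mut`
    println!("total prep time for slow orders: {total}");

    // Both outcomes must be handled explicitly.
    match parse_event("not an event") {
        Ok(event) => println!("order {} takes {} min", event.order_id, event.prep_minutes),
        Err(reason) => println!("rejected: {reason}"),
    }
}
```

The `?` operator returns early on the first error, which keeps the happy path readable without hiding the failure cases.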

Given these aspects and even assuming that the learning curve would not be so easy at the beginning, the team decided to choose Rust. It seemed like a good price to pay given the results we could achieve with these new services.

Finally, another team at iFood had recently developed an application in Rust. In other words, we were not alone in this boat.

Adopting Rust

When we decided to adopt Rust as the main language for developing new applications within our domain, nobody on the team had solid experience with the language. The team we initially formed was very mature and made up of people with very diverse programming language backgrounds: mainly Java, but also Clojure, Haskell, F# and JavaScript.

As some people had a little familiarity with the language, our first Rust tasks ended up naturally gravitating towards them, while the other team members, who had been at the company longer, were divided between maintenance tasks for legacy systems and planning the new architecture.

To prevent this from becoming a recurring pattern and ensure that this knowledge could be transferred evenly across the team, we decided that we would prioritize working in pairs whenever possible, as new Rust demands emerged.

When practicing pairing in a context similar to ours, it is common for one inexperienced person to end up guiding another, which generally makes pairing sessions very tiring and, ultimately, leads to the practice being abandoned. To prevent this from happening, we tried to pair people further along in their learning journey with those who were still in their first contact. Another important factor was making our approach more flexible as the team matured, so that working in pairs or groups could happen more or less occasionally as people gained more independence. We therefore chose not to make this practice mandatory and to give the team the autonomy to define the development routine that works best for them.

In addition to pairing and group programming, it is an established practice in our teams to dedicate one shift a week exclusively to personal development, during which work on task board activities can be suspended and meetings are optional. Each person also has a budget dedicated to self-development, which they can spend on courses, books and, when possible, trips to conferences and workshops.

We have some channels dedicated to talking about Rust at iFood, and in addition we have a weekly meeting where people on the team exchange experiences, share learnings and ask each other questions. As we are a 100% remote team, moments like this are important to maintain our bond as a team, and it is stimulating to share what we are learning with colleagues.

Challenges

Still on the subject of learning, understanding how the language handles memory was our first big challenge. Although Rust does not formally have a memory model, the compiler's borrow checker forces us to constantly think about the lifetime and usage of each variable we declare.
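
As an illustration of the kind of reasoning the compiler demands, here is a small sketch. The function names and the route strings are hypothetical and not taken from our code base. It contrasts borrowing a value with moving ownership of it, and the commented-out lines mark what the compiler would reject.

```rust
/// Borrows the slice immutably: the caller keeps ownership of the routes.
/// The lifetime `'a` says the returned reference cannot outlive `routes`.
fn longest_route<'a>(routes: &'a [String]) -> Option<&'a String> {
    routes.iter().max_by_key(|route| route.len())
}

/// Takes ownership: the caller can no longer use `routes` after this call.
fn archive(routes: Vec<String>) -> usize {
    routes.len()
} // `routes` is dropped (deallocated) here, with no garbage collector involved

fn main() {
    let mut routes = vec![String::from("A-B"), String::from("A-B-C")];

    let longest = longest_route(&routes);
    // routes.push(String::from("D")); // would not compile here: `routes` is
    //                                 // still borrowed through `longest`
    println!("longest: {longest:?}");

    routes.push(String::from("D")); // fine: the borrow above is no longer used

    let archived = archive(routes); // ownership moves into `archive`
    // println!("{routes:?}");      // would not compile: `routes` was moved
    println!("archived {archived} routes");
}
```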

Topics like monomorphization, dynamic dispatch and the differences between heap and stack were a little distant from our day-to-day reality. It therefore took an intentional effort in our studies not only to understand the rationale behind the compiler's suggestions, but also to get back to programming at the speed we were used to.
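
For readers coming from the JVM, a short sketch may help place these terms. The `Notifier` trait and the `Sms` and `Push` types are hypothetical names made up for this example. The generic function is compiled once per concrete type (monomorphization), while the `dyn` version is compiled once and resolves the method through a vtable at run time.

```rust
/// Hypothetical trait, used only for this illustration.
trait Notifier {
    fn notify(&self, order_id: u32) -> String;
}

struct Sms {
    sender_id: u32,
}

struct Push {
    app_id: u32,
}

impl Notifier for Sms {
    fn notify(&self, order_id: u32) -> String {
        format!("sms {} -> order {}", self.sender_id, order_id)
    }
}

impl Notifier for Push {
    fn notify(&self, order_id: u32) -> String {
        format!("push {} -> order {}", self.app_id, order_id)
    }
}

/// Static dispatch: the compiler generates one copy of this function for
/// each concrete `N` it is called with (monomorphization).
fn notify_static<N: Notifier>(notifier: &N, order_id: u32) -> String {
    notifier.notify(order_id)
}

/// Dynamic dispatch: a single compiled function; the method to call is
/// looked up through a vtable at run time.
fn notify_dynamic(notifier: &dyn Notifier, order_id: u32) -> String {
    notifier.notify(order_id)
}

fn main() {
    // A plain local value lives on the stack.
    let sms = Sms { sender_id: 7 };
    println!("{}", notify_static(&sms, 1));

    // `Box` stores each value on the heap, which lets one `Vec`
    // hold different concrete types behind the same trait.
    let notifiers: Vec<Box<dyn Notifier>> = vec![
        Box::new(Sms { sender_id: 7 }),
        Box::new(Push { app_id: 9 }),
    ];
    for notifier in &notifiers {
        println!("{}", notify_dynamic(notifier.as_ref(), 2));
    }
}
```

Static dispatch usually gives the optimizer more room at the cost of larger binaries, while dynamic dispatch trades a pointer indirection for flexibility. Which one fits depends on the case.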

Another challenge was the lack of infrastructure tools. In order for our applications to comply with the quality standards established by iFood engineering, we had to develop some customized solutions. For example:

  • Logs: there is a certain variety of logging tools already developed by the community, but in general they are very different from the style we are used to on the JVM. We didn't find anything like Logback that could be configured with an XML file at the root of the application. Today we are very satisfied with the tracing crate, but we needed an extra layer of our own code so that everything came out in the format we wanted without contaminating the architecture of the applications.

  • Monitoring: New Relic is one of the main tools we use to monitor our applications. Unfortunately, there is no official client for Rust, so we had to adopt a community-developed solution: bindings on top of the official C SDK. We faced some stability and usability issues, which is why we decided to maintain an internal fork to meet our specific needs. We are also evolving our own instrumentation layer built on top of the tracing crate, which has been serving us well so far, but in any case it required investment from the team.

  • Kafka: much like what happened with the monitoring tools, we also needed to develop our own solution, based on open source libraries, to provide asynchronous communication capabilities to our applications. Today we have a library with our own implementations of Kafka producers and consumers, compatible with Avro and JSON.

A more recent problem, which does not seem to be specific to our experience at iFood, is the incompatibility between asynchronous runtimes. The implementation we adopted, Tokio, is not compatible in some of its versions, especially those prior to 1.0.0, with some of the asynchronous execution features provided by the language. This forces us to ensure that all of our dependencies that also depend on Tokio use a compatible set of versions. As we currently depend heavily on Actix, a framework for developing web applications that depends on Tokio pre-1.0.0, we still need this juggling act to guarantee compatibility between runtimes. It's a problem that we're in the process of solving, but it was something we couldn't predict when we adopted the language, and it has given us some work.

Finally, we have integration with AWS. Our applications need to interact with a series of cloud services so that they can perform their role within our domain: bridging the gap between the origins of an order and its destination through the delivery domain.

You can imagine that our cloud infrastructure plays a very important role in making this happen. That said, it's important to note that AWS does not provide an official SDK for Rust. There is an official project under development, but it is still in alpha and no release date is expected in the near future. The alternative we found was the rusoto crate which, despite not being developed by AWS itself, is the result of the incredible work of very dedicated and capable people in the open source community. It has served us very well, but there is always a risk in relying on an unofficial product, functional compatibility and dedicated support being some of the most relevant concerns. This is a point of attention if your organization depends on something very specific to AWS or if technical support is a contractual requirement that cannot be waived.

Results So Far

Despite all the points mentioned in the challenges section, we are quite satisfied with our decision to adopt Rust. What caught our attention right away, and gave us the feeling that we were heading in the right direction, was how well the applications behaved in the first load test we carried out.

When we started development, it was not our priority to optimize transactions for the best use of computing resources, or to achieve the best response time. First, we wanted to deliver the MVP. It was then that we came across these results:

  • Our services deliver much higher performance with much lower resource consumption than the current levels.
  • 95% of our transactions are processed in under 40ms.
  • Flow of around 30 thousand events per minute with 4 running instances.
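
To put these figures next to the ones quoted earlier for the Java service, the per-instance throughput can be worked out with a small sketch. The function name is made up for this example, and the comparison is only indicative: the two sets of numbers come from different workloads (production peaks versus load tests), and the size of the Rust instances is not stated here.

```rust
/// Average number of events per minute handled by each instance.
fn events_per_instance(events_per_minute: u32, instances: u32) -> u32 {
    events_per_minute / instances
}

fn main() {
    // Rounded figures quoted in this post.
    let legacy = events_per_instance(80_000, 25); // Java service at peak
    let rust = events_per_instance(30_000, 4); // new Rust services

    println!("legacy: {legacy} events/min per instance");
    println!("rust:   {rust} events/min per instance");
    println!("ratio:  {:.1}x", f64::from(rust) / f64::from(legacy));
}
```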

With results this good, Rust proved to be the right choice. The quality of the applications is not reflected only in performance, either. Despite the high volume of transactions, our applications have behaved very well, with a low occurrence of incidents. When incidents do arise, they tend to come from infrastructure problems or bugs in poorly modeled contracts, which are likely to occur in practically any language.

In addition to the numbers, today we have a team with a feeling of accomplishment for delivering highly efficient services. One thing that also motivates us a lot is coming across outliers like this one:

Processing an event that lasted less than 1ms

Each achievement and result we see makes us happy and motivated to achieve quick results while learning a new technology and delivering value to iFood.

Future

The Rust adoption journey at iFood is still in its infancy. Our plans are ambitious and to achieve the objectives we have set we still have a lot to build.

Our increasing demands only reinforce the need to expand our teams even further and continue investing in our onboarding processes. Improving our work dynamics is a constant learning process and something we take very seriously.

As our applications scale, new requirements emerge for our infrastructure libraries, which are not yet at the level of maturity we want. One of our medium-term objectives is to invest more in tooling so that our productivity surpasses the levels of other languages established at iFood.

We also want to participate more in the Brazilian Rust community and expand our impact beyond iFood. Our intention is to encourage the exchange of experience with other players in the market and encourage the adoption of the language.
