Emit is the conference on event-driven, serverless architectures.
Before schooling us on functional programming, Bobby Calderwood hit us with The Big Question: to microservice, or not to microservice?
Well, he says—let's start from a basic place, something we can all agree on. Death Stars = bad. You know what he's talking about: when your architecture map (er, diagram) looks like a tangled hairy mess that meets in a gigantic black mass in the middle. He talks about a real Death Star architecture out in the wild and poses the question: how did they get here?
His opinion? An object-oriented mindset. We won't spoil the talk for you, but let's just say he has a new frame on architectures that can help yours flow more like a river delta.
It'll all make sense when you watch the video (or, read the transcript below).
The entire playlist of talks is available on our YouTube channel here: Emit Conf 2017
Thank you, and thanks to the Serverless team for inviting me to speak. This is a great event, I'm so happy to see us as an industry sort of thinking and being intentional about approaching event-driven architecture. I think it's an important topic. Yeah, as was mentioned, like, I'm probably gonna troll a little in this talk, so, you know, but relax, relax, it's gonna be all right. So, I'm a functional programmer, by training and by preference. I come from the Clojure community, where I worked with Rich Hickey, the creator of Clojure, to build cool stuff. And so, I'm sort of invested, so there's a lot of bias in what I'm gonna say, but I think there are important lessons to be learned. A lot of the problems that Rob brought up have solutions, right? The old masters have already solved a lot of these problems and thought hard about these issues that we're now encountering in a distributed systems world. And so, we should look to the past for some of these solutions.
Disclaimers, I'm not gonna be rigorous here. I'm not gonna, like try to rigorously, you know, extend the object-oriented versus the functional programming style out to the distributed case, right? Like that takes a lot of math, that sounds really hard. So, I'm just gonna like hand wave about a lot of that stuff so... And this is more about intuition, it's more about extending principles from these things. Also, you know, relax. I'm trolling, this is funny, come on, everyone smile a little.
All right. So, microservices is, like, a big thing, it's a hot topic, has been for a couple years. We talk a lot about this at Capital One, other places where I've been. Do we adopt? A lot of people say, like, "If we're not doing microservices, all of our competition is gonna pass us by because everyone is doing microservices and they're so fast and agile and they're gonna just pass us by." And then a lot of people are, like, "No, stay away from microservices, it's, like, a dead end." Right? So, there's passionate arguments on both sides and most of the time, this is sort of adopt or abandon decision is made given a certain set of assumptions about, like what microservices are and what the architecture is shaped like, right?
And most of the time the assumptions look something like this, right? All right. This is...oh, you, you can't see my attribution, that's a shame. This is Werner Vogels' tweet from, I think, 2008 of what AWS looked like in 2008. I have, you know, friends and fellow travelers from AWS here, this is, kind of, a joke. But, yeah. This is, like, the Death Star diagram of all the microservices that ran AWS at this point in its evolution. And I love AWS. I use it every day, Capital One has a partnership with AWS, love AWS. This is not an accusation towards them, but this is like, this is a disaster, right? I mean, this is really bad. See the dark part in the middle, see, all these lines here as they converge, form this dark thing in the middle. The technical name for that is a hairball, right? When all of the lines connect to where you can't distinguish them and they just turn into, like, a black mass. This is, this is difficult, right? Reasoning about this system is difficult.
Now again, AWS is awesome and they have built this amazing thing that I use every single day. So, I mean, you can build things this way, but maybe there's a better way. This looks really complicated, AWS has engineers that are a lot smarter than me. My tiny brain can't reason about this. So, I need something a little, a little simpler.
So, the aforementioned architecture, the sort of Death Star diagram, assumes a lot about the shape of that architecture, right? It's, sort of, built in this object-oriented paradigm. You have these little services that each encapsulate data, right? You can't look at my data unless you ask me nicely, you can't change my data unless you ask me nicely, there's this sort of mutable state change, also via a synchronous call, right? So, the only way to get data out is via a synchronous call, the only way to change data is mutably, via a synchronous call. It creates a sort of dependency web like you have in an object-oriented program in memory. The sequencing and orchestration is imperative and order matters, like, put an instruction first, it's gonna happen first, and then the other thing happens, and it's referentially opaque, right? You don't know the state of the system as a whole and it's hard to reason about the state of the system as a whole, you have to interrogate each component of it to get the references and those things can change out from underneath you. So, your inputs, you know, call this system and then call this system and then give me the answer, that'll change over time because it's referentially opaque.
Functional programming, on the other hand, has this, sort of, different approach. Data access is not done by synchronous call, it's done by sharing, you know, a reference to an immutable data structure. So, having these shared data structures, like Clojure's reference types, for example. Everyone, all of the little parts of the system, all your threads, can interrogate those reference types and see what is the immutable value that this reference identifies at this point in time. The underlying value doesn't change. Clojure's Approach to State, which I highly recommend you look at, clojure.org/about/state I think, there's a reference at the end. It talks about how reasoning about state is most easily and simply done by assigning an identity to different values over time. By building in time as a sort of first-class construct, it's easier to reason about state without having this mutable semantic at the bottom.
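As a rough illustration of "an identity pointing at different immutable values over time", here is a minimal Python sketch. The `Ref` class and `Account` type are hypothetical names invented for this example; they only imitate the idea behind Clojure's reference types and, unlike those, make no attempt at thread safety.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """An immutable value: once created, it never changes."""
    owner: str
    balance: int


class Ref:
    """Hypothetical identity that points at one immutable value at a time."""

    def __init__(self, value):
        self._value = value

    def deref(self):
        # Readers get the current value; nobody can mutate it under them.
        return self._value

    def swap(self, fn, *args):
        # State changes by pointing the identity at a new value.
        self._value = fn(self._value, *args)
        return self._value


def deposit(account, amount):
    # Pure function: returns a new Account, leaves the old one untouched.
    return replace(account, balance=account.balance + amount)


timmy = Ref(Account("Timmy", 100))
before = timmy.deref()
timmy.swap(deposit, 50)
after = timmy.deref()
print(before.balance, after.balance)
```

A reader that grabbed `before` still holds a perfectly valid value after the swap; only the identity `timmy` has moved on.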
Functional programs are often organized as, sort of, a data flow graph, rather than a dependency structure. There's declarative orchestration, right? You just set up a whole bunch of things and you can see how they are related but you don't have to, sort of, specify the order in which things happen, and you have referential transparency, right? I hand two values to a function, it will always give me the same value as the output because of referential transparency.
So here's, sort of, the contrast of these principles. So, these are the principles, like, in the programming styles, like, within a single process and memory space. And we can argue... and I encourage you, come up and talk to me afterwards. Object-oriented programming might be a good way to do programming within a single process space, let's talk. But it doesn't scale well to the distributed case. When you have these same assumptions that you try to, like, blow out into this microservices architecture, you're gonna have a bad time.
Distributed objects face a lot of problems and challenges, right? You end up with this, sort of, deep network of latencies, right? Someone asked me a question, so I call two friends, and they call two friends, and now, all of a sudden, you have, you know, four hops of HTTP latency and you're gonna be max, you know, the slowest path on all of those hops, is gonna be what you're bound by. So, that's bad, you're gonna make your system slow.
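The "slowest path" point can be made concrete with a little arithmetic. The sketch below assumes each service does some work of its own and then calls its downstream services in parallel, waiting for all of them; the tree shape and the millisecond figures are made up for illustration.

```python
def response_time_ms(service):
    """Latency of a synchronous call tree.

    `service` is a pair (own_ms, callees). Assumes callees are invoked in
    parallel, so the caller waits for the slowest one.
    """
    own_ms, callees = service
    slowest_callee = max((response_time_ms(c) for c in callees), default=0)
    return own_ms + slowest_callee


def leaf(ms):
    return (ms, [])


# I call two friends, and each of them calls two friends.
call_tree = (10, [
    (20, [leaf(5), leaf(200)]),   # one slow dependency deep in the tree
    (15, [leaf(30), leaf(40)]),
])

total_ms = response_time_ms(call_tree)
print(total_ms)
```

Under these assumptions the caller waits 230 ms: its own 10 ms, plus 20 ms at the first friend, plus the 200 ms of the single slow leaf. If the calls were made one after another instead of in parallel, the latencies would add up rather than take the maximum, which is worse still.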
There's also this, sort of, temporal liveness coupling problem, where someone talks to me and then I call two friends. If one of those friends is dead at the time that I try to talk to them, I'm dead, right? Because I'm bound to them in time. And there's, you know, circuit breakers, and Hystrix and stuff to, like, kind of, paper over that. But, like, the fundamental problem still remains.
There's a, sort of, like, pull orientation thing with distributed objects, where I don't know the answers to the questions that I'm supposed to answer unless I ask my two friends. So, I have to, like, pull data out of them. You get cascading failure modes. Inconsistency is possible because, you know, I get a question and then I call two friends; in between the time I call my first friend and I call my second friend, something may have happened to my second friend. That now makes me serve an inconsistent response to my caller, right?
The real bear here, and I think I buried the lede a little bit here with this hidden narrative bullet, this is really a problem, right? There's the famous essay about the Kingdom of Nouns and object-oriented programming, right? Only the nouns are first class. This happens in a really, [inaudible 00:08:35] bad way in a microservice architecture that is object-oriented, because, like, only the nouns are reified, right? I've got my product service, and I've got my account service, and I've got my, you know, whatever. So, you can see the nouns and you can walk up and interrogate and ask about the nouns, but the verbs are lost, they're ephemeral. I make a call to someone to change something, there's no record of that thing anywhere, it's this ephemeral thing. As soon as the call is over, as soon as that, you know, socket closes, who knows whatever happened. You have to go, like, splunk through your logs or whatever, to find the fact that someone just asked that question, or someone just made that action. So, the narrative is lost, it's literally lost data, and it's also just complex. How do we reason about the state of the system at any given point in time? It's really hard to do so in this distributed objects kind of analogy for designing our systems.
So, maybe a better analogy is a river delta. This is a beautiful picture, I found this and I couldn't not put it in. This is, again, I've lost my attribution here, but this is a river delta in Siberia. So, the frozenness, kind of, makes for some of these colors, I guess. This is also complicated, right? Just like the Death Star was complicated, but there's, sort of, a fractal beauty, and symmetry, and order here that makes it easy for a human being to reason about, right? We know how this works, water flows that way to the ocean, right? And so, we can see the priority and the narrative built into this structure and reason about it just by walking up to it. You can't do that with the Death Star, right? In this, we have a notion of priority, right? The big river flows and it's the source and then it ramifies into this beautiful fractal structure and each one of these little fractal bits does its part getting the water to the ocean. Right? So, it's easy for us to reason about it, even though it's still a very complicated thing, it is complicated but not complex.
So, object-oriented programming is to Death Star as functional programming is to river delta in this distributed world, that's kind of the analogy. And I believe that functional programming, and the ideas and principles behind functional programming, do scale really well to the distributed case.
Let's talk about how. Now we have little latency. We'll talk about this a little bit in the future. We have little latency at both read and write time, with this eventual consistency thing in the middle, right? There's some intervening processing time that Rob talked about, which can introduce some weirdness, but we can talk about that as well. But the actual write and the actual read can be really fast, right? The services are temporally decoupled, right? If something upstream of me is dead, I don't know or care, if something downstream of me is dead, I don't know or care. All I care about is that I can have ubiquitous access to this data log, as long as it's not dead, right? But I'm not coupled to any given service and, you know, I trust Kafka or Kinesis to be up more than I trust the microservice that the knuckleheads in the office down the hall wrote. Right?
So, you have isolated failure modes just like we talked about, you have always consistent reads. They are eventually consistent but you can reason about the point in time at which you are consistent because you have this log. This log forms this sort of clock, this logical clock of where we are in the causal history of things. So, you'll always be consistent, you won't run into this problem of, like, someone calls me and then I call two friends and in between, there was some race and something happened and now I'm serving inconsistent stuff. You don't have that because you can always, sort of, clearly reason about as-of time. And you have a reified narrative. Again, I think I buried the lede here. This is really important, maybe the most important thing up here.
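One way to picture the log as a logical clock: every read view is computed from a prefix of the log and carries the offset it reflects. This is only a sketch with an in-memory list standing in for the log; the record fields and the `view_as_of` helper are invented for illustration.

```python
# An append-only log of debits; the list index plays the role of the offset.
log = [
    {"account": "timmy", "amount": 30},
    {"account": "timmy", "amount": 20},
    {"account": "timmy", "amount": 25},
]


def view_as_of(log, offset):
    """Build a read view from the first `offset` records of the log."""
    amounts = [record["amount"] for record in log[:offset]]
    return {"as_of": offset, "transactions": amounts, "spent": sum(amounts)}


for offset in range(len(log) + 1):
    view = view_as_of(log, offset)
    # The view may lag behind the newest write, but it never disagrees
    # with itself: both fields come from the same point in the log.
    print(view)
```

A view built at offset 2 is stale once the third record arrives, but it is still a correct answer to "what did we know as of offset 2?".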
Event-driven architectures are critically important, I think we all in this room agree or else we wouldn't have shown up here. Event-sourced architectures may be even more important because, while you're being event-driven, you can still forget that history. If you're event sourced, you're maintaining the narrative of your business, right? The things that happen, the observations you make about reality, outside of, you know, the membrane of your organization or your bounded context, the things that your customers ask you to do. Like, these are the only things at the end of the day that matter. The only reason we're writing these systems is to serve our customers and solve these business problems, and if you're losing the history of what your customers asked you for or what you chose to do about it, then you've lost something really valuable.
And as Rob talked about, we have clear reasoning about, and possibly replay of, state over time. If we get new business logic, if we have a new set of business rules, if we wanna online a new data store, we can replay our history because we've kept it. So, an example for my domain, I work for a bank, sometimes we calculate people's balances and stuff. So, you know, there was a need to maintain a customer-defined balance. So, you know, if I wanna control Timmy's spending at college, I'm gonna say, you know, only allow on this, you know, card number or whatever, his account number, only allow a certain amount of spending per week, right? Seems like a simple thing, turns out to be kind of thorny and a little bit hairy.
So, we have to aggregate these debits over time, potentially emit an event when a certain balance is exceeded, and we wanna be able to show the customer, like where they're at, right? How much money is left and allow them to, you know, tinker with their configuration, their balances and stuff.
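The weekly spending limit can be sketched as a fold over the debit stream. Everything here is an assumption made for illustration: debits are `(account, date, amount)` tuples with integer amounts, "per week" means ISO calendar week, and the `limit_exceeded` event shape is invented; the production version described in the talk uses Kafka Streams and is not reproduced here.

```python
from collections import defaultdict
from datetime import date


def iso_week(day):
    year, week, _weekday = day.isocalendar()
    return (year, week)


def process_debits(debits, weekly_limit):
    """Aggregate debits per account and ISO week; emit an event on crossing the limit."""
    spent = defaultdict(int)
    events = []
    for account, day, amount in debits:
        key = (account, iso_week(day))
        before = spent[key]
        spent[key] += amount
        # Emit once, at the moment the running total crosses the limit.
        if before <= weekly_limit < spent[key]:
            events.append(("limit_exceeded", account, key[1], spent[key]))
    return dict(spent), events


debits = [
    ("timmy", date(2017, 8, 14), 40),
    ("timmy", date(2017, 8, 16), 70),  # same week: running total is now 110
    ("timmy", date(2017, 8, 21), 20),  # a week later: a fresh window
]
totals, events = process_debits(debits, weekly_limit=100)
print(totals)
print(events)
```

Because the debits are kept, a changed limit or a different window definition is just another run of `process_debits` over the same history.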
So, this is a straw man, another disclaimer. Obviously, you wouldn't necessarily write the object-oriented, sort of, microservices this way, but this is kind of how you do object-oriented programming. I've seen people, like, for realsies, do this in microservices. Don't do this. Even if you have an object-oriented architecture, don't do it this way. At write time, you know, we get a new transaction in, we post that, you know, the boss API here, the frontend that's gonna serve as my API contract to the outside world, calls back and, like, writes a thing down to the transactions microservice and then does the math and writes something back to the account balances service, because, you know, we need microservices for all these different things.
So let's have different nouns. At read time, I'm gonna read and then I'm gonna, like, go fetch the transactions from one service and then fetch the balance from another service. Again, this is, like, a real problem because, as the transactions are rolling in, you know, there's a strong possibility for a race between these two calls at read time. Don't do it that way.
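The read-time race is easy to reproduce in miniature. In this sketch two module-level variables stand in for the transactions service and the balances service, and a callback simulates a transaction landing between the two read calls; all names are hypothetical.

```python
# Two separate pieces of mutable state, as if owned by two services.
transactions = [30, 20]
balance = {"spent": 50}


def post_transaction(amount):
    # Write path of the straw-man design: update both services in turn.
    transactions.append(amount)
    balance["spent"] += amount


def read_account(between_calls=lambda: None):
    txns = list(transactions)   # call 1: fetch transactions
    between_calls()             # a new transaction rolls in right here
    spent = balance["spent"]    # call 2: fetch the balance
    return txns, spent


txns, spent = read_account(between_calls=lambda: post_transaction(25))
print(txns, spent)
```

The response lists 50 worth of transactions next to a balance of 75; neither service was wrong, but the combined answer is.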
This is sort of the more functional, sort of approach, like a sequence diagram for the more functional approach, right? At write time, my only job is to write something down to this log. A thing happened. At read time, my only job is to read from the aggregate that I'm building in the account API. The domain-specific, problem-specific aggregate that, you know, resides close to my accounts API, I can just read that and it's super duper fast. In the middle, we've got this sort of intervening processing stuff, right? The balance microservice reads the transactions topic and for every transaction it gets, it computes the new balance and emits that new balance on a balances topic. The accounts API is aggregating both transactions and balances so that it can serve consistent results out to its customers, right? So this intervening processing time, you know, could take milliseconds, it could take seconds, there's some intervening time but both the writes and the reads are super fast.
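A toy version of this sequence, with in-memory lists standing in for the two topics, might look like the following. The `Topic`, `BalanceService`, and `AccountsApi` classes are invented for the sketch and polled by hand; a real deployment would use Kafka consumers or Kafka Streams rather than anything shown here.

```python
class Topic:
    """In-memory stand-in for an append-only log topic."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        return self.records[offset:]


transactions = Topic()
balances = Topic()


def post_transaction(account, amount):
    # Write path: the only job is to write the fact down.
    return transactions.append({"account": account, "amount": amount})


class BalanceService:
    """Reads the transactions topic, emits a new balance per transaction."""

    def __init__(self):
        self.next_offset = 0
        self.totals = {}

    def poll(self):
        for txn in transactions.read_from(self.next_offset):
            total = self.totals.get(txn["account"], 0) + txn["amount"]
            self.totals[txn["account"]] = total
            balances.append({"account": txn["account"], "spent": total,
                             "txn_offset": self.next_offset})
            self.next_offset += 1


class AccountsApi:
    """Aggregates both topics into a local view that reads are served from."""

    def __init__(self):
        self.next_offset = 0
        self.view = {}

    def poll(self):
        for bal in balances.read_from(self.next_offset):
            # Only show transactions the balance already accounts for.
            seen = transactions.records[: bal["txn_offset"] + 1]
            amounts = [t["amount"] for t in seen if t["account"] == bal["account"]]
            self.view[bal["account"]] = {"spent": bal["spent"], "transactions": amounts}
            self.next_offset += 1

    def read(self, account):
        # Read path: a local lookup, no calls to other services.
        return self.view.get(account)


balance_service = BalanceService()
accounts_api = AccountsApi()

post_transaction("timmy", 30)
post_transaction("timmy", 20)
stale = accounts_api.read("timmy")   # nothing processed yet
balance_service.poll()
accounts_api.poll()
fresh = accounts_api.read("timmy")
print(stale, fresh)
```

Before the intervening processing runs, the read returns nothing for Timmy; afterwards it returns a balance and a transaction list that agree with each other. Neither the write nor the read ever waits on another service.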
So, that's, kind of, our example. A lot of the stuff that I'm talking about now, the example included, is gonna be coming out as a blog post on the Confluent blog and the Capital One DevExchange blog here in the next couple days. So, you can, kind of, look at the code listing and stuff for the examples and so forth. Turns out, the actual problem is considerably harder because we have different data sources: there's the settled transactions stream, and a real-time, like, authorization stream, and authorizations and transactions are different in the banking, credit card world. So, there's some joining and some windowed aggregates and stuff we have to do but still, the entire, like, code listing fits comfortably within a blog post. I mean, it's really still a small thing. We're using Kafka and Kafka Streams, which we'll talk about in the tools section, to do that aggregation, and it's really powerful and cool. With a very small amount of code, we can do this really rich, very interesting processing.
So, let's talk about, kind of, some of the techniques, the rules, and tools that we've developed to do this functional view of microservices. One possible architecture that sort of satisfies this functional view is what I call the Commander architecture. I presented this at Strange Loop in 2016 and at the time, we also open sourced...Capital One open sourced some, like, a reference implementation of the thing in the upper left, the write-handling component. So, this is the Commander architecture and basically, this is made of a few different, sort of, techniques here. The techniques behind this: REST, because, at the end of the day, REST is a really nice way, at the edge, to communicate with, you know, customers, callers, consumers. So, I'm not throwing away REST, I like REST, I think it works, it just doesn't need to be as pervasive throughout the whole stack as we currently have it. CQRS, I think it was mentioned, I think other people are gonna talk about it as well.
I didn't meet Rob until last night at dinner and it turns out we're, like, long lost kindred spirits. I was gonna stand up at the beginning of this talk and just be, like, "Yes, ditto, for banking, what Rob said." But CQRS is simply the splitting of the writes from the reads, we've already, kind of, seen that in the architecture here. It's Command Query Responsibility Segregation. So, splitting the write and the read paths in your systems. Event sourcing means, I mean, you know, storing state by storing each event that happens, and then you can always synthesize aggregate state from that, right? Just like in your chess analogy, storing the moves instead of the current board state, there's a world of difference between those two things.
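Storing the moves instead of the board can be sketched in a few lines. This is a deliberately naive illustration (no move validation, no captures, an invented three-piece starting position); the point is only that any board state is a fold over the stored moves.

```python
from functools import reduce


def apply_move(board, move):
    """Return a new board with the piece moved; the old board is left alone."""
    source, target = move
    new_board = dict(board)
    new_board[target] = new_board.pop(source)
    return new_board


start = {"e2": "white pawn", "e7": "black pawn", "g1": "white knight"}

# The event log: this is what gets stored.
moves = [("e2", "e4"), ("e7", "e5"), ("g1", "f3")]

# Current state is derived by replaying every move...
board_now = reduce(apply_move, moves, start)
# ...and any earlier state by replaying a prefix.
board_after_first_move = reduce(apply_move, moves[:1], start)

print(board_now)
print(board_after_first_move)
```

If only `board_now` had been stored, the order of play and every intermediate position would be unrecoverable.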
Pub/Sub as a mechanism for conveying these events around to your different components that need to read them. So, storage and conveyance are both important. I use Kafka because it does both, I like it, but there are other things that do that as well. Sagas, which we'll talk about here in a minute, and serverless, which is why we're all here. The rules: capture all observations and changes at the edge (we'll dig into that a little bit) to an immutable event stream. Reactively calculate the derived streams of state. Aggregate that state wherever it becomes useful to you, in whatever form, for whatever data access pattern you need. And then manage the outgoing reactions, the side effects in functional-programming speak, really carefully.
So, let's dig into each of these rules here. The single writer principle: Ben Stopford has written a bunch of blog posts for Confluent, and they're great, you should go read them. He's a fellow traveler with Martin Kleppmann, who I also very much admire. He talks about a single writer principle where you have to be really careful about writes. Right? You wanna have many readers, you want reads to fan out as much as is necessary, but writes you need to handle really carefully because this has, like, cascading effects downstream. You wanna be careful about how you capture these things.
So, very few authorized teams capture raw observations about reality at the edge of your bounded context. These can include raw events, you know, things that I observe about reality, or they can include requests for action by customers. Those are called commands or requests for action. As was mentioned, events are, sort of, in the past tense, these are things that we are incorporating into our view of truth. Commands are speculative, untrusted, they're phrased in the imperative, like "update account," you know, "sign up customer," or whatever. These are things that the customer wants us to do but we've got to think about it kind of hard and make some decisions before we're willing to incorporate it into our view of truth.
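The difference between a command and an event can be shown with a small decision function: the command arrives in the imperative and is untrusted, and only after a decision does a past-tense event (or a rejection) get recorded. The command and event names below are hypothetical and are not taken from the Commander reference implementation.

```python
def decide(command, open_accounts):
    """Turn a speculative command into a fact, or reject it."""
    kind = command["type"]
    if kind == "open_account":
        if command["account"] in open_accounts:
            return {"type": "command_rejected", "reason": "account already exists"}
        return {"type": "account_opened", "account": command["account"]}
    if kind == "deposit":
        if command["account"] not in open_accounts or command["amount"] <= 0:
            return {"type": "command_rejected", "reason": "invalid deposit"}
        return {"type": "money_deposited", "account": command["account"],
                "amount": command["amount"]}
    return {"type": "command_rejected", "reason": "unknown command"}


open_accounts = {"timmy"}
accepted = decide({"type": "deposit", "account": "timmy", "amount": 50}, open_accounts)
rejected = decide({"type": "open_account", "account": "timmy"}, open_accounts)
print(accepted)
print(rejected)
```

In this sketch both the command and the resulting event would be written down, so the narrative keeps what was asked for as well as what was decided.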
All right. You wanna do this at the edge of your bounded context with minimal processing, right? You wanna just get the truth of the business event. These are rich composite events, these are not domain or entity level, like, data tinkering things. That all happens later downstream and I don't care about it as much. What you wanna capture at the edge are rich, composite events that any one of your business stakeholders could walk up and be like, "Oh, yeah. I see what's happening, he created an account, and then he deposited some money and then he withdrew some money." Right? I mean, that's the level of business event you wanna capture, sort of, at the edge. And you wanna store it immutably and durably. You don't wanna throw these things away, ever, I think. Kafka has this really neat ability to just, like, set a topic expiration to never. You just keep everything.
Causally related events go on the same log, which is a topic partition in Kafka. We talked about some different approaches to how to do that in Rob's talk. You only wanna have one writer per log, and no change gets into the bounded context via any other means. In the architecture, you see that here. That's the job of this commander component in the upper left, it's the writer. It's the thing that writes down stuff to these, sort of, canonical streams. These commands and events are, like, written in pen. These things are, like, permanent and durable. Downstream of that, you'll have, you know, a few authorized teams that are gonna process those raw events and do some, like, audited calculations. In my domain, there's only one way that you can calculate interest, and it's audited by, you know, lots of government people. So, you have to do that carefully, and then, you know, you wanna compute the state of different entities and maybe emit those to streams too, great. This can be recursive, so you might end up, you know, as you're aggregating and computing the state of different things, emitting further events as necessary, and to emit the event, you're probably gonna talk back to that web service that is the commander that handles the writes to that log.
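Keeping causally related events on the same log usually comes down to choosing a partition key. The sketch below routes every event for an account to one in-memory partition through a single `append` function. It uses CRC32 purely as a stable stand-in hash, which is an assumption of this example and not how Kafka's default partitioner hashes keys.

```python
import zlib

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key):
    # Stable hash of the key; the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def append(key, event):
    """The single writer: the only code path that appends to the logs."""
    partition = partition_for(key)
    partitions[partition].append((key, event))
    return partition, len(partitions[partition]) - 1  # (partition, offset)


positions = [
    append("timmy", "account_opened"),
    append("timmy", "money_deposited"),
    append("timmy", "money_withdrawn"),
]
timmy_log = [event for key, event in partitions[partition_for("timmy")] if key == "timmy"]
print(positions)
print(timmy_log)
```

All three events share a partition and their offsets increase, so any reader of that partition sees Timmy's history in the order it was written.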
And we have that here in the command processing world, in the processing jobs here. I like Kafka Streams for this, we'll talk about that in the technique section, or in the tools section, excuse me. And then you wanna aggregate state. And this, lots of teams can do, right? This is where you have huge degrees of freedom in your organization, like when you're trying to beat Conway's law here. Lots of teams, without having to coordinate or even know about each other, can consume the streams to build whatever aggregate matters to them, whatever data access pattern they need. So almost certainly you're gonna have, you know, someone building, like, the transactional view, the OLTP, you know, here's-the-state-of-things view. But, in our domain, we have auditors that have to look at the stuff, we have, you know, security and operations people who wanna see what's going on, we have analytics people who wanna build, like, pretty pictures for the boss to show the state of the business, right? And all of these things can be done without any coordination between those teams, so the teams that are expert in building that particular view for that particular domain audience can build it without having to, you know, ask nicely for someone to ETL out of your database and build something, right? You don't have to do that, because everyone has first-class citizenship, right? It's not that the primary application team has first-class status and then everyone else has second-class status with regard to data access. Everyone's singing from the same sheet of music here, we're all speaking the same language of business domain events.
Manage side effects carefully. I wish I had more time to dig into this, but it looks like I'm about to run out of time. Side effects: so if events and commands are written in pen, the authorized computations and entity state are written in, like, erasable pen. Aggregates are written in pencil, you can throw away the aggregates and regenerate them from the log, right? So, your aggregates, you can experiment with a little bit more. Side effects are written in pen, stuck in an envelope, and mailed somewhere. Like, you can never change that. You can't even really see it, right? So, when you're causing effects outside of maintaining state on these different logs, that's what I call a side effect, right? And in functional programming, you have to avoid side effects, you have to use a monad or something if you're in Haskell. So, you shouldn't do side effects, which are, you know, causing action outside of the call stack, right?
Here, I'm, sort of, defining the call stack as writing to and reading from the log, so that's not side-effecting when you're just, sort of, maintaining state via this log. When you, like, call out to some third-party web service, or when you send an email to your customer, or when you send an SMS message, that's a side effect and you have to manage that carefully. Sagas are a good technique for doing that: writing down the fact that, "Hey, I tried to call this thing, here were the results," maybe you can have some sort of retry pattern. If at some point you need to, like, reverse that action, you can't reverse the sending of an email, you just have to send a compensating email, like, "I'm sorry, we sent an email on accident." So you have to manage that carefully.
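The bookkeeping side of a saga can be sketched as follows: every attempt at a side effect and its outcome is written to a log, failed attempts are retried, and an effect that cannot be undone is followed by a compensating action that is also recorded. The function names, the record shape, and the fake email sender are all assumptions made for this example, not part of any saga library.

```python
def run_step(saga_log, name, action, retries=3):
    """Attempt a side effect, recording every attempt and its outcome."""
    for attempt in range(1, retries + 1):
        saga_log.append({"step": name, "attempt": attempt, "status": "started"})
        try:
            result = action()
        except Exception as exc:
            saga_log.append({"step": name, "attempt": attempt,
                             "status": "failed", "error": str(exc)})
        else:
            saga_log.append({"step": name, "attempt": attempt,
                             "status": "succeeded", "result": result})
            return True
    return False


def compensate(saga_log, name, compensation):
    """The effect cannot be reversed, so perform and record a compensating one."""
    saga_log.append({"step": name, "status": "compensated", "result": compensation()})


calls = {"count": 0}


def flaky_send_email():
    # Fake third-party call: fails on the first attempt, succeeds afterwards.
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("mail gateway timed out")
    return "sent"


saga_log = []
succeeded = run_step(saga_log, "send_welcome_email", flaky_send_email)
compensate(saga_log, "send_welcome_email", lambda: "apology sent")
statuses = [entry["status"] for entry in saga_log]
print(statuses)
```

The saga log ends up holding the whole story: a failed attempt, a successful retry, and the later apology.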
Tools: like I said, I really like the Kafka stack very much for this, but there are other tools that do this, right? We talked about Kinesis, there are some new logs, like Apache Pulsar and Twitter's DistributedLog, and they all have the same characteristics of being immutable, append-only stores that then convey those changes out to listeners. And there are good integrations with serverless techniques and tools; Apache OpenWhisk's Kafka package is one example. Here are some references I mentioned throughout the course of the talk. The one that I don't have up there is the Commander presentation that I gave at Strange Loop and the open source associated with it; you can Google that to find it. That's all I have today, thank you very much.
Andrea leads growth marketing at Serverless.