TECH BANKRUPTCY
Luka Kladaric Sekura Collective luka@sekura.io @kll
A presentation at GrowIT 2018 in December 2018 in Novi Sad, Serbia by Luka Kladaric
Since the dawn of software development, weʼve been faced with the same impossible choice every single day: do it quickly or do it well. We do our best to make the right choice for the task at hand, and we move on.
Then came the lean startups and “Move fast and break things”, which put their thumbs on the scale in favor of the hacks, the MVPs, the just-ship-its, and the Product Managers just ate. it. up.
Thatʼs great for proving a concept or finding a market fit. But what happens when thatʼs all you do? When the entire organization, top to bottom, has collectively forgotten how to write quality software? When you become unable to make the correct technical decision even by accident?
We will take a deep dive into a mission-critical web application that is basically unusable on its best day, and trace the trivial bad decisions that got it there. In the end I hope you will never take a shortcut again.
A small-time startup with a mobile app, think Foursquare/Swarm. People check into locations, leave tips and photos for each other. You charge a small fee for a no-ads experience. The data accumulates and isn't used.
THE HACKATHON
You organize a hackathon, and someone builds a concierge dashboard. It lets you peek at individual users' data and reach out to them with suggestions over chat.
The ads revenue is drying up, nobody is buying the no-ads experience, so you pivot to selling the concierge service. You start small, a few people working on it, helping just a few clients selected for this trial.
The app is essentially a collection of hacks to pull from the database user data that was never meant to be aggregated.
Built for a dozen people, now used by hundreds. Initially each helping a dozen clients, now thousands.
Hitting limitations of your hiring pipeline, you start hiring remote. Outsourcing even. Not everyone is high-trust any more. Support becomes difficult.
"the browser can't render that many messages, but the API doesn't have pagination"
pagination in the browser is not pagination
"just give me all the data via API"
No pagination on lists that keep growing forever.
Deep responses that keep growing in scope.
Imagine if viewing a tweet also gave you a full list of everyone who liked it, along with all data about them. You could build a Twitter clone that has a single API endpoint that returns everything (all tweets, all users, all tweets by each user), but even if only 10 of your friends used it, it would become unusable within a few months.
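To make the tweet example concrete, here is a minimal sketch of a shallow response. The `Tweet` type, the `tweet_summary` helper and the `likes_url` path are all hypothetical names made up for illustration. The idea is that the response carries a count and a pointer to a separate list endpoint, so its size stays fixed no matter how many people liked the tweet.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    id: int
    text: str
    liked_by: list[int] = field(default_factory=list)  # ids of users who liked it


def tweet_summary(tweet: Tweet) -> dict:
    """Build a shallow response whose size does not depend on popularity."""
    return {
        "id": tweet.id,
        "text": tweet.text,
        # A count instead of the full list of likers and their data.
        "like_count": len(tweet.liked_by),
        # Hypothetical path of a separate endpoint that would list the likers.
        "likes_url": f"/tweets/{tweet.id}/likes",
    }
```

A client that really needs the likers asks the separate endpoint for them, one page at a time.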
"the backend team is busy, let's just reuse this meaningless field for meaningful signals"
sorting in the UI, based on data from deep responses
"realtime chat is difficult, let's just refresh everything every time there's a change"
Incremental updates > refreshing everything.
Pubsub is used only as a trigger for a regular full refetch. A huge surge of messages means a new refetch is triggered while the old one is still completing. Eventually it all times out.
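One way to sketch an incremental update, assuming message ids increase monotonically (an assumption, not something the original app is known to guarantee): the client remembers the last id it has seen and asks only for newer messages. `ChatLog` and `since` are hypothetical names, and the in-memory list stands in for a real message store.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int    # assumed to increase with every new message
    body: str


class ChatLog:
    """In-memory stand-in for the message store."""

    def __init__(self) -> None:
        self._messages: list[Message] = []

    def append(self, body: str) -> Message:
        msg = Message(id=len(self._messages) + 1, body=body)
        self._messages.append(msg)
        return msg

    def since(self, last_seen_id: int, limit: int = 50) -> list[Message]:
        """Return at most `limit` messages newer than `last_seen_id`."""
        newer = [m for m in self._messages if m.id > last_seen_id]
        return newer[:limit]
```

With this shape, a pubsub notification only has to prompt the client to call `since` with its last seen id. A surge of notifications then produces small responses, or empty ones, rather than overlapping full refetches.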
"we don't have profile image thumbnails"
Just like sorting in the browser, resizing images in the browser is very inefficient.
Eventually, you hit 800 MB page loads with images included. Resized: 12 MB. If the user list were paginated, it'd be a fraction of that.
Unusable on slower computers, iPads, ...
"Why is everything down?"
Large responses = slow responses.
Slow responses + surge = timeouts.
We ran out of workers to service requests on the app backend.
"We don't know who sent the user a message full of profanity"
Because thereʼs no way to grant several users access to a client, staff share accounts and passwords. Once this goes on long enough, everyone has plausible deniability over any action, like sending users profanity, even if messages are tied to someoneʼs account.
"We don't know who moved a bunch of users from one concierge to another."
Because this app is an afterthought to the API, it has its own backend and its own account scheme. It then talks to the actual API with a single shared token. There's no way for the API to know and record who requested an action. No meaningful audit trail. Also no record of who got moved, and no way to undo the entire operation without spelunking through clients' data individually.
For data-heavy apps, API design has to be done right, because you will rarely get a chance to refactor it.
APIs by definition are meant to be stable, long-term contracts on how different apps interact.
To change them usually requires coordination across multiple teams, or at least people.
There are many resources that talk about what the API should look like from the outside. Iʼm here to talk about what it should look like on the inside.
if it returns a list that is at all likely to grow past 100 elements — PAGINATE
and no total count!
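A minimal sketch of pagination that avoids a total count, reading the note above as advice to skip one. The `paginate` function and its `after_id` and `next_cursor` names are hypothetical. The in-memory list stands in for a query shaped like `WHERE id > ? ORDER BY id LIMIT ?`, and one extra row is fetched to learn whether another page exists.

```python
from typing import Optional


def paginate(items: list[dict], after_id: Optional[int] = None, limit: int = 100) -> dict:
    """Keyset pagination over `items`, which must be sorted by ascending "id"."""
    start_after = after_id if after_id is not None else float("-inf")
    # Take one row more than requested: its presence tells us there is a
    # next page, so nothing needs to count the entire list.
    window = [row for row in items if row["id"] > start_after][: limit + 1]
    page = window[:limit]
    has_more = len(window) > limit
    return {
        "items": page,
        # The cursor is the id of the last row on this page, or None at the end.
        "next_cursor": page[-1]["id"] if has_more else None,
    }
```

The client passes `next_cursor` back as `after_id` until it receives `None`.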
they will inevitably grow and require pagination themselves and thatʼs a whole new level of hell
It's either a list, or it has a complex deep response.
Otherwise you're setting yourself up for some real performance pain.
Sorting and filtering belong in the database, or in the backend.
Sorting in the browser is 1000x more expensive.
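As a sketch of pushing sorting and filtering into the database, here is an example using Python's built-in `sqlite3` module. The `clients` table, its columns and the `most_recent_clients` helper are hypothetical and made up for illustration. The database filters, sorts and truncates, so only one page of rows ever leaves it.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, last_message_at INTEGER)"
)
# An index on the sort column lets the database avoid sorting every row.
conn.execute("CREATE INDEX idx_clients_last_message ON clients (last_message_at)")
conn.executemany(
    "INSERT INTO clients (name, last_message_at) VALUES (?, ?)",
    [(f"client-{i}", i * 10) for i in range(1, 1001)],
)


def most_recent_clients(conn: sqlite3.Connection, active_since: int, limit: int = 20) -> list:
    """Filter, sort and truncate inside the database, not in the browser."""
    rows = conn.execute(
        "SELECT id, name, last_message_at FROM clients "
        "WHERE last_message_at >= ? "
        "ORDER BY last_message_at DESC LIMIT ?",
        (active_since, limit),
    )
    return rows.fetchall()
```

The browser then only renders the rows it was given, already in order.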
what do I mean by this?
You won't believe the creative ways people try to get out of having a dashboard where you can create and delete users, set up their permissions, reset their password... Having this when there's not much to it makes it easy to add other things as you need them
App users != client-facing personas
1 user : many personas
1 persona : many users?
You will inevitably need to separate how your staff logs in from what the end users see. You will also probably need the ability to log in as someone else (for testing, monitoring, management, or quality assurance purposes).
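The separation between staff logins and client-facing personas could be modeled roughly as below. This is only a sketch under assumptions: `StaffUser`, `Persona`, `operators` and `send_as` are hypothetical names. Several staff users may operate one persona, and every message keeps both the persona the client sees and the real sender.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaffUser:
    login: str  # how a staff member authenticates; never shown to clients


@dataclass
class Persona:
    display_name: str  # what the end user sees
    operators: set[StaffUser] = field(default_factory=set)


def send_as(persona: Persona, sender: StaffUser, text: str) -> dict:
    """Send a message under a persona while keeping the real sender on record."""
    if sender not in persona.operators:
        raise PermissionError(f"{sender.login} may not act as {persona.display_name}")
    return {"shown_as": persona.display_name, "sent_by": sender.login, "text": text}
```

Because each staff member has their own login, nobody needs to share a password to help the same client.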
Any destructive operation needs to be logged with the identity of the user who did it. You will eventually experience bad actors within your organization. It's important to be able to identify them.
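A minimal sketch of such logging, applied to a bulk move of clients between concierges. All names here (`AuditLog`, `move_clients`, `undo_last_move`) are hypothetical. Each entry records who acted and what the previous state was, which is also enough to undo the operation without inspecting clients one by one.

```python
import time


class AuditLog:
    """Append-only list of entries; a real system would persist these."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, actor: str, action: str, details: dict) -> None:
        self.entries.append(
            {"at": time.time(), "actor": actor, "action": action, "details": details}
        )


def move_clients(assignments: dict[str, str], client_ids: list[str],
                 to_concierge: str, actor: str, audit: AuditLog) -> None:
    """Reassign clients and log who did it, plus the state before the move."""
    previous = {cid: assignments[cid] for cid in client_ids}
    for cid in client_ids:
        assignments[cid] = to_concierge
    audit.record(actor, "move_clients", {"to": to_concierge, "previous": previous})


def undo_last_move(assignments: dict[str, str], actor: str, audit: AuditLog) -> None:
    """Restore the assignments saved by the most recent move, and log the undo."""
    last = next(e for e in reversed(audit.entries) if e["action"] == "move_clients")
    assignments.update(last["details"]["previous"])
    audit.record(actor, "undo_move_clients", {"restored": last["details"]["previous"]})
```

The `actor` has to be the individual staff member's identity, which is only possible if the API receives it instead of a single shared token.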
Try to find it :)