The Untitled Book of Eliot Pearson

Software Development Is Not Just About Writing Code

Scaling From Day One

Posted on | June 20, 2009 | 14 Comments

There is a saying in software development: write it first, then optimize as needed. This is a very general rule, and it works much of the time. However, when your solution needs to scale from day one, you can throw this rule out the window. Well, you aren’t really throwing the rule out; you are modifying it. The new rule: when the solution needs to scale, write it optimally first.

Going to the supermarket is a problem whose solution needs to scale from day one. Let’s say we have two scenarios: you need to pick up two items, or you need to pick up twenty items. I see two real approaches to the problem. The first is to go to the store once for every item, and the second is to go only once. Going once is more efficient, and it scales. Think of how much time and how many resources I burn going to the store each time. You can argue that going to the store twenty times is not that bad, but what about 100 times? My point should be apparent now: going to the store once for every item is inefficient.

Let’s replace the supermarket drive with a database call. When you need to retrieve thousands of records, going to the database for each record can be costly. This is a common pitfall in development; code that retrieves information one item at a time is probably plaguing your world as we speak. If you ever wonder why things take so long, the code in use may suffer from inefficient retrieval.
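
To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `products` table, its columns, and all three function names are made up for illustration, and the batch size of 500 is an arbitrary choice that stays under SQLite's default limit on bound parameters. It is one way to show the difference, not a recommendation for any particular database.

```python
import sqlite3


def make_demo_db(n):
    """Create an in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n + 1)],
    )
    return conn


def fetch_one_at_a_time(conn, ids):
    """One trip to the store per item: a separate query for every id."""
    rows = []
    for product_id in ids:
        row = conn.execute(
            "SELECT id, name FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is not None:
            rows.append(row)
    return rows


def fetch_in_batches(conn, ids, batch_size=500):
    """One trip for the whole list: a single IN query per batch of ids."""
    rows = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        query = (
            "SELECT id, name FROM products "
            f"WHERE id IN ({placeholders}) ORDER BY id"
        )
        rows.extend(conn.execute(query, batch).fetchall())
    return rows
```

Both functions return the same rows for a sorted list of ids. The first issues one query per id; the second issues one query per batch, so 1,200 ids cost three round trips instead of 1,200.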

Comments

14 Responses to “Scaling From Day One”

  1. Sarah
    June 23rd, 2009 @ 7:55 pm

    Pretty cool post. I just found your blog and wanted to say
    that I’ve really enjoyed browsing your blog posts. In any case
    I’ll be subscribing to your feed and I hope you post again soon!

  2. Eliot
    June 23rd, 2009 @ 9:51 pm

    Thank you so much. I will keep at it.

  3. Joseph Ferris
    July 30th, 2009 @ 9:34 am

    You miss a fundamental point. The simple fact of the matter is that scalability is an artifact of an efficient and well-thought-out architecture. The only way you would run into the issue you are describing is if you put no architectural forethought into what you are working on. Code should never be prematurely optimized. Poor code can be refactored, and poor code can even scale. Poor architectural choices, on the other hand, you are often stuck with and can only change at great expense.

    Taking your example into account, it shows that you are missing a fundamental architectural decision. Yes, you only go to the store once, but you don’t put everything in the cart simultaneously, do you? What needs to be adjusted is the data access strategy, not necessarily the code. If the products are fairly static, there are distributed caching scenarios that make picking items off the shelf one at a time trivial by managing the lifespan of the objects without the need for a database call, other than the one that initially seeds the cache. On top of that, if you actually want to “find” your products on the shelves, an indexing service could additionally lighten the database load.

    Not anticipating these scenarios is a sign of a lack of planning, not a fault of the developer’s implementation of the requirements. If you bind what the developer is doing by an architectural requirement, then this code would never pass review – a code review, a data execution review, or both. Code can thwart even the best of architectures, but that generally tends to be a sign of a process issue. If you front-load development with optimization requirements, projects will take far longer to complete than they should.

  4. Eliot
    July 30th, 2009 @ 8:29 pm

    Very good insight. You probably work in an environment where architecture and development are logically (or physically) separate, and where there is a lot of cooperation between the two groups. But hey, I could be wrong.

    Let me ask you this: what happens when the proposed and accepted architecture is very rigid? Let’s say it’s similar to approach one. The more efficient data access strategy is shot down because of constraints. Some architectures have to be rigid. For certain companies and solutions this is needed.

    I can’t totally agree that scalability is an artifact of a well-thought-out design. Scalability is a goal of a design. A well-thought-out design can meet the needs of the customer and still not be scalable. If your solution has to scale from day one, yes, you should write optimal code first.

    I really appreciate the rebuttal. Please write back with more thoughts.

  5. Joseph Ferris
    July 30th, 2009 @ 10:25 pm

    Not a problem. I enjoy discussing these types of issues. My goal is not to change anyone’s mind, but rather share what I have learned – often the hard way.

    Let me give you a little background. I actually manage a team of developers. The team is rather small – especially after a couple of rounds of layoffs. Our entire IT department weighs in at about thirteen people right now. We definitely are not formal, although we have been trying to bring order to what has been a rather chaotic “process”. Being a small team, one of the hats I have been wearing for the last year or so is that of architect, as well. It is really interesting to have to sit on both sides of the fence and find compromise with myself, if that makes any sense.

    To your point about rigid architectures, I feel a rigid architecture has failed at least one of its goals. Architecture, and application architecture specifically, should help the team meet its goals by making tasks easier to do. This can be accomplished in a flexible manner, as well. There are many tenets to architecture, with scalability being just one. Some of the others would be maintainability, usability, extensibility, and reliability – to name a few. The architecture should attempt to balance all of these “features”. Rigid architectures usually lean too heavily on a few of these qualities at the expense of the others.

    We actually are closer to agreeing on scalability being an artifact than you think, however. As I mentioned, scalability is one of the goals – when executed properly, it does become an artifact of an efficient architecture. I am talking about the end-product, and you are talking about the expectations. Both are just two sides of the same coin.

    There are many architectural approaches that can meet the goal of the customer. Scalability needs to permeate all areas of the logical architecture – from data architecture, to application architecture, to physical architecture. The term “architecture” is very generic, and means different things to people in different positions. If you find yourself “having” to pre-optimize, the architecture has already failed you and the customer. You have to remember that you are a customer of the architecture as much as the end-user is a customer of the application.

    On data access, specifically, my team personally finds balance in DDD (domain-driven design). We have a pre-built framework that requires a light amount of coding to accomplish basic CRUD operations. Since we operate more on a per-entity basis, it quickly becomes apparent that this part of the application architecture is still not enough to meet the goals of the developer. The solution is a multi-part one. We consult with the database administrators to figure out if there is an efficient means to get the required information from the database. If there is not, we build an “index” and store it in Lucene.NET. Since this is a pre-determined option, we have the code in place in the application architecture to make using it as transparent as possible, while still remaining loosely coupled. The indexing server and the distributed caching from my previous example are real parts of our application architecture, because we (the collective whole of developers and database professionals) realized that data access would play multiple roles. We would use it for relational mapping (building single entities), we would use it for search and bulk selection (building lookups for entities and a means to build them rapidly), and we recognized that, in most use cases, entities do not change at a faster rate than they are utilized (adding a caching mechanism). The end result is that there are a proportionately small number of scenarios that need “special attention”. But who needs to give it special attention is not preordained. Maybe it is the developer who has to change the way something is coded, maybe it is the data team who modifies the data availability, or maybe it is a change in the architecture that provides a means to bridge the gap.

    We are actually talking about two forms of architecture here: the application architecture and the data architecture. A traditional layered architecture puts the database at the heart of the application, when it should be looked at more as a service of the application (and you will actually hear a lot of data architects refer to the database as a service). The application architecture should consume data. The rate at which the data travels from the database to the calling code is a mutual concern. If the data is not being delivered rapidly enough by the database, it needs to be addressed by the data team. If the bottleneck comes after the data is received, while it is being processed by the code, then it becomes a code issue. A good example of a misplaced concern here would be a query against the database that is generated in the code. This is the domain of the database team, and the responsibility for coming up with a proper execution plan should be moved to the data team. Remember – the database is a service. You, as a developer, are a consumer of this service.

    The reason that we have implemented indexing and caching outside of the database, relating back to the example, is that building objects after we receive the data is an expensive process. The CRUD itself is of minimal impact. But since entities are built one at a time in a DDD model, this is an acceptable trade-off. Could we process the data faster with some other approach? Definitely. But we would actually lose a lot of other architectural goals that are important, such as loosely coupling the database to the application (remember, it is a service), loosely coupling the layers of the architecture in the same way, and the fact that a lot of the actual database code is completely auto-generated. We spend more time focusing on the business logic and serving data to the front-end (where we are a service to the web developers, providing pre-determined data to plug into their interface).

    Search is expensive, in general. This is true whether you are selecting all of something in a large table, or just parts of it. The indexing service, for example, allows us to “chunk” out what is in the table without having to talk to the database until it is absolutely needed. For example, we are looking for “all products that are between twenty and fifty dollars”. Can we write this in code? Yep, but that in itself is not scalable. The indexing service actually uses the database to pre-build the index at pre-determined intervals, but the application can maintain the index in real time. A product changes price, the entity is modified and persisted back to the database, but the code also updates the index in real time. Since the indexing service is an application architecture artifact, this doesn’t violate the trust between the application and the database. Instead we are shifting responsibility.

    Assume we want to get the index in chunks. We can do that. We can say, “give me the first twenty-five products that match the criteria”. We get the product references, look to see if each entity is cached, and if it is not, we build it, use it, and cache it. If the entity is modified, we evict it from the cache and it is JIT-loaded the next time it is needed.
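
The flow in the paragraph above can be sketched roughly as follows. This is an illustrative Python stand-in, not the .NET code the comment describes: `ProductCatalog`, `Product`, and every method name are hypothetical, the "database" is a plain dictionary, and the "index" is a linear scan standing in for a real index such as one kept in Lucene.NET.

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str
    price: float


class ProductCatalog:
    """Hypothetical stand-in for the index + cache + database trio."""

    def __init__(self, rows):
        self._rows = dict(rows)   # pretend database: id -> (name, price)
        self._cache = {}          # entity cache: id -> Product
        self.builds = 0           # entities built from the pretend database

    def search(self, low, high, offset=0, limit=25):
        """Index lookup: return one chunk of matching ids; no entities are built."""
        matches = sorted(
            pid for pid, (_, price) in self._rows.items() if low <= price <= high
        )
        return matches[offset:offset + limit]

    def get(self, product_id):
        """Return the cached entity, building and caching it on a miss."""
        product = self._cache.get(product_id)
        if product is None:
            name, price = self._rows[product_id]
            product = Product(product_id, name, price)
            self._cache[product_id] = product
            self.builds += 1
        return product

    def update_price(self, product_id, new_price):
        """Persist the change, then evict so the next read rebuilds the entity."""
        name, _ = self._rows[product_id]
        self._rows[product_id] = (name, new_price)
        self._cache.pop(product_id, None)
```

Calling `search(20, 50)` returns at most twenty-five ids. Passing each id to `get` builds an entity only the first time, and `update_price` forces a rebuild on the next read.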

    It sounds rigid and it sounds complicated, but it is neither. The architecture provides hooks to create and consume an index, and it allows the caching policy to be injected. Caching policies are defined per entity type and live in a simple configuration section in the config file. Caching strategies can be adjusted without having to do a build. Indexes implement an interface and are factory created, also by identifying the type of index interfaces in the configuration. Once again, it can be modified without touching the core.

    There is a cost to all of this, which is where I imagine you are having trouble relating: time and money. It is very hard to convince people outside of technology why these concerns need to be addressed and why it takes as long as it does. A proper architecture can’t be whipped up in a week. What we have has been the result of almost six months (albeit not at 100% resource dedication) of discussions, research, creating and throwing away proofs of concept, getting it in front of the developers to work with, and incorporating their feedback. The end product, though, is not specific to a single application. It is our application architecture that can be used for just about anything. A proper architecture should be scalable, maintainable, extensible, usable, etc. The devil is in the details when it comes to balancing these features, and that is where the time and money come into play.

    What is your current situation? Not trying to offend, but it really sounds like there might be some ownership issues, where parts of the database architecture are being claimed by the application architecture. A clear separation of concerns will help significantly with these types of issues. My opinion is that you should write code to reasonably and responsibly use the resources made available to you within the scope of your area of concern. This is not to slight the development team – quite the opposite. The architect (or designated victim) should address the concepts of the low-level mechanisms (such as a clear data access strategy to fit 95% of scenarios up front), the database team should focus on data architecture and availability (since they are a service), and this leaves the development team more time to focus on the problem domain. You should be empowered to do your job efficiently, in executing business logic and serving the consumers of your services.

  6. Eliot
    August 11th, 2009 @ 7:57 am

    This is a very lengthy reply and I really appreciate it. I have been hammered with projects lately. I will answer your questions ASAP. Before leaving, I will say you bring some very insightful ideas to the table. Thank you again.

  7. Duetuedetbape
    November 1st, 2009 @ 5:23 pm

    Thanks for post. Nice to see such good ideas.

  8. admin
    November 3rd, 2009 @ 6:17 am

    You’re welcome. I am just trying to get some ideas out there. Also, thanks for reading.

  9. Bruce
    May 21st, 2010 @ 8:29 pm

    Thank you so much. I will keep at it.

  10. Steve
    May 28th, 2010 @ 5:28 am

    Pretty cool post. I just found your blog and wanted to say
    that I’ve really enjoyed browsing your blog posts. In any case
    I’ll be subscribing to your feed and I hope you post again soon!

  11. admin
    July 3rd, 2010 @ 5:36 pm

    Thank you. I am glad you enjoyed it. Looking forward to seeing you return.

  12. Computer Locking Up Repair
    August 11th, 2010 @ 5:45 am

    I just couldn’t leave your web site prior to saying that I truly loved the high quality info you offer to all your visitors. Will be back frequently to check up on new information in your articles!

  13. Eliot
    August 15th, 2010 @ 11:36 pm

    You are very kind. Please come back soon.

  14. legit online jobs
    August 25th, 2010 @ 3:33 am

    This is a good blog. Keep up all the work. I too love blogging and expressing my opinions. Thanks :)
