Thu Jun 30 23:53:24 HKT 2011
From
/weblog/design/distribute
There are two primary ways of scaling web applications in practice today.
1) “Vertical Scalability” - Adding resources within the same logical unit to increase capacity. Examples include adding CPUs to an existing server, or expanding storage by adding hard drives to an existing RAID/SAN storage.
2) “Horizontal Scalability” - Adding multiple logical units of resources and making them work as a single unit. Most clustering solutions, distributed file systems, and load balancers help with horizontal scalability.
Scalability can be further sub-classified based on the “scalability factor”.
1) If the scalability factor stays constant as you scale, this is called “linear scalability”.
2) Chances are that some components will not scale as well as others. A scalability factor below 1.0 is called “sub-linear scalability”.
3) Though rare, it's possible to get a better scalability factor just by adding more components (for example, I/O across multiple disk spindles in a RAID gets better with more spindles). This is called “supra-linear scalability”.
4) If the application is not designed for scalability, it's possible that things actually get worse as it scales. This is called “negative scalability”.
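The four classes above can be sketched in a few lines of Python. The text does not define the "scalability factor" precisely, so this sketch assumes it is the throughput gain divided by the resource gain between two measurements; the function names, the parameters and the 5% tolerance are all hypothetical choices made for illustration.

```python
import math


def scalability_factor(base_units, base_throughput, scaled_units, scaled_throughput):
    """Throughput gain divided by resource gain (an assumed definition)."""
    resource_gain = scaled_units / base_units
    throughput_gain = scaled_throughput / base_throughput
    return throughput_gain / resource_gain


def classify(base_units, base_throughput, scaled_units, scaled_throughput, tolerance=0.05):
    """Name the scalability class for a scale-up from base_units to scaled_units.

    Assumes scaled_units > base_units. The tolerance is an arbitrary
    illustrative margin for calling a measured factor "equal to 1.0".
    """
    # Adding resources made things worse in absolute terms.
    if scaled_throughput < base_throughput:
        return "negative"
    factor = scalability_factor(base_units, base_throughput, scaled_units, scaled_throughput)
    if math.isclose(factor, 1.0, abs_tol=tolerance):
        return "linear"
    return "supra-linear" if factor > 1.0 else "sub-linear"


if __name__ == "__main__":
    # One server handles 100 requests/s; what happens with two servers?
    for scaled in (200, 150, 250, 80):
        print(scaled, classify(1, 100, 2, scaled))
```

With one server at 100 requests/s, two servers at 200 requests/s classify as linear, 150 as sub-linear, 250 as supra-linear and 80 as negative.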
http://www.royans.net/arch/2007/09/22/what-is-scalability/
Report on a Rails web application handling 550k page views -
http://shanti.railsblog.com[..]mongrels-handled-a-550k-pageview-digging
The scalability challenge of XMPP, an IM protocol -
http://www.process-one.net[..]icle/the_aol_xmpp_scalability_challenge/
Presentations and resources on making your website more scalable -
http://www.scribd.com[..]9/Real-World-Web-Performance-Scalability
http://www.theserverside.com[..]lications&asrc=EM_NLN_3990118&uid=703565
http://www.theserverside.com[..]ionsPart2&asrc=EM_NLN_3990119&uid=703565
Brian Zimmer, architect at travel startup Yapta, highlights some worst practices jeopardizing the growth and scalability of a system:
* The Golden Hammer. Forcing a particular technology to work in ways it was not intended is sometimes counter-productive. Using a database to store key-value pairs is one example. Another example is using threads to program for concurrency.
* Resource Abuse. Manage the availability of shared resources because when they fail, by definition, their failure is experienced pervasively rather than in isolation. For example, connection management to the database through a thread pool.
* Big Ball of Mud. Failure to manage dependencies inhibits agility and scalability.
* Everything or Something. In both code and application dependency management, the worst practice is not understanding the relationships and formulating a model to facilitate their management. Failure to enforce diligent control is a contributing scalability inhibitor.
* Forgetting to check the time. To properly scale a system it is imperative to manage the time allotted for requests to be handled.
* Hero Pattern. One popular solution to the operation issue is a Hero who can and often will manage the bulk of the operational needs. For a large system of many components this approach does not scale, yet it is one of the most frequently-deployed solutions.
* Not automating. A system too dependent on human intervention, frequently the result of having a Hero, is dangerously exposed to issues of reproducibility and hit-by-a-bus syndrome.
* Monitoring. Monitoring, like testing, is often one of the first items sacrificed when time is tight.
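Two of the points above, "Resource Abuse" and "Forgetting to check the time", can be illustrated together. The following is only a minimal sketch and not anything taken from the linked article: a fixed-size pool for a shared resource where every borrower must state how long it is willing to wait. The names BoundedPool, PoolTimeout and borrow are hypothetical, and the pooled "resource" is a placeholder for something like a database connection.

```python
import queue
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no pooled resource frees up within the time budget."""


class BoundedPool:
    """Fixed-size pool: the shared resource can never be oversubscribed."""

    def __init__(self, factory, size):
        # Create all resources up front so the upper bound is explicit.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self, timeout_seconds):
        """Lend one resource, waiting at most timeout_seconds for it."""
        try:
            resource = self._idle.get(timeout=timeout_seconds)
        except queue.Empty:
            # Fail fast instead of letting requests pile up behind a busy pool.
            raise PoolTimeout(f"no resource free after {timeout_seconds}s") from None
        try:
            yield resource
        finally:
            # Always hand the resource back, even if the caller raised.
            self._idle.put(resource)


if __name__ == "__main__":
    pool = BoundedPool(factory=object, size=1)  # object() stands in for a connection
    with pool.borrow(timeout_seconds=0.1):
        try:
            with pool.borrow(timeout_seconds=0.05):
                pass
        except PoolTimeout as exc:
            print("second borrower gave up:", exc)
```

The design choice here is that a caller who cannot get a resource in time receives an error it can handle, rather than blocking indefinitely and spreading the slowdown to every other request.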
http://highscalability.com/scalability-worst-practices
Useful corporate blogs that talk about scalability -
http://highscalability.com[..]l-corporate-blogs-talk-about-scalability
Overview of MapReduce and how it compares with other distributed programming models -
http://natishalom.typepad.com[..]0/is-mapreduce-going-to-main-stream.html
Paper on the data store at Amazon -
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Discussion of how missing synchronization can cause performance issues -
http://www.theserverside.com[..]lications&asrc=EM_NLN_6273194&uid=703565
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457
Discussion about cloud-based memory architectures -
http://highscalability.com[..]ased-memory-architectures-next-big-thing
http://highscalability.com[..]alability-and-performance-best-practices
Interview with a Google engineer -
http://www.zdnet.co.uk[..]gle-at-scale-everything-breaks-40093061/
Mon May 09 00:28:50 HKT 2011
1. Use Cloud for Scaling
2. Use Cloud for Multi-tenancy
3. Use Cloud for Batch processing
4. Use Cloud for Storage
5. Use Cloud for Communication
http://horicky.blogspot.com/2009/11/cloud-computing-patterns.html
http://horicky.blogspot.com/2009/11/nosql-patterns.html
Databases in the cloud -
http://drdobbs.com[..]int?articleId=218900502&siteSectionName=
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics -
http://www.biomedcentral.com/1471-2105/11/S12/S1
The architecture that survived the Amazon outage -
http://www.infoq.com/news/2011/04/twilio-cloud-architecture
Sat Sep 12 15:54:31 HKT 2009
In one sentence, here's why: humans are notoriously bad at keeping "self" distinct from "other". Egomania, projection (transference), and enmeshment are well-known symptoms of this problem. OK, so I hear you saying, "yeah, but what does this have to do with programming?" It certainly seems absurd to suggest that if we are bad at something we know the most about (our "selves"), how could we possibly say that we have a good approach for the programming analogues - objects, modules, etc. -
http://www.artima.com/weblogs/viewpost.jsp?thread=46706
An argument for why space-based design is better than n-tier design -
http://www.google.com[..]0The%20End%20of%20Tier-based%20Computing
Some key Google research on distributed computation -
http://www.infoq.com/news/2007/06/google-scalability
An opinion that we do not yet (as of Oct 2007) have good language support for distributed computing -
http://kasparov.skife.org/blog/2007/10/11/
A blog containing a lot of distributed computing information -
http://www.highscalability.com/
How Wikipedia manages its site -
http://dammit.lt/uc/workbook2007.pdf
Google tutorial on designing distributed systems -
http://code.google.com/edu/parallel/dsd-tutorial.html
http://en.wikipedia.org/wiki/Distributed_hash_table
The Hadoop Distributed File System: Architecture and Design -
http://hadoop.apache.org/core/docs/r0.18.0/hdfs_design.html
http://www.metabrew.com[..]-a-list-of-distributed-key-value-stores/