Imagine that you are the owner of Gary’s Shoes and that you want to collect data from all of your many stores in a centralized location. You’ll use that data to make decisions, predict future trends, and so on. Because each store must operate independently, you have a server in each location that pushes its changes up to (and gets updates from) the HQ cluster. You can see an example of this kind of setup in this post.

This works quite well, but it does require the user to be aware of a potential issue. When you have set up a massively distributed data flow process, you also need to pay attention to the quiet in the noise. What do I mean by that?

Source: DZone