Storm and Kafka - parallelism is not magic
This came up when we were trying to get the most out of the parallelism factor in our Storm topology. While reading the docs and learning Storm, I assumed that increasing the parallelism factor would give us more throughput.
We had a sample Storm topology that used a Kafka spout as its input feed. But after raising the parallelism factor above 1, we didn't get much gain in throughput from our Storm execution.
This led me here:
https://groups.google.com/forum/#!topic/storm-user/mBA1e6Y1MYY
which quotes Nathan Marz:
"The maximum parallelism you can have on a KafkaSpout is the number of partitions."
Each Kafka partition is read by only one spout instance, so any spout instances beyond the number of partitions in the topic we subscribe to won't read any data at all.
So if you want to get the most out of Storm's spout parallelism, make sure your Kafka topic has at least as many partitions as spout instances. :)
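To see why the extra instances sit idle, here is a minimal sketch that simulates partition assignment. It assumes partitions are dealt out round-robin across spout tasks (task i takes partitions i, i + total_tasks, ...), which is roughly how the older storm-kafka KafkaSpout splits a topic. The function names are hypothetical and are not part of Storm's API.

```python
def partitions_for_task(task_index, total_tasks, num_partitions):
    """Partition ids one spout task reads, assuming round-robin assignment."""
    # Task i takes partitions i, i + total_tasks, i + 2 * total_tasks, ...
    # If task_index >= num_partitions the range is empty: the task reads nothing.
    return list(range(task_index, num_partitions, total_tasks))


def assignment(total_tasks, num_partitions):
    """Map every spout task index to the partitions it would read."""
    return {
        t: partitions_for_task(t, total_tasks, num_partitions)
        for t in range(total_tasks)
    }


def busy_tasks(total_tasks, num_partitions):
    """Count spout tasks that actually receive at least one partition."""
    return sum(1 for parts in assignment(total_tasks, num_partitions).values() if parts)


if __name__ == "__main__":
    # Hypothetical setup: a topic with 4 partitions and a spout parallelism of 6.
    for task, parts in assignment(total_tasks=6, num_partitions=4).items():
        print(f"spout task {task}: {parts if parts else 'idle (no partitions)'}")
    print("busy spout tasks:", busy_tasks(6, 4))
```

With 6 spout tasks and 4 partitions, tasks 4 and 5 get empty assignments, so the effective spout parallelism is min(spout tasks, partitions) = 4. Going the other way (2 tasks, 5 partitions) every task is busy, and some simply read more than one partition.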