Week 5

How I Decided On An Open Source Community

Over the last few weeks, I have been blogging about my experiences diving into open source. So far I haven’t contributed any code, documentation, or much of anything else. Instead, I focused on understanding what open source is and why contributing to it is not at all terrifying. For example, I have written about licensing, the concepts of free versus open source software, my experience at meetups, and how to evaluate open source projects. Although I didn’t contribute to open source software during that time (aside from adding data to OpenStreetMap), I have now learned enough to confidently choose an open source community to join. This blog post is about how I came to that decision and why I chose that specific community.

When evaluating different communities, I looked for ones where I felt I could contribute for the long term. I chose this criterion (rather than language, mission statement, or the countless other options) because it let me judge whether a community would be a place where I could keep learning for many years. That way, all the effort I put into becoming part of the community will benefit me for years to come, and the benefits will compound as I get deeper into the project. I see open source as a way to develop real skills and have a real impact on communities, and the deeper you get, the more those skills develop. So instead of bouncing around from project to project, I would rather be deeply involved in one.

When I evaluated projects this way, I realized that almost all of the ones I felt I could learn the most from were Apache Software Foundation projects. I am interested in distributed systems, cloud technologies, scalability, and everything data, and Apache hosts projects like Hadoop (including its MapReduce framework), Spark, CloudStack, and many similar ones. Because I want to learn these kinds of architectures intimately, I decided that Apache would be the perfect organization to contribute to. As I develop for one Apache project, I will become more confident with Apache's processes, its technologies, and the architectures I am interested in along the way.

I also noticed that almost all the projects I was interested in were written in Java, sometimes alongside Scala and Python, but primarily Java. I don’t know Scala or functional programming, so I avoided the projects that used Scala. Although I eventually want to learn Scala, learning a new programming paradigm right now seems counterproductive. So I narrowed my search to projects that did not use Scala.

That still left me with lots of projects. Apache NiFi, Apache Hadoop, Apache CloudStack, and Apache Cassandra were the ones I was most interested in. I looked at these projects and a few others to see which one seemed best for me. I immediately eliminated Hadoop: although I know it is a very important technology for large-scale data problems, I don’t yet have any experience using it, and to use it I would also need to learn its associated technologies. The learning curve seemed way too steep, so it did not make sense to pursue it.

Apache NiFi is a data pipeline tool, which is interesting, but to me not as interesting as some of the others, so I eliminated that one as well. That left Apache CloudStack and Apache Cassandra, both of which I find extremely interesting. I am very interested in cloud technologies, so I would love to learn how IaaS (infrastructure as a service) platforms are implemented. I am also very interested in data storage and distributed systems, and Apache Cassandra is a wide-column store database that distributes data across many nodes to store and query very large datasets.

Since both were equally interesting to me, I decided to evaluate their communities. Both had extensive documentation on how to contribute, welcoming communities surrounding the software, and seemed to respond well to new contributors.

From there, I started looking at each project's issues. As I was analyzing the issues for Apache CloudStack, I realized something somewhat important: I didn’t have the slightest idea what they were about! So unfortunately, Apache CloudStack could not be the project I would contribute to. Not until I have a few AWS certifications, at the very least.

Then I looked at Apache Cassandra. I have programmed using relational databases, distributed microservice architectures, and NoSQL databases, so the issues I read seemed challenging but doable. What excited me most, however, was realizing that the project was missing lots of documentation, which is a place I could contribute while learning the system! That would let me be part of the community and contribute while I learn the system’s ins and outs. It would also be very fulfilling to think of all the users reading documentation I contributed to every day. From there, I was convinced: Apache Cassandra it is!

Written on or before February 28, 2019