Buzzkill
There comes a time when the passion you show for something is not necessarily reciprocated. The world of open source, much like the real world, has real people in it; people interact with each other, comment on each other’s work, discuss new features, fight dragons, and shoot zombies. Okay, maybe not the last two actions. But, people certainly interact with each other to the same degree as they would in real life. You can imagine, then, that something like the fight from down the street could take place in the open source world, except in a more dramatic fashion with, perhaps, a greater fallout. Similarly, that nice lady at the pub could care less about your fishing rod and walk away after you have clearly shown interest; this, too, can be reproduced in the world of open source
A Fox Story
I have mentioned previously how important mozilla is for the internet. If not, you can hear me say it now
Mozilla is important for the internet
Keeping in mind the busywork that goes in for the development of tools like Firefox, Firefox Aurora, Firefox Focus, Fenix, Firefox Send, SpiderMonkey, and loads of other programs that do not have firefox in their name, it is understandable that projects like Common Voice do not get as much attention. I am not accusing the nicest people over at Mozilla of anything – even though they did decline my internship application, no hard feelings. Instead, I can see which projects they are prioritizing. When I last logged on to the voice-web repository, I was alarmed to see that some of the pull requests were up there for two days; this gave me an ominous vibe and I shuddered. Of course, I do not believe in superstitions; therefore, I shrugged it off and tried to work through some issues.
Neglect
Issues, it would seem, are not just bugs waiting to be fixed
I love Common Voice and have been excited to work on it since I switched projects. There are only a handful of weeks left before my final contributions are due. And, although I am taking it slow, I am slowly understanding it. Code contributions, I have found, are not difficult to make once you are familiar with the overall method of the community, thence the direction one needs to take becomes clear. However, much like the pub with the uninterested acquaintance, there will be times in the open-source world where you will be left hanging. This week, if anything, has taught me not to take it to heart.
Although I found a bug I will most likely work on, I went on exploring the voice web repository. During my excursions, however, I found that the help wanted issue label was particularly peculiar.
The Host
After I setup the web server on my local machine with a great effort and docker-compose up
, I went on to see some of these open issues on GitHub and filtered by the help wanted label. I saw issues that have been there for at least six months; in fact, some were there for more than a year. Besides, these issues were disproportionate compared to the number of all open issues on the repo (229 as of writing). This made me slightly uncomfortable since it meant that either newer issues were too difficult for beginners or the issues were not triaged properly for newcomers. Both meant trouble.
I brushed it off, however, and took on an issue that has been discussed for over a year. The issue was to determine a cheaper alternative to Amazon S3 for hosting the massive dataset (22G of audio files). I was surprised to see the issue still open. “Money must be burning a hole through their pocket,” I thought. I followed the discussion and realized that it was stuck on a library which cannot efficiently split the files into pieces for hosting. Although one of the contributors made a mirror for the dataset, it was extremely outdated and un-automated. There were suggestions of using archive.org as a potential distributor to reduce the load. So, I went to archive.org and researched their automated tool for uploading and hosting files. Interestingly, Internet Archive allowed for 100G of file hosting as long as they were not over 10,000 files. I proceeded to check how many audio files the dataset contained and found out that there were 677,021 mp3 files and 3 tsv files. I reported my findings and one of the contributors suggested using an archive to reduce it to one file.
Because I am Reluctant
It is worth mentioning, however, that when I first brought up the hosting issue on the Slack channel, I did not receive any acknowledgement. Nobody replied to me, nobody reacted to my mention, nobody pointed me to a different direction, nobody said anything about the issue’s relevance, nobody batted an eye. I was immensely discouraged, especially after seeing that someone who came to the slack to “say Hi” was greeted with many “welcome”s immediately after my inquiry.
I slowly brushed it off as a singular incident. But, it was not.
I went to another issue that dealt with developer onboarding and continuous development. The issue described the importance of establishing a front-end only development environment since, according to the author, most of the issues were regarding the front-end. To my dismay, after I asked regarding the implementation of the build scripts necessary for the development environment, I was accosted with… silence. Deafening. Silence.
My reluctance increased manifold and soon I forgot about the repo entirely.
Show Must Go On
For an engineer, I have been told, it important to pick your battles. Whether or not a problem is even worth solving is a question beginners (i.e., myself) often forget to ask. Keeping this in mind, I shifted my gaze temporarily to another side of open source.
This week, I delved deeper into Google’s Summer of Code and wrote up my first draft. I got strong correspondence from my mentor when I discussed possible approaches to the proposal and got early feedback. Thus, I am in a content mood and still continuing to participate in open-source in general.