Developer Diary 2: Contribute to Open-Source Projects and Dirty Work
Introduction
This article covers two topics: how to contribute to open-source projects, and some reflections on dirty work.
How to Contribute to Open-Source Projects
Recently, I submitted several PRs to a few open-source projects and gradually became familiar with the process of contributing.
Assume the open-source project is alice/repo1 and its development branch is master. If we (with the username bob) want to modify the code, we first need to fork the project. Let's say the fork is bob/repo1, with its own master branch (forking only the development branch is usually sufficient).
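The most straightforward approach is to make changes directly on our master branch and submit a PR to the original repository. The downside is that a PR often includes multiple commits, and the maintainer typically handles PRs by squash merging. A squash merge adds one new commit to the original repository instead of our individual commits, so our master still contains commits the original repository never received, while the original repository has a commit we don't have. In other words, our branch has diverged from the original repository's branch.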
To resolve this divergence, first sync the fork on our remote repository (via the GitHub webpage) and click "Discard 4 commits" (the number reflects how many commits our branch is ahead), so that our remote repository's branch matches the original repository again. At this point, our local repository and the remote repository are out of sync, so we run git fetch origin to refresh the remote-tracking branch, followed by git reset --hard origin/master. The reset discards the old local commits, along with any uncommitted changes, and moves the local branch to the remote repository's state.
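The recovery steps above can be tried safely in a sandbox. The following is a minimal sketch, not a description of any real project: local bare repositories (alice.git, bob.git) stand in for the GitHub repositories, a forced fetch stands in for the "Discard commits" button, and the directory names seed and work are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: local bare repositories stand in for GitHub (assumption).
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com

# Seed repository with one commit on master.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "helo world" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"

git clone -q --bare seed alice.git      # stands in for alice/repo1
git clone -q --bare alice.git bob.git   # stands in for the fork bob/repo1
git clone -q bob.git work               # our local clone; origin is the fork

# We commit twice directly on master and push to the fork.
cd work
echo "hello world" > README
git commit -q -am "Fix typo"
echo "More docs" >> README
git commit -q -am "Extend README"
git push -q origin master
cd "$demo"

# The maintainer squash-merges our two commits as a single new commit.
git -C seed fetch -q ../bob.git master
git -C seed merge -q --squash FETCH_HEAD
git -C seed commit -q -m "Fix typo and extend README (squashed)"
git -C seed push -q ../alice.git master

# Stand-in for "Discard commits": force the fork's master to match the original.
git -C bob.git fetch -q ../alice.git +master:master

# Local recovery: refresh origin/master, then hard-reset onto it.
cd work
git fetch -q origin
git reset -q --hard origin/master
cd "$demo"
```

In a real checkout, only the last two git commands are needed after clicking the button on GitHub. Because the hard reset is destructive, it is worth checking git status for uncommitted work first.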
Obviously, handling this every time is quite cumbersome. Therefore, the following approach is recommended: Create a new branch from our master branch, such as fix-typo. Make commits in this branch, push them to the remote repository, and submit a PR to the original repository's development branch (from bob/repo1:fix-typo to alice/repo1:master). Regardless of whether the maintainer merges the PR or not, our development branch will never diverge from the original repository's development branch, thus avoiding conflicts. Once the PR is merged, the fix-typo branch can be deleted.
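The branch-based workflow can be sketched the same way. This is an illustrative sandbox under stated assumptions: local bare repositories (alice.git, bob.git) stand in for GitHub, a plain non-forced fetch stands in for the "Sync fork" button, and the directory names seed and work are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: local bare repositories stand in for GitHub (assumption).
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "helo world" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"

git clone -q --bare seed alice.git      # stands in for alice/repo1
git clone -q --bare alice.git bob.git   # stands in for the fork bob/repo1
git clone -q bob.git work               # our local clone; origin is the fork

# Work on a dedicated branch instead of master.
cd work
git checkout -q -b fix-typo
echo "hello world" > README
git commit -q -am "Fix typo"
git push -q -u origin fix-typo          # the PR would be opened from this branch
cd "$demo"

# The maintainer squash-merges bob/repo1:fix-typo into alice/repo1:master.
git -C seed fetch -q ../bob.git fix-typo
git -C seed merge -q --squash FETCH_HEAD
git -C seed commit -q -m "Fix typo (squashed)"
git -C seed push -q ../alice.git master

# Stand-in for "Sync fork": no force needed, because the fork's master never moved.
git -C bob.git fetch -q ../alice.git master:master

# Fast-forward the local master, then delete the merged branch locally and remotely.
cd work
git checkout -q master
git pull -q --ff-only origin master
git branch -q -D fix-typo               # -D: a squash merge is not detected as merged
git push -q origin --delete fix-typo
cd "$demo"
```

The non-forced fetch and the --ff-only pull both succeed here, which is the point of the approach: master only ever moves forward along the original repository's history.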
What should we do if the maintainer requests changes to a PR, or if we realize further modifications are needed? The answer is simple: commit the changes to the same branch the PR was opened from (fix-typo in our example) and push. GitHub automatically appends the new commits to the existing PR, and they then appear on the PR page.
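Updating a PR therefore comes down to one more commit and one more push on the same branch. Here is a minimal sandbox sketch, assuming a local bare repository (bob.git) in place of the fork on GitHub; the file contents and the directory names seed and work are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: a local bare repository stands in for the fork (assumption).
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "helo world" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"

git clone -q --bare seed bob.git        # stands in for the fork bob/repo1
git clone -q bob.git work               # our local clone; origin is the fork

cd work
git checkout -q -b fix-typo
echo "hello world" > README
git commit -q -am "Fix typo"
git push -q -u origin fix-typo          # the PR would be opened from this branch

# Review feedback arrives: commit on the same branch and push again.
echo "See docs/ for usage" >> README
git commit -q -am "Address review: mention docs"
git push -q origin fix-typo
cd "$demo"

# The fork's branch now holds both commits, which is what an open PR tracks.
git --git-dir=bob.git log --oneline master..fix-typo
```

No new PR is needed, since a PR follows its source branch rather than a fixed set of commits.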
Additionally, when submitting a PR, especially for feature requests (FR) or other changes involving significant code modifications, it's important to provide a detailed description of our design approach, thought process, challenges encountered, and solutions. This increases the likelihood of the PR being accepted because, ultimately, we are collaborating with the original author. If the PR only includes minor fixes like typo corrections, a clear title is sufficient.
Reflections on Dirty Work
I first encountered the term "dirty work" in the context of data analysis roles at some companies, where the actual work consisted of tasks like data cleaning, which were considered technically unfulfilling and of little benefit to personal growth. That complaint may well be valid, but discussing it is beyond the scope of this section.
The problem is that this led me to develop a flawed mindset, where I began to label anything lacking sufficient technical challenge as "dirty work." The danger of this mindset is that as one's technical skills advance, more and more tasks fall into the category of dirty work, potentially leading to arrogance, a lack of practical skills, and diminished enthusiasm for technology.
Recently, I submitted several PRs to implement feature requests (FR issues) that were proposed by others two or three years ago. These features weren't difficult to implement, but the maintainer hadn't implemented them at the time. Instead, he added labels like "good first issue" or "volunteer welcome" to these issues. In other words, the maintainer welcomed others to "earn easy contributions" by implementing these features that he himself didn't need. Note that the maintainer had no use for these features but still welcomed contributions from others. He didn't use phrases like "not interested" or "dirty work." Admittedly, this is a matter of politeness, but for me, it also reflects an inclusive mindset, which prompted me to reflect on my own attitude.
The essence of programming is to improve efficiency through automation. Only by writing code can developers reduce the amount of dirty work for users. From this perspective, it's worthwhile even if our code involves some dirty work. Additionally, I recently came across news about an impoverished village that introduced a data annotation industry, providing jobs for the villagers and helping them escape poverty. While data annotation might be seen as dirty work from a developer's perspective, it isn't necessarily so for others.
In conclusion, "dirty work" is a subjective, context-dependent label. The work itself isn't inherently flawed. However, when we are assigned dirty work in our jobs, can we still view these tasks impartially?