Getting Data Science Work Done Remotely

The Covid-19 crisis has massively disrupted how we work, communicate and collaborate, so there is understandable interest in doing data science work remotely and effectively. In a sense, this capability has been brewing in the background: the data science talent crunch of the years before data science skill sets went mainstream gave companies an opportunity to hire talent around the world and to work with that talent remotely.

Despite this, building excellent remotely managed teams has been a challenge in all technology sectors, including data science and AI- and ML-centric teams. For one thing, remote work leans heavily on asynchronous work, on meetings and on over-communication. For another, it still requires interpersonal interaction and relationship-building between peers and team members. These are undeniably part of what makes remote work challenging – sometimes phone calls are not enough, and video calls done professionally and effectively require discipline and commitment on the part of the participants. All asynchronous collaboration requires goal-orientation and timely execution of tasks. These are, of course, expectations that you might have of ideal employees and team members. In practice, depending on their working styles and approaches, your team members may react very differently to the same ground rules.

Making Remote Data Science Teams Effective

I can think of some ways to enable better interactions and more productive data science teams across locations and time zones, including on complex data science projects:

  1. Knowing your team members well. This piece of advice is not about data science; it is about being a good team player or leader. There is no substitute for actually building great interpersonal relationships. Humans are not robots – professionals are people who are driven by meaning and reason, and who have motivations of many kinds, both personal and professional. All of us find it hard to listen to and take feedback that requires us to change, whether we’re leading or contributing. Some of us may have quirks and oddities that make us interesting in some ways and annoying in others. There are some good ways to build such relationships remotely.
    1. Don’t skip the small talk. Ask how people are doing when you start a video or audio call. A little small talk never goes to waste, especially at the beginning of a meeting. In these times, people may have vastly different levels of well-being, whether they have been affected by Covid-19 directly or impacted by it in other ways, so it never hurts to ask.
    2. Empathize and make your team feel wanted and welcome. Ensure you empathize with them in case they come out and say that they’re having a tough time. It doesn’t help to be the “strong and silent” kind of individual when the person at the other end is communicating difficulties that they’re having. Video and audio calls require us to overcommunicate.
    3. Understand your team members’ habits and quirks. Share a joke or two, and understand how your team responds. Determine what makes them tick, and what kind of work assignments interest them.
  2. Setting ground rules and “team level agreements”. One of the biggest enablers of productivity in data science teams is knowing who owns which tasks. In real-world data science teams, ownership may not be clear-cut, given the broad span of tasks that team members need to do. Ground rules make ownership explicit, and collaborative tools help the team stick to them. Tools like Teams or Slack are great for building contextual conversations. However, you can lose sight of the bigger picture, because you’re following multiple threads under a specific topic. What helps here is setting up Wikis and running effective stand up calls.
    1. Wikis and how they can help: Team wikis help by consolidating the key information in one place. They’re great for teams who have been working a certain way in an office setting and have had to transition to remote work, sometimes with new team members. They can provide a nice, section-wise summary of key tasks and elements of the work stream or the job in question. In the context of data science teams, Wikis can help in the following ways: a) by serving as project documentation, b) by giving instructions for specific tasks, such as environment set up for a Python task, PEP 8 guidelines, the class hierarchy for an application, or a list of hypotheses to explore in a statistical analysis, or c) by being a repository of tribal knowledge about the solution you’re building. Wikis can be created using documentation tools like Read the Docs, or with Wiki servers. Where a Wiki is too much work, simple shared documents (Google Docs, Word) or Confluence pages can help. On Confluence, readers can also leave comments, which can make things easier in some ways.
    2. Stand up calls and doing these effectively. Stand up meetings (typically 15 or 20 minutes long) are a great way to start the day and gain some momentum on the features and solutions you’re building. They shouldn’t be long and drawn out, but should focus on the key accomplishments and blockers. Add in a little sugar – use video and do a nice, positive team ritual. Do round-robin updates, because this makes sure everyone is involved. When doing data science, you may get updates on how a certain analysis went, whether new findings were made with respect to some data, or whether a model training step failed or succeeded in some way. All of these are useful points of exploration and problem solving for the day ahead. These initial discussions in the day can be the source of a new synergy – if you are a leader, you generally respond to some of these updates, or perhaps you bring ideas from the previous day to share and guide the team in a specific way. If you’re a contributor, you’re likely to make notes of key things that happened during your work day and share them in the next day’s stand up call.
  3. Few meetings, but effective meetings. Having spent time in large corporations and in startups, I see a tendency among managers and employees who come from large organizations to gravitate towards meetings to solve problems. The reality is that meetings are rarely effective in and of themselves, and better consensus can be built asynchronously, in written communication that can be read, absorbed, digested and then responded to. Some things to keep in mind in remote teams:
    1. Data scientists and engineers need to write well. They should be able to articulate thoughts, ideas and complex constructs well, and should be able to respond with the appropriate amount of detail. Without this ability to think critically and express themselves in written form, the organization becomes meeting-driven, and can descend into chaos when these meetings sap team energy.
    2. Meetings should have clear outcomes. These outcomes should be decided at least five minutes before the end of the meeting. The agenda needs to be clear before the meeting starts, which is not the case often enough. Finally, the results of the meeting should be circulated over Teams, Slack or email, with clear expectations for those involved.
  4. Remote-sourcing domain expertise. Access to team members from across the world is a benefit of fully remote teams that companies haven’t fully realized yet. If you’re doing data science in the energy sector, for example, you may be interested in building a team with occasional input from an energy sector domain expert you may not have had access to before, because more people have opened up to consulting remotely. Such domain expertise is especially important for cross-border teams, which need to understand the domain well enough to be effective as a data science delivery team in a specific industry. Technology, statistical analysis and communication skills together cannot solve a problem that also requires domain knowledge, and that knowledge can be brought in effectively by sourcing such talent or expertise remotely.
  5. Working effectively without borders requires planning and effective documentation. Working across time zones can be hard at times – when we work in synchrony, communication can be easier. However, asynchronous communication and work will become increasingly important in the age of Covid-19 and beyond, as more and more teams become remote and distributed. Working remotely doesn’t mean being flexible with your time; it means being effective with your tasks. For example, you may be on a call at 9pm with a team that’s in a different time zone, and retiring for the day soon after – this situation calls for planning your upcoming day and tasks well enough to be effective at solving the problems in front of you. This may be a challenge for those with small children and families, but there are probably ways to work around it. The solution is not stretching long into the night to complete that task, which is likely to harm your health, productivity and other aspects of well-being, both personal and professional. Effective asynchronous work requires good planning and documentation.
  6. Leading across borders requires planning, patience and understanding. When you have team members in other time zones and are delegating tasks to them, you probably have to curb the urge to do things yourself, or to delegate to someone closer to home – this risks isolating those who work in remote time zones. If you find yourself doing this, consider that you may need to rethink how you are structuring your project and tasks.
  7. Pair programming and digital mentoring can be really effective. When your team members are trying to develop new skills and don’t have anyone to turn to for help, they can benefit greatly from pair programming sessions and digital mentoring. These are not new practices, and there are platforms available to do this across teams – but what’s important is to have regular communication to sort issues out as they happen, and help people correct course as soon as they need to do so. Pair programming enables specific and contextual feedback. Whether statistical analysis or application development, data science mentoring done remotely can be a big enabler of growth and technical accomplishment.
  8. Managing work environments well. “Work environments” here doesn’t refer only to physical environments, such as the work from home setup each of us may use, but also to the digital environment, which enables us to find the required information at the right time and to construct new workflows as we need them. This extends to code and the virtual environments we code in. When we build continuous delivery pipelines that provide the required environment for running and testing code, we are meeting exactly this need. The tools are half the reason that highly effective teams are as effective as they are.
  9. “Gitting Good” – managing asynchronous data science work effectively. Managing an asynchronously updated code base requires your team to agree on the right ground rules and to work well on specific development branches of your code repository. Many product development firms have nailed the process of managing code on git or other version control systems, and this applies to many data science teams as well, but it is as much a matter of individual and collective discipline as it is of systems, processes and Wiki pages. Sometimes hard conversations are needed to bring teams back on track – in the context of code discipline, I have seen a fair few situations of this nature.
  10. Taking time to document (and to “RTFD”). Documentation is one of the more tedious tasks for most developers, and it can be a chore for data scientists as well. Good data science teams, however, build their solutions on top of excellent documentation, where key questions are answered and many elements of their problem solving approaches are documented well, with links to papers, journals and results where required. Where novel algorithms are written, these should not only be put together in modular and reusable ways, but also documented well, so that the solution is intelligible to the broader team and to clients. Reading the documentation is as important as writing it. When new team members come in, or people change assignments on a project, it is important to keep the documentation relevant and up to date.
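
As a small illustration of the time zone planning described in point 5, here is a minimal sketch that checks whose local time a proposed meeting would fall outside of working hours. The roster, the time zones and the 9am to 6pm working window are all hypothetical, chosen only for illustration, and the sketch assumes Python 3.9+ with time zone data available on the system.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Hypothetical roster: names and time zones are illustrative assumptions.
TEAM = {
    "Asha": "Asia/Kolkata",
    "Ben": "Europe/London",
    "Carla": "America/New_York",
}

# Assumed working window in each member's local time.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def local_times(meeting_utc, team=TEAM):
    """Return each member's local date and time for a UTC meeting time."""
    return {
        name: meeting_utc.astimezone(ZoneInfo(tz_name))
        for name, tz_name in team.items()
    }


def outside_working_hours(meeting_utc, team=TEAM):
    """List members for whom the meeting is outside the working window."""
    return sorted(
        name
        for name, local in local_times(meeting_utc, team).items()
        if not (WORK_START <= local.time() < WORK_END)
    )


# A proposed meeting at 15:30 UTC on 1 June 2020.
meeting = datetime(2020, 6, 1, 15, 30, tzinfo=timezone.utc)
for name, local in local_times(meeting).items():
    print(f"{name}: {local:%H:%M}")
print("Outside working hours:", outside_working_hours(meeting))
```

With this roster, the meeting lands at 9pm for the team member in India, which is the kind of late call the post describes. A check like this is only a planning aid; it does not replace agreeing on ground rules with the team.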

Concluding Remarks

Naturally, these behaviors don’t all happen at once and aren’t developed in a day – excellence takes time, persistence and diligence. There are many teams out there doing incredible work in the data science space, fully remotely. I hope that some of the above ideas make sense for your team and that, if nothing else, this post made you think about how your team is currently performing and how to amp up its performance!
