Data Science: the field where everyone assumes you can use "ML" (Magic Language) to make everything work perfectly if they just give you three weeks of coding and 12 pictures of apples (an actual experience I had). Fortunately, Microsoft seems to be better about that, and I think you'll have a lot more help, resources, and understanding than you might have had elsewhere. There's a lot in this field though, and what you wind up doing will depend on you, your team, your org, and your position.

The Types of Data Scientists

This isn't a BuzzFeed quiz; instead, we'll detail the specific subtypes of Data Science roles at Microsoft. A lot of the internal naming is still being worked out, but empirical analysis leads me to the following three categories. You may take on more than one of these roles, but you will largely be able to align yourself with one of them:

Research Scientist

These are the model people, the typical person you think of when someone says "Data Science". Mostly PhD students or those with prior experience, people in this role build strong, powerful models. Training may take many hours or days on enormous datasets, or the models and datasets may be very small. The role is largely about understanding what makes different models tick and how to construct a model that best tackles a given problem. It requires a deep understanding of different model types (what makes them good or bad, and which is best in a given situation), construction techniques (how you should design and build a given model), and data manipulation (how to make your data work best with the model). People in this role should be able to take specifications for a model, along with data, and deliver a good result, assuming a good result is possible. This role is also responsible for research: developing new model types, testing different optimization techniques, etc.

Data Analyst/Data Engineer

This role comes earlier in a project and largely prepares the ground for the research scientist. That does not make it any less important though! For context, most of the COVID-19 work we saw being done, including graphs, charts, predictions, clustering, etc., is considered part of data analytics, and thus part of this role. In addition to actually scraping the web to grab the data, this role involves doing preliminary analysis on the data collected, trimming or expanding it as needed, and determining whether the project has enough data to even be viable. Projects are frequently stopped at this stage, as it may turn out that the available data is simply too imbalanced to train on without significant bias, and the available data expansion techniques would not work for one reason or another. Doing well in this role requires a strong knowledge of data visualization, collection, cleaning, expansion, and balancing techniques. In addition, you should have a good understanding of data analysis (duh) and know what makes a dataset "good" or workable.
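As a rough illustration of the viability check described above, here is a minimal sketch that measures how imbalanced a set of labels is. The function names, the example labels, and the thresholds (`max_ratio`, `min_per_class`) are all hypothetical and chosen only for illustration; real cutoffs depend on the project and the model.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the dataset, most common class first."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def looks_viable(labels, max_ratio=10.0, min_per_class=50):
    """Crude screen: enough examples per class and not too lopsided.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    counts = Counter(labels)
    if not counts:
        return False
    enough_data = min(counts.values()) >= min_per_class
    return enough_data and imbalance_ratio(labels) <= max_ratio


# Hypothetical dataset: 12 pictures of apples against 600 of everything else.
labels = ["apple"] * 12 + ["not_apple"] * 600
print(class_balance(labels))
print(imbalance_ratio(labels))
print(looks_viable(labels))
```

A check like this only flags a problem; whether resampling or augmentation can rescue the dataset is still a judgment call.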

Applied Scientist/ML Engineer

This role is one of the newer ones, and while "ML Engineer" is frequently used to refer to many different roles, it fits best here. An applied scientist is the "SWE of DS", meaning that given a project, the applied scientist should know not only how to build the model, but also how to build the project around the model. Of all the roles listed here, this one has the most breadth and the least depth, as you will need to know a multitude of cloud services, architecture techniques, and software development paradigms in addition to having a strong high-level understanding of modelling techniques. This role is heavily involved throughout the process, starting from the planning stage. One of the most valuable contributions an applied scientist can make to a project is to find a way to not use ML, and instead find a better solution. If the team does decide to use ML, the applied scientist is the one who comes up with the architecture for deploying the model, gathers the requirements the model must meet (size, memory usage, inference time, etc.), and ensures the research scientist building the model meets those requirements. While that development is happening, the applied scientist will usually make a T0 model as a placeholder so work on the application can continue. This role requires a strong understanding of ML techniques, as well as SWE and systems architecture skills.
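To make the T0 placeholder idea concrete, here is a minimal sketch assuming the eventual model will expose `fit` and `predict` methods. The class name, the majority-class strategy, and the latency helpers are hypothetical choices for illustration; a real placeholder would mirror whatever interface the team has agreed on.

```python
import time
from collections import Counter


class MajorityClassModel:
    """Hypothetical T0 placeholder: always predicts the most common training label."""

    def __init__(self):
        self.label_ = None

    def fit(self, features, labels):
        # Ignore the features entirely; remember only the most frequent label.
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        if self.label_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.label_ for _ in features]


def mean_latency_ms(model, batch, repeats=100):
    """Average wall-clock time of one predict() call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(batch)
    return (time.perf_counter() - start) * 1000 / repeats


def meets_latency_budget(model, batch, budget_ms):
    """Check one example requirement: inference time within an assumed budget."""
    return mean_latency_ms(model, batch) <= budget_ms


# The application can be built and tested against this stand-in right away.
model = MajorityClassModel().fit([[0.1], [0.4], [0.9]], ["apple", "pear", "apple"])
print(model.predict([[0.5], [0.7]]))
```

Because the placeholder and the real model share an interface, the same requirement checks can later be run against the research scientist's model without changing the application code.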

Day-to-day

The day-to-day for DS is

Differences Between Levels

Compensation