Project phases

To put the need for training, validation, and evaluation data into context, let’s have a look at the structure of a typical NLP project. There are many variants of the way organizations work with applied research, innovation, and analytics. Most of them progress roughly by the following steps:

  1. Specify the problem to solve.
  2. Assess the data available to produce and evaluate solutions to the problem.
  3. Select technology to use for solving the problem.
  4. Implement a solution in the form of a demonstrator, prototype, or product.
  5. Transfer knowledge mutually between the research team and the client.

These steps are usually carried out iteratively: the problem specification is revisited and updated as knowledge of the available data grows, knowledge transfer takes place at every stage of the project, and the implementation is repeatedly updated and tested as the project progresses.

Although the Data Readiness Levels described in the previous section permeate all stages of project management, steps 1 through 3 in the project structure above are where data readiness is usually addressed in depth. Thus, it is crucial that the Data Readiness Levels are in order at the beginning of the project.

Common data science and analytics project processes include CRISP-DM (Cross-Industry Standard Process for Data Mining) and TDSP (Team Data Science Process).