Hi! Today, I am bringing some good news. The Google Summer of Code 2020 results were announced and I was accepted as a student!
I am excited and grateful for this opportunity that the KDE community has given me, and I will focus on doing excellent work during this project. 🙂
I will be working on marK, a machine learning dataset annotation tool, to which I already contributed during Season of KDE 2020. If you don’t know about it, please check my status report.
Here is a brief description of what I am going to do during the program, along with some of my plans for accomplishing each objective:
Improving marK codebase
I will improve the codebase of marK to make it extensible, so that it becomes easier to add new types of annotation, e.g. text and audio annotation. To accomplish that, I will separate the image annotation logic from the rest of the current codebase and improve it wherever possible. The new core of marK will take care of the tasks that are common to annotating multiple types of data.
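To make the idea of a type-agnostic core more concrete, here is a minimal sketch of that kind of separation. It is only an illustration of the structure, written in Python for brevity, and not marK's actual design: the names `Annotator`, `Core`, `register` and `open`, as well as the choice to dispatch on file extension, are all assumptions made for this example.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Annotator(ABC):
    """Hypothetical interface that each data-type-specific annotator implements."""

    def __init__(self) -> None:
        self.path = None
        self.items: List[dict] = []

    def load(self, path: str) -> None:
        # A real tool would read the file here; the sketch only records the path.
        self.path = path

    @abstractmethod
    def kind(self) -> str:
        """Name of the data type this annotator handles."""


class ImageAnnotator(Annotator):
    def kind(self) -> str:
        return "image"


class TextAnnotator(Annotator):
    def kind(self) -> str:
        return "text"


class Core:
    """Hypothetical core: knows nothing about images or text, only about Annotators."""

    def __init__(self) -> None:
        self._by_extension: Dict[str, Type[Annotator]] = {}

    def register(self, extension: str, annotator_cls: Type[Annotator]) -> None:
        self._by_extension[extension.lower()] = annotator_cls

    def open(self, path: str) -> Annotator:
        extension = os.path.splitext(path)[1].lower()
        if extension not in self._by_extension:
            raise ValueError(f"no annotator registered for '{extension}'")
        annotator = self._by_extension[extension]()
        annotator.load(path)
        return annotator


core = Core()
core.register(".png", ImageAnnotator)
core.register(".txt", TextAnnotator)

print(core.open("photo.PNG").kind())
print(core.open("notes.txt").kind())
```

With this kind of layout, supporting a new data type such as audio would mean adding one more `Annotator` subclass and registering it, without touching the core.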
Implementing text annotation support
Sketchy idea of how text annotation may be in marK
First, I will explain a bit about text annotation, which is the task of labeling text-based data. It involves highlighting and tagging the desired terms in a document or text, and the result can be used to train machine learning models for different purposes, e.g. entity linking and text classification.
For now, marK only supports image annotation. After finishing the aforementioned objective, I will add support for text annotation, using some Qt and KF5 structures for text manipulation, such as KTextEditor. These APIs are going to be helpful, as I will integrate them with new components that handle tasks related to text annotation, such as labelling.
To give an idea of what the annotated output will look like, I am planning to serialize it as JSON, in a format similar to the one currently used for image annotation.
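As a rough illustration of span-based text annotation and its JSON output, the sketch below labels character ranges in a sentence and serializes them. This is not marK's actual format: the field names `text`, `annotations`, `start`, `end` and `label`, and the example labels, are assumptions made for this example only.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TextAnnotation:
    """One labelled span; offsets are character indices, with `end` exclusive."""
    start: int
    end: int
    label: str


def serialize(text: str, annotations: List[TextAnnotation]) -> str:
    """Return a JSON string holding the text and its labelled spans."""
    for a in annotations:
        if not (0 <= a.start < a.end <= len(text)):
            raise ValueError(f"span {a.start}:{a.end} is outside the text")
    payload = {
        "text": text,
        "annotations": [asdict(a) for a in annotations],
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)


document = "KDE was founded by Matthias Ettrich."
spans = [
    TextAnnotation(0, 3, "ORG"),       # "KDE"
    TextAnnotation(19, 35, "PERSON"),  # "Matthias Ettrich"
]
output = serialize(document, spans)
print(output)
```

Storing offsets rather than copies of the highlighted words keeps each annotation tied to an exact position, which matters when the same term appears more than once in a text.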
It is worth mentioning that I will take advantage of the first phase of GSoC to study how text annotation works in more depth, to improve my knowledge of Qt and software engineering (more specifically, how to write good, maintainable code) and, of course, to bond with the community.
My GSoC experiences and progress will be published on this blog, and my proposal can be found here.
That is it, see you in the next post 😉