Abstract—A part-of-speech tagger assigns the correct grammatical
category to each word in a given text based on the context
surrounding the word. This paper presents Mi-POS, a Malay
language Part-of-Speech tagger developed using a probabilistic
approach that incorporates contextual information. The paper also
presents the results of benchmarking Mi-POS against several
similar systems and highlights the lessons learnt. The dataset
used for evaluation consists of manually annotated texts, and the
evaluation measured both tagging accuracy and processing time.
The final results show that Mi-POS outperforms the other Malay
Part-of-Speech taggers evaluated in terms of accuracy, achieving
95.16% when tagging new words from the same type of corpus used
for training and 81.12% for words from different corpus types.
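The abstract says only that Mi-POS is probabilistic and uses context. It does not specify the model. For orientation, here is a minimal sketch of one common probabilistic, context-based tagger: a bigram hidden Markov model with add-one smoothing, decoded with the Viterbi algorithm. The sketch also scores the tagger on the two evaluation measures the abstract names, token-level accuracy and wall-clock time. This is not Mi-POS's actual algorithm. The training sentences, tag names and functions (`train_hmm`, `viterbi`, `accuracy`) are hypothetical and exist only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (word, tag) pairs; not the paper's corpus or tagset.
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("suka", "VERB"), ("nasi", "NOUN")],
]
TEST = [[("dia", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")]]


def train_hmm(sentences):
    """Count tag-bigram transitions and word emissions (illustrative bigram HMM)."""
    trans = defaultdict(lambda: defaultdict(int))  # previous tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    tags, vocab = set(), set()
    for sent in sentences:
        prev = "<s>"  # sentence-start context
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab


def log_prob(counts, key, size, alpha=1.0):
    """Add-alpha smoothed log probability, so unseen events get non-zero mass."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * size))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence given the previous-tag context."""
    if not words:
        return []
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def accuracy(gold_sents, trans, emit, tags, vocab):
    """Token-level accuracy: fraction of words whose predicted tag matches gold."""
    correct = total = 0
    for sent in gold_sents:
        pred = viterbi([w for w, _ in sent], trans, emit, tags, vocab)
        correct += sum(p == g for p, (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total


if __name__ == "__main__":
    import time
    model = train_hmm(TRAIN)
    start = time.perf_counter()
    acc = accuracy(TEST, *model)
    elapsed = time.perf_counter() - start
    print(f"accuracy={acc:.2%} time={elapsed:.6f}s")
```

The smoothing reserves a small share of probability for words absent from training. With it, the previous-tag context alone can still place an unseen word such as "kami" in a plausible category. This mirrors, only loosely, the gap between same-corpus and cross-corpus accuracy reported above.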
Index Terms—Benchmarking, Malay language, natural
language processing, part-of-speech tagging.
Dickson Lukose, Khalil Bouzekri and Benjamin Chu Min Xian are with
the Artificial Intelligence Lab at MIMOS Berhad, Kuala Lumpur, 57000
Malaysia (e-mail: dickson.lukose@mimos.my, khalil.ben@mimos.my,
mx.chu@mimos.my).
Mohamed Lubani and Liew Kwei Ping are with the University of Malaya,
Faculty of Computer Science and Information Technology, Kuala Lumpur,
50603 Malaysia (e-mail: mohamed.lubani@siswa.um.edu.my,
liewkweiping@siswa.um.edu.my).
Rohana Mahmud is with the Department of Artificial Intelligence,
Faculty of Computer Science and Information Technology, University of
Malaya, Kuala Lumpur, 50603 Malaysia (e-mail:
rohanamahmud@um.edu.my).
Cite: Benjamin Chu Min Xian, Mohamed Lubani, Liew Kwei Ping, Khalil Bouzekri, Rohana Mahmud, and Dickson Lukose, "Benchmarking Mi-POS: Malay Part-of-Speech Tagger," International Journal of Knowledge Engineering vol. 2, no. 3, pp. 115-121, 2016.