

This project involves extracting information from the Kenya Industrial Property Institute (KIPI) Industrial Property Journals, which are published as part of the procedure for protecting industrial property rights under the Industrial Property Act, No. of 2001 and the Trade Marks Act, Cap 506 of the Laws of Kenya. The journals are published monthly and, like most governmental or parastatal publications, in PDF format. Each journal contains details of all the patents, trademarks and industrial designs applied for in the previous month, including the details specified by the World Intellectual Property Organization (WIPO) through its internationally accepted WIPO standards.

The project entails extracting data from the KIPI publications and storing it in a tabular structure. Currently the data exists as free-flowing text. For each patent, utility model, and industrial design, each of the data points shown in Table 1 below should be captured. INID stands for Internationally agreed Numbers for the Identification of (bibliographic) Data. Each INID_CODE identifies one field we wish to capture for each entry.


INID_CODE  FIELD                   DESCRIPTION
(11)       DOCUMENT_NUMBER         Number of document
(12)       DOCUMENT_TYPE           Designation of the kind of document
(15)       CORRECTED_INFO_PUI      Correction information
(19)       PUB_ORG                 Code of the office or organization publishing the document
(21)       APPLICATION_NUMBER_PUI  Application number
(22)       APPLICATION_DATE_PUI    Date of filing of application
(30)       PRIORITY_DATA           Priority data
(31)       PRIORITY_NUMBER         Priority number(s)
(32)       PRIORITY_DATE           Priority date(s)
(33)       PRIORITY_COUNTRY        Priority country(s)
(41)       PUB_DATE_UNEXAMINED     Publication date of unexamined application
(42)       PUB_DATE_EXAMINED       Publication date of examined application
(45)       REGISTRATION_DATE_PUI   Publication date of registered right
(51)       INTER_CLASS_PUI         International classification
(54)       TITLE_INVENTION         Title of invention
(56)       CITED_DOCS              Cited documents
(57)       ABSTRACT                Abstract
(71)       APPLICANT_NAME_PUI      Name(s) of applicant(s)
(72)       INVENTOR_NAME           Name of inventor(s)
(730)      PROPRIETOR_PUI          Proprietor
(740)      NAME_ADDRESS_AGENT_PUI  Name and address of agent or address for service
(85)       DATE_PCT                Date of entry into national phase for PCT application
(87)       PCT_DATA                PCT publication data
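
One possible way to turn the free-flowing text into one row per entry is to split on the INID codes themselves. The sketch below is only an illustration: it assumes the text has already been extracted from the PDF, that each field is introduced by its code in parentheses (as in Table 1), and that one entry is passed in at a time. The function name `parse_entry` and the sample entry are made up for the example, and the real journal layout may need extra handling.

```python
import re

# INID code -> column name, taken from Table 1.
PATENT_FIELDS = {
    "11": "DOCUMENT_NUMBER", "12": "DOCUMENT_TYPE",
    "15": "CORRECTED_INFO_PUI", "19": "PUB_ORG",
    "21": "APPLICATION_NUMBER_PUI", "22": "APPLICATION_DATE_PUI",
    "30": "PRIORITY_DATA", "31": "PRIORITY_NUMBER",
    "32": "PRIORITY_DATE", "33": "PRIORITY_COUNTRY",
    "41": "PUB_DATE_UNEXAMINED", "42": "PUB_DATE_EXAMINED",
    "45": "REGISTRATION_DATE_PUI", "51": "INTER_CLASS_PUI",
    "54": "TITLE_INVENTION", "56": "CITED_DOCS", "57": "ABSTRACT",
    "71": "APPLICANT_NAME_PUI", "72": "INVENTOR_NAME",
    "730": "PROPRIETOR_PUI", "740": "NAME_ADDRESS_AGENT_PUI",
    "85": "DATE_PCT", "87": "PCT_DATA",
}

# Assumption: a field starts wherever a known code appears in parentheses,
# e.g. "(21)". Other parenthesised numbers are left inside the field text.
INID_TAG = re.compile(r"\((" + "|".join(PATENT_FIELDS) + r")\)")


def parse_entry(text):
    """Return {column: value} for one journal entry (hypothetical helper)."""
    record = {column: "" for column in PATENT_FIELDS.values()}
    matches = list(INID_TAG.finditer(text))
    for i, match in enumerate(matches):
        # A field's value runs up to the next tag, or to the end of the entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = " ".join(text[match.end():end].split())  # collapse line breaks
        column = PATENT_FIELDS[match.group(1)]
        if value:
            # Repeated codes are joined rather than overwritten.
            record[column] = f"{record[column]}; {value}" if record[column] else value
    return record


# Made-up entry, for illustration only.
sample = """(11) KE 999 (21) KE/P/2020/0001 (22) 05/01/2020
(54) Solar-powered water pump (71) Example Applicant Ltd
(72) Jane Doe (57) A pump driven by a
photovoltaic panel."""

row = parse_entry(sample)
print(row["TITLE_INVENTION"])
print(row["ABSTRACT"])
```

Fields that do not occur in an entry stay as empty strings, so every row has the same columns. A known limitation of this approach is that a parenthesised number inside an abstract that happens to match a code would be misread as a new field.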


For trademarks, Table 2 below lists the data points we wish to capture.


INID_CODE  FIELD                         DESCRIPTION
111        REGISTRATION_NUMBER_MARKS     Registration number
180        EXPIRY_DATE                   Expiry date
210        APPLICATION_NUMBER_MARK       Application number
220        APPLICATION_DATE_MARK         Filing date of application
300        PARIS_CONVENTION              Data relating to application under the Paris Convention
442        ADVERT_DATE                   Date of advertisement
511        INTER_CLASS_MARK              International classification
526        DISCLAIMER                    Disclaimer
540        MARK_REPRODUCTION             Reproduction of the mark
566        MARK_TRANSLATION              Translation of mark or words contained in the mark
15         CORRECTED_INFO_MARK           Amendment/correction information
591        INFO_COLOURS                  Information concerning colours claimed
730        NAME_ADDRESS_PROPRIETOR_MARK  Name and address of the proprietor
740        NAME_ADDRESS_APPLICANT_MARK   Name and address of the agent or applicant
791        NAME_ADDRESS_LICENSEE_MARK    Name and address of licensee/registered user

  1. Each participant shall use a different PDF document, which shall be provided as part of the project
  2. The completed code and the file used shall be uploaded to the participant's GitHub account
  3. All participants must stick to the stipulated timelines for the project

Submission Criteria

Create a GitHub repo with the NLP code. Include the dataset (PDF) used as input. The code must dump a CSV file with the fields listed above. Indicate the file used for the analysis. Submissions will be measured on the completion rate of the fields.
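
As a rough illustration of the CSV requirement, the sketch below writes a list of extracted records with the field names as the header row, using Python's standard `csv` module. The helper name `dump_csv`, the shortened column list and the sample record are assumptions made for the example; a real submission would pass the full list of field names from the tables above.

```python
import csv
import io


def dump_csv(records, columns, out):
    """Write records (dicts) to the open text file `out`, one column per field."""
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        restval="",             # fields missing from a record become empty cells
        extrasaction="ignore",  # keys that are not listed columns are dropped
    )
    writer.writeheader()
    writer.writerows(records)


# Shortened column list and a made-up record, for illustration only.
columns = ["DOCUMENT_NUMBER", "TITLE_INVENTION", "ABSTRACT"]
records = [{"DOCUMENT_NUMBER": "KE 999", "TITLE_INVENTION": "Solar-powered water pump"}]

buffer = io.StringIO()
dump_csv(records, columns, buffer)
print(buffer.getvalue())

# To write an actual file instead of an in-memory buffer:
# with open("output.csv", "w", newline="", encoding="utf-8") as f:
#     dump_csv(records, columns, f)
```

Keeping every field as a column even when it is empty means the header is identical across submissions, which makes the completion rate straightforward to compare.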

  1. Successful extraction of the information from the KIPI journals and its organization in tabular form as a CSV file
  2. Completed code and files used shall be in the participant's GitHub account, to be evaluated upon completion or when the set timeline lapses
  3. The metric for success will be the number of fields successfully populated by the participant’s code
  4. Success shall also be measured on the ability of the code to populate a CSV file with the fields as columns
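
Participants may want to check their own output before submitting. The exact scoring formula is not given here, so the sketch below is only one plausible reading of "number of fields successfully populated": it counts non-empty cells per column and overall. The function name `completion_rates` and the sample rows are hypothetical.

```python
def completion_rates(rows, columns):
    """Return (per-column fill rate, overall fill rate) for a list of dict rows.

    Assumption: a field counts as populated when its cell is non-empty
    after stripping whitespace. The official metric may differ.
    """
    rows = list(rows)
    if not rows or not columns:
        return {column: 0.0 for column in columns}, 0.0
    filled = {
        column: sum(1 for row in rows if (row.get(column) or "").strip())
        for column in columns
    }
    per_column = {column: filled[column] / len(rows) for column in columns}
    overall = sum(filled.values()) / (len(rows) * len(columns))
    return per_column, overall


# Made-up rows, for illustration only.
rows = [
    {"DOCUMENT_NUMBER": "KE 999", "ABSTRACT": ""},
    {"DOCUMENT_NUMBER": "KE 1000", "ABSTRACT": "A short abstract."},
]
per_column, overall = completion_rates(rows, ["DOCUMENT_NUMBER", "ABSTRACT"])
print(per_column, overall)
```

Rows read back from the submitted file with `csv.DictReader` can be passed in directly, since it also yields one dict per row.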

1st – Ksh. 50,000

2nd – Ksh. 30,000

3rd – Ksh. 20,000

2 Weeks

You have been provided with a KIPI PDF journal that needs an algorithm developed to clean up the data and populate it into an Excel sheet. See details in the Hackathon Description.

Hackathon Sponsor