This project entails the successful extraction of information from the Kenya Industrial Property Institute (KIPI) journals. These are the Industrial Property Journals used in the procedure for protecting industrial property rights under the Industrial Property Act, No. of 2001 and the Trade Marks Act, Cap 506 of the Laws of Kenya. The journals are published monthly and, like most governmental or parastatal publications, in PDF format. Each journal contains details of all the patents, trademarks and industrial designs applied for in the previous month, including the details specified by the World Intellectual Property Organization (WIPO) in its internationally accepted WIPO standards.
The project entails extracting data from the KIPI publications and storing it in a tabular structure; currently the data exists only as free-flowing text. For each patent, utility model and industrial design, the data points listed in Table 1 below should be captured. The INID_CODE column gives each field's INID code (Internationally agreed Numbers for the Identification of (bibliographic) Data); each code identifies one field to be captured for every record.
INID_CODE | IDENTIFICATION | DESCRIPTION |
(11) | DOCUMENT_NUMBER | Number of document |
(12) | DOCUMENT_TYPE | Designation of the kind of document |
(15) | CORRECTED_INFO_PUI | Correction information |
(19) | PUB_ORG | Code of the office or organization publishing the document |
(21) | APPLICATION_NUMBER_PUI | Application number |
(22) | APPLICATION_DATE_PUI | Date of filing of application |
(30) | PRIORITY_DATA | Priority data |
(31) | PRIORITY_NUMBER | Priority number(s) |
(32) | PRIORITY_DATE | Priority date(s) |
(33) | PRIORITY_COUNTRY | Priority country(ies) |
(41) | PUB_DATE_UNEXAMINED | Publication date of unexamined application |
(42) | PUB_DATE_EXAMINED | Publication date of examined application |
(45) | REGISTRATION_DATE_PUI | Publication date of registered right |
(51) | INTER_CLASS_PUI | International classification |
(54) | TITLE_INVENTION | Title of invention |
(56) | CITED_DOCS | Cited documents |
(57) | ABSTRACT | Abstract |
(71) | APPLICANT_NAME_PUI | Name(s) of applicant(s) |
(72) | INVENTOR_NAME | Name of inventor(s) |
(730) | PROPRIETOR_PUI | Proprietor |
(740) | NAME_ADDRESS_AGENT_PUI | Name and address of Agent or address of service |
(85) | DATE_PCT | Date of entry into national phase for PCT application |
(87) | PCT_DATA | PCT Publication data |
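A minimal sketch of how the INID codes in Table 1 could drive field extraction, assuming (this is not confirmed by the journal) that in the text extracted from the PDF each field is introduced by its code in parentheses, as in `(21) ...`. Only a subset of Table 1 is mapped; `PATENT_FIELDS`, `parse_record` and the sample values are hypothetical.

```python
import re

# Subset of Table 1: INID code -> CSV column name (extend with the remaining rows).
PATENT_FIELDS = {
    "11": "DOCUMENT_NUMBER",
    "12": "DOCUMENT_TYPE",
    "21": "APPLICATION_NUMBER_PUI",
    "22": "APPLICATION_DATE_PUI",
    "31": "PRIORITY_NUMBER",
    "32": "PRIORITY_DATE",
    "33": "PRIORITY_COUNTRY",
    "54": "TITLE_INVENTION",
    "57": "ABSTRACT",
    "71": "APPLICANT_NAME_PUI",
    "72": "INVENTOR_NAME",
}

# A parenthesised two- or three-digit code such as "(21)" or "(730)".
CODE_RE = re.compile(r"\((\d{2,3})\)")


def parse_record(text, fields=PATENT_FIELDS):
    """Map one record's free text to {column: value}.

    The text between a code and the next code (mapped or not) is the value of
    that code. Unmapped codes still act as delimiters but are not stored.
    Repeated codes, e.g. several (31) priority numbers, are joined with "; ".
    """
    record = {}
    matches = list(CODE_RE.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        column = fields.get(match.group(1))
        value = " ".join(text[match.end():end].split())  # collapse whitespace
        if column is None or not value:
            continue
        record[column] = f"{record[column]}; {value}" if column in record else value
    return record


if __name__ == "__main__":
    sample = "(21) KE/P/2019/003125 (22) 12/03/2019 (54) SOLAR WATER PUMP (71) ACME LTD, Kenya"
    print(parse_record(sample))
```

One caveat of using the codes as regex delimiters: a parenthesised two- or three-digit number inside a value (for instance in an abstract) would be mistaken for a code, so long free-text fields such as `(57)` may need special handling.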
For trademarks, the data points to be captured are listed in Table 2 below.
INID_CODE | IDENTIFICATION | DESCRIPTION |
111 | REGISTRATION_NUMBER_MARKS | Registration number |
180 | EXPIRY_DATE | Expiry date |
210 | APPLICATION_NUMBER_MARK | Application number |
220 | APPLICATION_DATE_MARK | Filing date of application |
300 | PARIS_CONVENTION | Data relating to application under the Paris Convention |
442 | ADVERT_DATE | Date of advertisement |
511 | INTER_CLASS_MARK | International classification |
526 | DISCLAIMER | Disclaimer |
540 | MARK_REPRODUCTION | Reproduction of the Mark |
566 | MARK_TRANSLATION | Translation of mark or words contained in the mark |
15 | CORRECTED_INFO_MARK | Amendment/correction information |
591 | INFO_COLOURS | Information concerning colours claimed |
730 | NAME_ADDRESS_PROPRIETOR_MARK | Name and address of the proprietor |
740 | NAME_ADDRESS_APPLICANT_MARK | Name and address of the agent or applicant |
791 | NAME_ADDRESS_LICENSEE_MARK | Name and address of Licensee/registered user |
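Several trademark codes overlap with Table 1 but map to different columns (15, 730 and 740), so a separate mapping per publication type keeps the columns distinct. The sketch below additionally assumes, without confirmation from the journal, that codes appear in parentheses in the extracted text and that every trademark record begins with its application number field (210); `MARK_FIELDS`, `parse_marks` and the sample text are hypothetical.

```python
import re

# Table 2: trademark INID code -> CSV column name. Kept apart from the patent
# mapping because codes 15, 730 and 740 map to different columns in each table.
MARK_FIELDS = {
    "111": "REGISTRATION_NUMBER_MARKS",
    "180": "EXPIRY_DATE",
    "210": "APPLICATION_NUMBER_MARK",
    "220": "APPLICATION_DATE_MARK",
    "300": "PARIS_CONVENTION",
    "442": "ADVERT_DATE",
    "511": "INTER_CLASS_MARK",
    "526": "DISCLAIMER",
    "540": "MARK_REPRODUCTION",
    "566": "MARK_TRANSLATION",
    "15": "CORRECTED_INFO_MARK",
    "591": "INFO_COLOURS",
    "730": "NAME_ADDRESS_PROPRIETOR_MARK",
    "740": "NAME_ADDRESS_APPLICANT_MARK",
    "791": "NAME_ADDRESS_LICENSEE_MARK",
}

CODE_RE = re.compile(r"\((\d{2,3})\)")


def parse_marks(text, start_code="210"):
    """Split a run of trademark text into records and map each to columns.

    Assumes every record begins with the `start_code` field; text before the
    first record (e.g. a section title) is discarded.
    """
    marker = f"({start_code})"
    # Zero-width split so that each chunk keeps its leading marker.
    chunks = [c for c in re.split("(?=" + re.escape(marker) + ")", text)
              if c.startswith(marker)]
    rows = []
    for chunk in chunks:
        # With a capturing group, re.split alternates: [before, code, value, code, value, ...]
        parts = CODE_RE.split(chunk)
        row = {}
        for code, value in zip(parts[1::2], parts[2::2]):
            column = MARK_FIELDS.get(code)
            value = " ".join(value.split())
            if column and value:
                row[column] = value
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(parse_marks("TRADE MARKS (210) 100001 (220) 01/02/2020 (540) SUNRISE (511) 30"))
```

If a mark is printed only as an image, text extraction will not recover it, so `MARK_REPRODUCTION` may remain empty for such records.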
- Each participant shall use a different PDF document, which will be provided as part of the project
- The completed code and the files used shall be uploaded to the participant's GitHub account
- All participants must adhere to the stipulated project timelines
Submission Criteria
Create a GitHub repository containing the NLP code, and include the dataset (PDF) used as input, indicating which file was used for the analysis. The code must output a CSV file with the fields listed above. Submissions will be measured on the completion rate of the fields.
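The output must be a CSV file with the fields as columns. A minimal sketch using Python's standard `csv` module: `csv.DictWriter` with `restval=""` writes a blank cell for any field a record lacks, so every column appears in every row even when extraction fails for some fields. The `COLUMNS` list (a subset of Table 1) and `write_rows` are illustrative names.

```python
import csv
import io

# Column order taken from Table 1 (subset); append the remaining IDENTIFICATION names.
COLUMNS = [
    "DOCUMENT_NUMBER",
    "DOCUMENT_TYPE",
    "APPLICATION_NUMBER_PUI",
    "APPLICATION_DATE_PUI",
    "TITLE_INVENTION",
    "APPLICANT_NAME_PUI",
]


def write_rows(rows, columns, out):
    """Write dict rows to the file-like `out`, one column per field.

    Missing fields become empty cells (restval); keys not in `columns` are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows([{"DOCUMENT_NUMBER": "KE 123", "TITLE_INVENTION": "PUMP"}], COLUMNS, buffer)
    print(buffer.getvalue())
```

To write to disk, open the file with `newline=""` as the `csv` module documentation recommends, e.g. `open("output.csv", "w", newline="", encoding="utf-8-sig")` (the file name is a placeholder). The `utf-8-sig` encoding adds a byte-order mark that helps Excel recognise UTF-8 text.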
- Successful extraction of the information from the KIPI journals, organized in tabular form in a CSV file
- The completed code and files used shall be in the participant's GitHub account, to be evaluated upon completion or when the set timeline lapses
- The metric for success will be the number of fields successfully populated by the participant’s code
- Success shall also be measured by the ability of the code to populate a CSV file with the fields as columns
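The criteria score submissions by how many fields are populated but do not spell out a formula. One plausible reading, sketched here as an assumption rather than the organizers' scoring method: for each column, the fraction of records with a non-blank value, plus an overall fraction of populated cells. `is_filled` and `completion_rates` are hypothetical helpers.

```python
def is_filled(value):
    """True if a cell holds something other than None or whitespace."""
    return value is not None and str(value).strip() != ""


def completion_rates(rows, columns):
    """Return (per-column completion rate, overall completion rate), each in [0, 1]."""
    if not rows or not columns:
        return {c: 0.0 for c in columns}, 0.0
    filled = {c: sum(is_filled(r.get(c)) for r in rows) for c in columns}
    per_column = {c: filled[c] / len(rows) for c in columns}
    overall = sum(filled.values()) / (len(rows) * len(columns))
    return per_column, overall


if __name__ == "__main__":
    rows = [{"TITLE_INVENTION": "PUMP", "ABSTRACT": ""}, {"TITLE_INVENTION": "VALVE"}]
    per_column, overall = completion_rates(rows, ["TITLE_INVENTION", "ABSTRACT"])
    print(per_column, overall)  # TITLE_INVENTION 1.0, ABSTRACT 0.0, overall 0.5
```

Reporting per-column rates alongside the overall figure shows which fields the extractor misses most often, which is where further parsing effort pays off.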
1st – Ksh. 50,000
2nd – Ksh. 30,000
3rd – Ksh. 20,000
2 Weeks
You have been provided with a KIPI journal in PDF format. Develop an algorithm that cleans up its contents and populates the extracted data into an Excel sheet. See the details in the Hackathon Description.
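A minimal sketch of one possible clean-up pass in plain Python, assuming the PDF has already been converted to text (for example with a PDF text-extraction library; that step is not shown). It removes caller-supplied header and footer lines, re-joins words hyphenated across line breaks and collapses whitespace. `clean_text` and the patterns in the test are hypothetical; the actual running headers in the KIPI PDFs would need to be checked.

```python
import re


def clean_text(raw, drop_patterns=()):
    """Tidy text extracted from the journal PDF before field parsing.

    - drops whole lines matching any regex in `drop_patterns`
      (e.g. a running page header or a page-number line)
    - re-joins words hyphenated across a line break
    - collapses all whitespace, including newlines, to single spaces
    """
    drops = [re.compile(p) for p in drop_patterns]
    kept = [line.strip() for line in raw.splitlines()
            if not any(d.search(line.strip()) for d in drops)]
    text = "\n".join(kept)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # "applica-\ntion" -> "application"
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    page = "(57) An applica-\ntion for pumping\n12\n"
    print(clean_text(page, [r"^\d+$"]))
```

Two caveats: a page-number pattern such as `^\d+$` would also drop a field value that sits alone on its own line (a bare class number, say), so patterns should be tested against real pages; and a genuine hyphen at a line end, as in a compound word, is merged as well.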