Baby Got Hack-NLP Hackathon

Description

This project entails extracting information from the Kenya Industrial Property Institute (KIPI) journals. These Industrial Property Journals form part of the procedure for protecting industrial property rights under the Industrial Property Act, No. of 2001 and the Trade Marks Act, Cap 506 of the Laws of Kenya. The journals are published monthly and, like most governmental or parastatal publications, in PDF format. Each journal contains details of all the patents, trademarks and industrial designs applied for in the previous month, including the details specified by the World Intellectual Property Organization (WIPO) through its internationally accepted WIPO standards.

The goal is to extract data from the KIPI publications and store it in a tabular structure. Currently the data exists as free-flowing text. For each patent, utility model, and industrial design, each of the following data points should be captured (as shown in Table 1 below). INID stands for Internationally agreed Numbers for the Identification of (bibliographic) Data; each INID_CODE identifies a field we wish to capture for each record.

INID_CODE	IDENTIFICATION	DESCRIPTION
(11)	DOCUMENT_NUMBER	Number of document
(12)	DOCUMENT_TYPE	Designation of the kind of document
(15)	CORRECTED_INFO_PUI	Correction information
(19)	PUB_ORG	Code of the office or organization publishing the document
(21)	APPLICATION_NUMBER_PUI	Application number
(22)	APPLICATION_DATE_PUI	Date of filing of application
(30)	PRIORITY_DATA	Priority data
(31)	PRIORITY_NUMBER	Priority number(s)
(32)	PRIORITY_DATE	Priority Date(s)
(33)	PRIORITY_COUNTRY	Priority Country(s)
(41)	PUB_DATE_UNEXAMINED	Publication date of unexamined application
(42)	PUB_DATE_EXAMINED	Publication date of examined application
(45)	REGISTRATION_DATE_PUI	Publication date of registered right
(51)	INTER_CLASS_PUI	International classification
(54)	TITLE_INVENTION	Title of invention
(56)	CITED_DOCS	Cited documents
(57)	ABSTRACT	Abstract
(71)	APPLICANT_NAME_PUI	Name(s) of applicant(s)
(72)	INVENTOR_NAME	Name of inventor(s)
(730)	PROPRIETOR_PUI	Proprietor
(740)	NAME_ADDRESS_AGENT_PUI	Name and address of Agent or address of service
(85)	DATE_PCT	Date of entry into national phase for PCT application
(87)	PCT_DATA	PCT Publication data
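
The mapping in Table 1 can drive a simple parser. The following is a minimal sketch, not the organizers' method: it assumes that each field in the extracted journal text starts with its INID code in parentheses, and the sample record, including its document and application numbers, is made up for illustration. The names `PATENT_FIELDS` and `parse_record` are hypothetical.

```python
import re

# Subset of Table 1: INID code -> CSV column name.
PATENT_FIELDS = {
    "11": "DOCUMENT_NUMBER",
    "21": "APPLICATION_NUMBER_PUI",
    "22": "APPLICATION_DATE_PUI",
    "54": "TITLE_INVENTION",
    "71": "APPLICANT_NAME_PUI",
    "72": "INVENTOR_NAME",
}

# Assumption: every field begins with its code in parentheses, e.g. "(54)".
INID_MARKER = re.compile(r"\((\d{2,3})\)")


def parse_record(text):
    """Split one record's text at INID markers and keep the known fields."""
    record = {}
    matches = list(INID_MARKER.finditer(text))
    for i, match in enumerate(matches):
        # A value runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = " ".join(text[match.end():end].split())  # collapse line breaks
        field = PATENT_FIELDS.get(match.group(1))
        if field:
            record[field] = value
    return record


# Made-up sample text, not taken from a real journal.
sample = (
    "(11) KE 000 (21) KE/P/2019/0000 (22) 01/01/2019\n"
    "(54) Sample title\nspanning two lines (72) A. Inventor"
)
print(parse_record(sample))
```

A parenthesized number inside a value, such as a reference numeral in an abstract, would be mistaken for a marker by this sketch, so a real parser would need stricter rules about where markers may appear.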

For trademarks, the table below (Table 2) will constitute the data points we wish to capture.

INID_CODE	IDENTIFICATION	DESCRIPTION
111	REGISTRATION_NUMBER_MARKS	Registration number
180	EXPIRY_DATE	Expiry date
210	APPLICATION_NUMBER_MARK	Application number
220	APPLICATION_DATE_MARK	Filing date of application
300	PARIS_CONVENTION	Data relating to application under the Paris convention
442	ADVERT_DATE	Date of advertisement
511	INTER_CLASS_MARK	International classification
526	DISCLAIMER	Disclaimer
540	MARK_REPRODUCTION	Reproduction of the Mark
566	MARK_TRANSLATION	Translation of mark or words contained in the mark
15	CORRECTED_INFO_MARK	Amendment/correction information
591	INFO_COLOURS	Information concerning colours claimed
730	NAME_ADDRESS_PROPRIETOR_MARK	Name and address of the proprietor
740	NAME_ADDRESS_APPLICANT_MARK	Name and address of the agent or applicant
791	NAME_ADDRESS_LICENSEE_MARK	Name and address of Licensee/registered user
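
Before the Table 2 fields can be captured, a page of trademark text has to be split into individual entries. The sketch below assumes, without confirmation from the journals, that every trademark entry begins on a new line with the application number code 210, written with or without parentheses. The function name `split_records` and the sample page are hypothetical.

```python
import re

# Assumption: an entry starts at a line beginning with "210" or "(210)".
# The lookahead keeps longer numbers such as "2105" from matching.
RECORD_START = re.compile(r"(?m)^\(?210\)?(?!\d)")


def split_records(page_text):
    """Return the text of each entry, cut at every record-start marker."""
    starts = [m.start() for m in RECORD_START.finditer(page_text)]
    ends = starts[1:] + [len(page_text)]
    return [page_text[a:b].strip() for a, b in zip(starts, ends)]


# Made-up sample page, not taken from a real journal.
page = (
    "(210) 100001 (220) 01/02/2019\n"
    "(730) Example Ltd\n"
    "(210) 100002 (220) 03/02/2019\n"
    "(730) Another Ltd\n"
)
for entry in split_records(page):
    print(repr(entry))
```

If the journals turn out to order the codes differently, only `RECORD_START` needs to change.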

Rules

Each participant shall use a different PDF document, which shall be provided as part of the project
The completed code and the file used shall be uploaded to the participant's GitHub account
All participants must stick to the stipulated timelines for the project

Submission Criteria

Create a GitHub repo with the NLP code and include the dataset (PDF) used as input. The code must dump a CSV file with the fields listed above. Indicate the file used for the analysis. Submissions will be measured on the completion rate of the fields.
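
One way to produce the required CSV is Python's standard `csv.DictWriter`, which writes the field names as the header row and leaves missing values blank. This is a sketch under assumptions: `FIELDNAMES` lists only a few Table 1 columns, `dump_csv` is a hypothetical helper, and the record is made up.

```python
import csv
import io

# A few of the Table 1 columns; a full submission would list all of them.
FIELDNAMES = [
    "DOCUMENT_NUMBER",
    "APPLICATION_NUMBER_PUI",
    "APPLICATION_DATE_PUI",
    "TITLE_INVENTION",
]


def dump_csv(records, out, fieldnames=FIELDNAMES):
    """Write records (dicts) to a file-like object, one column per field."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        restval="",             # fields missing from a record stay empty
        extrasaction="ignore",  # keys outside fieldnames are dropped
    )
    writer.writeheader()
    writer.writerows(records)


# Made-up record; a real run would pass the parsed journal entries.
buffer = io.StringIO()
dump_csv([{"DOCUMENT_NUMBER": "KE 000", "TITLE_INVENTION": "Sample title"}], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `out`.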

Evaluation

Successful extraction of the information from the KIPI journals and organization of it in tabular form as a CSV file
The completed code and the files used shall be in the participant's GitHub account, to be evaluated upon completion or when the set timeline lapses
The metric for success will be the number of fields successfully populated by the participant's code
Success shall also be measured on the ability of the code to populate a CSV file with the fields as columns
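
Participants may want to track field completion while developing. The exact scoring formula is not stated, so the sketch below assumes one plausible reading: the share of record-and-field cells that hold a non-empty value. The function name `completion_rate` and the sample records are hypothetical.

```python
def completion_rate(records, fieldnames):
    """Fraction of (record, field) cells holding a non-empty value."""
    total = len(records) * len(fieldnames)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fieldnames
        if str(record.get(field) or "").strip()  # None, "" and "  " count as empty
    )
    return filled / total


# Made-up records: three of the four cells are populated.
records = [
    {"DOCUMENT_NUMBER": "KE 000", "TITLE_INVENTION": ""},
    {"DOCUMENT_NUMBER": "KE 001", "TITLE_INVENTION": "Sample title"},
]
print(completion_rate(records, ["DOCUMENT_NUMBER", "TITLE_INVENTION"]))
```

This only checks whether a field is populated, not whether the extracted value is correct.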

Prizes

1st – Ksh. 50,000

2nd – Ksh. 30,000

3rd – Ksh. 20,000

Timeline

2 Weeks

You have been provided with a KIPI PDF journal. Develop an algorithm that cleans up the data and populates it into an Excel sheet. See details in the Hackathon Description.

DATA SET 1 (KIPI JOURNALS 2003-2019)

Hackathon Sponsor

TheBhub
