Cash Prizes

The team or individual with the best solution will win. There will be $4,000 in cash prizes.

Schedule

The competition will start during winter break, when the first semester ends, and run through the middle or end of February. Stay tuned for details.

Teams or Individuals

Students are encouraged to team up to solve the anaphora challenge. Teams of 2-4 students are likely optimal, but you can also participate as an individual.

Participants

Open to all Iowa State University undergraduate and graduate students.

Machine Learning Competition

Step 1: Problem

The competition will focus on a solution for anaphora in linguistics. Anaphora is a challenge for natural language processing (NLP) and named entity recognition (NER) in computational linguistics. Since Kingland’s Cognitive/AI solutions center on the successful use of NLP and NER within the field of Text Analytics, a solution for anaphora is important.

Step 2: The Solution Concept

Entrants to the competition are encouraged to use open source NLP and NER technology paired with open source and/or academic corpus sets to help with language processing and model training. Good design and custom coding beyond the use of this open source technology are expected to be required for a solution. The competition will focus on the English language.

Step 3: Winners

Kingland’s Cognitive/AI engineers will judge the solutions on elegance of design and effectiveness in resolving the anaphora challenge across a number of different text/document examples. In the absence of a complete or working solution, judging will be based on the percentage of the solution completed and its elegance of design.

Machine Learning Problem Details - Solving the Anaphora Challenge in Linguistics

What is Anaphora in Linguistics?

In linguistics, anaphora is the use of an expression whose interpretation depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression that depends specifically upon an antecedent expression and thus is contrasted with cataphora, which is the use of an expression that depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor.

In particular, for this competition, we want to associate pronouns with their antecedents (or postcedents, as the case may be). In programming terms, pronouns are like pointers that point to concepts that are defined explicitly by names elsewhere in the text. For example, in the sentence “Mary gave Shele a contact, and she followed up on it,” the word “she” is a pronoun (an anaphor) and “Shele” is its antecedent. This mapping from “she” to “Shele” is obvious enough to someone reading or hearing the sentence, but to a linguistic model, this association is not at all obvious.
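To make the pointer analogy concrete, a resolved anaphor can be represented as a simple mapping from the pronoun's token position to its antecedent's token position. The sketch below is illustrative only; the competition corpus defines its own annotation scheme:

```python
# Illustrative only: encode each resolved anaphor as pronoun index -> antecedent index.
tokens = ["Mary", "gave", "Shele", "a", "contact", ",", "and",
          "she", "followed", "up", "on", "it"]

# Gold mapping: "she" (index 7) -> "Shele" (index 2),
#               "it"  (index 11) -> "contact" (index 4)
gold = {7: 2, 11: 4}

for pronoun_i, antecedent_i in gold.items():
    print(f"{tokens[pronoun_i]!r} -> {tokens[antecedent_i]!r}")
# 'she' -> 'Shele'
# 'it' -> 'contact'
```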

Problem and Technical Details

For our Machine Learning Competition problem, we want a model that can recover these mappings from pronoun to antecedent/postcedent with the highest possible F1 score. This corpus should be leveraged for the problem. Reading the provided research paper and the additional details for the corpus will help you understand how it can be applied to the anaphora challenge. Per the guidance of the corpus researchers, please make sure this research paper is cited as part of your solution.
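For reference, the F1 score is the harmonic mean of precision and recall over the predicted pronoun-to-antecedent links. A minimal pure-Python sketch of the metric (the official test harness may compute its score differently):

```python
def f1_score(gold, pred):
    """Micro-averaged F1 over pronoun -> antecedent links.

    gold and pred map pronoun token index -> antecedent token index.
    Sketch only; the official test harness may score differently.
    """
    gold_links = set(gold.items())
    pred_links = set(pred.items())
    true_pos = len(gold_links & pred_links)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_links)
    recall = true_pos / len(gold_links)
    return 2 * precision * recall / (precision + recall)

gold = {7: 2, 11: 4}          # "she" -> "Shele", "it" -> "contact"
pred = {7: 2, 11: 0}          # one link right, one wrong
print(f1_score(gold, pred))   # 0.5
```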

We will use a randomization process to select subsets of this corpus for training the model you provide, and further subsets for testing it, to see how well the model generalizes and avoids overfitting. Python is the required programming language for this problem. In January we will provide a basic interface defining the methods for training and prediction, which our test harness will use as part of the solution judging process.
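While the official interface will not be published until January, entrants can prototype against a placeholder with the same general shape. Everything below, class and method names included, is a hypothetical sketch, paired with a deliberately naive baseline to illustrate the train/predict split:

```python
class AnaphoraModel:
    """Hypothetical interface shape; the official January interface may differ."""

    def train(self, documents, annotations):
        raise NotImplementedError

    def predict(self, tokens):
        """Return a dict mapping pronoun token index -> antecedent token index."""
        raise NotImplementedError


class CapitalizedAntecedentBaseline(AnaphoraModel):
    """Naive baseline: link each pronoun to the most recent capitalized token."""

    PRONOUNS = {"she", "he", "it", "they", "her", "him", "them"}

    def train(self, documents, annotations):
        pass  # nothing to learn; a real solution would fit a model here

    def predict(self, tokens):
        links, last_name = {}, None
        for i, tok in enumerate(tokens):
            if tok.lower() in self.PRONOUNS and last_name is not None:
                links[i] = last_name
            elif tok[:1].isupper():
                last_name = i
        return links


tokens = "Mary gave Shele a contact , and she followed up on it".split()
baseline = CapitalizedAntecedentBaseline()
baseline.train([], [])
print(baseline.predict(tokens))  # {7: 2, 11: 2}
```

Note the baseline wrongly links "it" to "Shele" rather than "contact"; closing exactly that kind of gap is what the competition asks for.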

While the anaphora problem is a challenging one, the following open source technology and algorithmic approaches could provide some acceleration and solution benefit:

  • Natural Language Processing (NLP) technology, such as spaCy or equivalents, could help by algorithmically processing the dependency parse trees of sentences
  • Keras and/or scikit-learn could support a sequence-modeling approach, for example using recurrent neural networks (RNNs) within the solution
  • The Gensim library, along with n-gram or tf-idf approaches, could provide topic modeling, vectorization, and other capabilities to enable further feature engineering
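As a taste of the last bullet, n-gram extraction and tf-idf weighting can be sketched in pure Python. This is for illustration only; in practice Gensim or scikit-learn provide optimized, well-tested implementations:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """Per-document tf-idf weights for lowercased unigrams (toy version)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update({t.lower() for t in doc})
    n_docs = len(docs)
    weights = []
    for doc in docs:
        tf = Counter(t.lower() for t in doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
                        for t, c in tf.items()})
    return weights

docs = [
    "Mary gave Shele a contact".split(),
    "She followed up on it".split(),
]
print(ngrams(docs[0], 2)[0])  # ('Mary', 'gave')
```

Vectors like these could feed a classifier that scores candidate antecedents for each pronoun.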

The best solutions may combine several of the approaches above. Any external technology or libraries used must be open source. As a reminder, per the Official Rules, the ability of the solution to execute against test subsets of the corpus is weighted at 40% of the scoring; the remaining 60% is based on elegance of design, completeness of code comments and documentation, and thoroughness of overall solution documentation and diagrams.

Good luck! 

Frequently Asked Questions


What is The Machine Learning Competition?

The competition will focus on a solution for anaphora in linguistics. Anaphora is a challenge for natural language processing (NLP) and named entity recognition (NER) in computational linguistics. Since Kingland’s Cognitive/AI solutions center on the successful use of NLP and NER within the field of Text Analytics, a solution for anaphora is important.

Entrants to the competition are encouraged to use open source NLP and NER technology paired with open source and/or academic corpus sets to help with language processing and model training. Good design and custom coding beyond the use of this open source technology are expected to be required for a solution. The competition will focus on the English language.

Who can participate?
Open to all Iowa State University undergraduate and graduate students.

How is the contest conducted?
Online. All information will be sent to you, and you can submit your solution via email. Files should be 10 MB or less.

Can I participate with a team?
Yes, students are encouraged to team up to solve the anaphora challenge. Teams of 2-4 students are likely optimal, but you can also participate as an individual.

Can I submit a project I’ve worked on in the past?
Out of fairness to other participants, no work may be done on a submission before the contest begins.
How much does it cost to participate?
The Machine Learning Competition is free.  

Who will communications about the event come from?
All communication will come from our Recruitment & Talent Coordinator, Shele Blum. Please reach out to her with any questions: Shele.blum@kingland.com

I can’t participate anymore; how do I let you know?
Please reach out to Shele Blum if you can no longer participate.

When are the winners announced and what do they receive?
The team or individual with the best solution will win a share of $4,000 in cash prizes. Kingland’s Cognitive/AI engineers will judge the solutions on elegance of design and effectiveness in resolving the anaphora challenge across a number of different text/document examples. In the absence of a complete or working solution, judging will be based on the percentage of the solution completed and its elegance of design. Winners will likely be announced in March 2019 (date TBD).

I have a friend who wants to participate; how do they sign up?
You can send them to this page to fill out the form. The deadline to register is December 15.

Important dates to keep in mind
Deadline to sign up:  December 15, 2018
Receive additional problem details:  December 21, 2018
Submit solutions:  February 15, 2019
Winner announced:  TBD, likely in March 2019


Keep checking back for updated information. 
