This project focuses on a modification of a greedy transition-based dependency parser. Typically, a part-of-speech (POS) tagger models a probability distribution over all possible tags for each word in a given sentence and chooses one as its best guess. This guess is then passed on to the parser, which uses it to build a parse tree. The current state of the art for POS tagging is about 97% word accuracy, which seems high but results in only around 56% sentence accuracy. Small errors at the POS tagging phase can lead to large errors further down the NLP pipeline, and transition-based parsers are particularly sensitive to these types of mistakes. A maximum entropy Markov model was trained as a POS multi-tagger that passes more than its 1-best guess to the parser, on the hypothesis that the parser could then make a better decision when committing to a parse for the sentence. This approach has been shown to improve accuracy in other parsing approaches. We show that there is a correlation between tagging ambiguity and parser accuracy: the higher the average number of tags per word, the higher the accuracy.
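The gap between word accuracy and sentence accuracy can be illustrated with a rough calculation. The sketch below assumes that tagging errors are independent across words and that a sentence has a fixed length; both are simplifying assumptions introduced here for illustration, not figures taken from the project, and the function name `sentence_accuracy` is hypothetical.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Probability that every word in a sentence is tagged correctly.

    Assumes each word is tagged correctly with the same probability and
    that errors are independent across words (a simplification).
    """
    return word_accuracy ** sentence_length


# Sentence lengths below are illustrative assumptions.
for length in (10, 19, 25):
    accuracy = sentence_accuracy(0.97, length)
    print(f"{length} words: {accuracy:.1%} of sentences fully correct")
```

Under these assumptions, a sentence of roughly 19 words lands near the 56% figure, which shows how quickly a small per-word error rate compounds over a sentence.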
Linear regression is one of the most widely used statistical methods available today. It is used by data analysts and students in almost every discipline. However, the standard ordinary least squares (OLS) method makes several strong assumptions about the data that often do not hold in real-world data sets. This can cause numerous problems in the least squares model. One of the most common issues is a model overfitting the data. Ridge Regression and LASSO are two methods used to counter this and create a more accurate model. I will discuss how overfitting arises in least squares models and the reasoning for using Ridge Regression and LASSO, include an analysis of real-world example data, and compare these methods with OLS and with each other to further infer the benefits and drawbacks of each method.
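To make the comparison concrete, the following minimal sketch contrasts the three estimators in the simplest possible setting: a single feature with no intercept, where each has a closed form. The data, the penalty values, and the function names are illustrative assumptions, not the analysis described above. The penalised objectives are taken to be the sum of squared residuals plus `lam * b**2` for ridge and plus `lam * abs(b)` for LASSO.

```python
import math


def ols_slope(x, y):
    """Least squares slope for y ~ b * x (single feature, no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimiser of sum((y - b*x)**2) + lam * b**2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimiser of sum((y - b*x)**2) + lam * abs(b), via soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # soft-threshold the correlation term
    return math.copysign(shrunk, sxy) / sxx


# Toy data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

print("OLS:        ", ols_slope(x, y))
print("Ridge (3):  ", ridge_slope(x, y, lam=3.0))
print("LASSO (6):  ", lasso_slope(x, y, lam=6.0))
print("LASSO (60): ", lasso_slope(x, y, lam=60.0))
```

In this toy case ridge shrinks the slope toward zero without ever reaching it, while a large enough LASSO penalty sets the slope exactly to zero. With many features neither simplification holds directly: ridge requires solving a linear system, and LASSO generally requires an iterative solver.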