In this essay, I refer to three scientific papers on Argument Mining: “Annotation of argument structure in Japanese legal documents”, “What works and what does not: Classifier and feature analysis for argument mining”, and “Manual Identification of Arguments with Implicit Conclusions Using Semantic Rules for Argument Mining”. The essay presents a rhetorical analysis of these three articles, illustrating the structure and components of each paper and how those components are organized.

Hiroaki Yamada, Simone Teufel, and Takenobu Tokunaga, in their paper “Annotation of argument structure in Japanese legal documents”, propose an approach to annotating archives of Japanese civil judgments, aimed at summarizing these documents flexibly. They develop their proposal through an annotation method that focuses on the relationship between legal issues and argument structure. Additionally, they adapt rhetorical categories to match the Japanese legal system and define a linked argument structure based on legal sub-arguments (Yamada, Teufel and Tokunaga 22). They report agreement between two annotators on various aspects of the full task. They write for those studying schemes for annotating judgment documents automatically, and they support their conclusions with detailed and cogent results from machine learning experiments.

The paper opens with an abstract stating its main purpose and the authors’ main contributions. An introduction then provides background on automatic summarization, including the main problems researchers currently face and the annotation scheme proposed in the paper (Yamada, Teufel and Tokunaga 23). Next, the structure of judgment documents is described, divided into argument structure, issue topics, and rhetorical structure.

In this section, figures are used to present the argument structure of judgment documents in a more visual and understandable way. Once the rhetorical structure is established, two types of connections, the “Issue Topic Connection” and the “Supporting Connection”, are introduced by contrasting their purposes. A table presenting the rhetorical annotation scheme for Japanese legal judgment archives is also provided, giving the description corresponding to each label directly (Yamada, Teufel and Tokunaga 26).

Because of the nature of the four proposed classification tasks (Issue Topic Classification, Issue Topic Connections, FRAMING Connections, and the Standard for FRAMING Connections), various agreement criteria are necessarily presented, along with several equations drawn from other authoritative articles. With the theoretical and mathematical groundwork for the annotation method adequately prepared, an experiment is performed to evaluate the method’s reproducibility across the four tasks described above (Yamada, Teufel and Tokunaga 28).

In addition, the authors review some recent research on agreement criteria and explain why those approaches are not directly applicable to the annotation method proposed in the paper. Ultimately, they conclude that their classification experiment achieved high accuracy on the task of labeling rhetorical status, and they outline future work aimed at improving the annotation material via supervised machine learning algorithms (Yamada, Teufel and Tokunaga 30).

Ahmet Aker, Alfred Sliwa, Yuan Ma, and their co-authors, in their paper “What works and what does not: Classifier and feature analysis for argument mining”, propose a comparative scheme to evaluate the performance of various supervised machine learning algorithms on argument mining tasks. In particular, they run experiments on extracting argumentative fragments from documents and predicting the structure among those fragments, evaluating eight classifiers and various combinations of six feature sets described in past papers. The feature combination and the corpus selected for training and testing lead to different classifier performance, with Random Forest performing best overall (Aker, et al. 91). They write to offer a baseline for better argument mining algorithms in the future, guiding the application of argument mining to various domains. They write for those whose research focuses on evaluating classifiers and features for argument mining.

Similar to the paper analyzed above, the authors provide an introduction that presents background on argument mining by citing several recent related papers, so that readers understand the definition of argument mining and how the experiments in this paper differ from those in past work. Next, they describe the necessary settings for the proposed experiments, which comprise the data, the features, the detection and prediction of argumentative components, and the classifiers (Aker, et al. 92). Some of these settings come from previous papers, as their algorithms apply to the current work as well.

Experiments are then run with these settings, and the results for the persuasive essays, the Wikipedia corpus, and the CNN data are reported individually using the macro F1-score (Aker, et al. 93). To make the results more intuitive and logical, several detailed tables are provided, such as the table reporting the F1-scores of the various algorithms across different feature-category combinations on the Wikipedia data.