Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM

Extracting instances of sentiment-oriented relations from user-generated web documents is important for online marketing analysis. Unlike previous work, we formulate this extraction task as a structured prediction problem and design the corresponding inference as an integer linear program. Our latent structural SVM based model can learn from training corpora that do not contain explicit annotations of sentiment-bearing expressions, and it can simultaneously recognize instances of both binary (polarity) and ternary (comparative) relations with regard to entity mentions of interest. The empirical evaluation shows that our approach significantly outperforms state-of-the-art systems across domains (cameras and movies) and across genres (reviews and forum posts). The gold standard corpus that we built will also be a valuable resource for the community.


Introduction
Sentiment-oriented relation extraction (Choi et al., 2006) is concerned with recognizing sentiment polarities and comparative relations between entities from natural language text. Identifying such relations often requires syntactic and semantic analysis at both sentence and phrase level. Most prior work on sentiment analysis considers either i) subjective sentence detection (Yu and Kübler, 2011), ii) polarity classification (Johansson and Moschitti, 2011), or iii) comparative relation identification (Jindal and Liu, 2006; Ganapathibhotla and Liu, 2008). In practice, however, different types of sentiment-oriented relations frequently coexist in documents. In particular, we found that more than 38% of the sentences in our test corpus contain more than one type of relation. The isolated analysis approach is inappropriate because i) it sacrifices accuracy by ignoring the intricate interplay among different types of relations; and ii) it can lead to conflicting predictions, such as labeling a relation candidate as both negative and comparative. Therefore, in this paper, we identify instances of both sentiment polarities and comparative relations for entities of interest simultaneously. We assume that all the mentions of entities and attributes are given, and that entities are disambiguated. Assuming that the outputs of preceding modules are error-free is common practice when evaluating a single module of a pipeline system.
To the best of our knowledge, the only existing system capable of extracting both comparisons and sentiment polarities is a rule-based system proposed by Ding et al. (2009). We argue that it is better to tackle the task by using a unified model with structured outputs. It allows us to consider a set of correlated relation instances jointly and characterize their interaction through a set of soft and hard constraints. For example, we can encode constraints to discourage an attribute from participating in a polarity relation and a comparative relation at the same time. As a result, the system extracts a set of correlated instances of sentiment-oriented relations from a given sentence. For example, given the sentence about the camera Canon 7D, "The sensor is great, but the price is higher than Nikon D7000.", the expected output is positive(Canon 7D, sensor) and preferred(Nikon D7000, Canon 7D, price).
However, constructing a fully annotated training corpus for this task is labor-intensive and requires a strong linguistic background. We minimize this overhead by applying a simplified annotation scheme, in which annotators mark mentions of entities and attributes, disambiguate the entities, and label instances of relations for each sentence. Based on the new scheme, we have created a small Sentiment Relation Graph (SRG) corpus for the domains of cameras and movies, which differs significantly from the corpora used in prior work (Wei and Gulla, 2010; Kessler et al., 2010; Toprak et al., 2010; Hu and Liu, 2004) in the following ways: i) both sentiment polarities and comparative relations are annotated; ii) all mentioned entities are disambiguated; and iii) no subjective expressions are annotated, unless they are part of entity mentions.
The new annotation scheme raises a new challenge for learning algorithms in that they need to automatically find textual evidences for each annotated relation during training. For example, with the sentence "I like the Rebel a little better, but that is another price jump", simply assigning a sentiment-bearing expression to the nearest relation candidate is insufficient, especially when the sentiment is not explicitly expressed.
In this paper, we propose SENTI-LSSVM, a latent structural SVM based model for sentiment-oriented relation extraction. SENTI-LSSVM is applied to find the most likely set of the relation instances expressed in a given sentence, where the latent variables are used to assign the most appropriate textual evidences to the respective instances.
In summary, the contributions of this paper are the following:
• We propose SENTI-LSSVM: the first unified statistical model with the capability of extracting instances of both binary and ternary sentiment-oriented relations.
• We design a task-specific integer linear programming (ILP) formulation for inference.
• We construct a new SRG corpus as a valuable asset for the evaluation of sentiment relation extraction.
• We conduct extensive experiments with online reviews and forum posts, showing that the SENTI-LSSVM model can effectively learn from a training corpus without explicitly annotated subjective expressions and that it significantly outperforms state-of-the-art systems.

Related Work
There is ample work on analyzing sentiment polarities and entity comparisons, but most of it studies the two tasks in isolation. Most prior approaches for fine-grained sentiment analysis focus on polarity classification. Supervised approaches on expression-level analysis require the annotation of sentiment-bearing expressions as training data (Jin et al., 2009; Choi and Cardie, 2010; Johansson and Moschitti, 2011; Yessenalina and Cardie, 2011; Wei and Gulla, 2010). However, the corresponding annotation process is time-consuming. Although sentence-level annotations are easier to obtain, the analysis at this level cannot cope with sentences conveying relations of multiple types (McDonald et al., 2007; Täckström and McDonald, 2011; Socher et al., 2012). Lexicon-based approaches require no training data (Ku et al., 2006; Kim and Hovy, 2006; Godbole et al., 2007; Ding et al., 2008; Popescu and Etzioni, 2005; Liu et al., 2005) but suffer from inferior performance (Qu et al., 2012). In contrast, our method requires no annotation of sentiment-bearing expressions for training and can predict both sentiment polarities and comparative relations.
Sentiment-oriented comparative relations have been studied in the context of user-generated discourse (Jindal and Liu, 2006; Ganapathibhotla and Liu, 2008). These approaches rely on linguistically motivated rules and assume the existence of independent keywords in sentences which indicate comparative relations. Therefore, they fall short of extracting comparative relations that are expressed through domain-dependent information.
Both Johansson and Moschitti (2011) and Wu et al. (2011) formulate fine-grained sentiment analysis as a learning problem with structured outputs. However, they focus only on polarity classification of expressions and require annotation of sentiment-bearing expressions for training as well.
While ILP has been previously applied for inference in sentiment analysis (Choi and Cardie, 2009;Somasundaran and Wiebe, 2009;Wu et al., 2011), our task requires a complete ILP reformulation due to 1) the absence of annotated sentiment expressions and 2) the constraints imposed by the joint extraction of both sentiment polarity and comparative relations.

System Overview
This section gives an overview of the whole system for extracting sentiment-oriented relation instances. Prior to presenting the system architecture, we introduce the essential concepts and the definitions of two kinds of directed hypergraphs as the representation of correlated relation instances extracted from sentences.

Concepts and Definitions
Entity. An entity is an abstract or concrete thing, which need not be of material existence. An entity in this paper refers to either a product or a brand.
Attribute. An attribute is an object closely associated with or belonging to an entity, such as the lens of a digital camera.

Sentiment-Oriented Relation. A sentiment-oriented relation is either a sentiment polarity or a comparative relation, defined on tuples of entities and attributes. A sentiment polarity relation conveys either a positive or a negative attitude towards entities or their attributes, whereas a comparative relation indicates the preference of one entity over the other entity w.r.t. an attribute.
Relation Instance. An instance of sentiment polarity takes the form r(entity, attribute) with r ∈ {positive, negative}, such as positive(Canon 7D, sensor). The polarity instances expressed in the form of unary relations, such as "Nikon D7000 is excellent.", are denoted as binary relations r(entity, whole), where the attribute whole indicates the entity as a whole. In contrast, an instance of comparative relation is in the form of preferred(entity, entity, attribute), e.g. preferred(Canon 7D, Nikon D7000, price). For brevity, we refer to an instance set of sentiment-oriented relations extracted from a sentence as an sSoR. The instances of the remaining relations are represented as other(entity, attribute), such as partOf(wheel, car). These relations include objective relations and the subjective relations other than sentiment-oriented relations.
Mention-Based Relation Instances. A mention-based relation instance refers to a tuple of entity mentions with a certain relation. This concept is introduced as the representation of instances in a sentence by replacing entities with the corresponding entity mentions, such as positive("Canon SD880i", "wide angle view").
Mention-Based Relation Graph. A mention-based relation graph (or MRG) represents a collection of mention-based relation instances expressed in a sentence. As illustrated in Figure 1, an MRG is a directed hypergraph $G = \langle M, E \rangle$ with a vertex set $M$ and an edge set $E$. A vertex $m_i \in M$ denotes a mention of an entity or an attribute occurring either within the sentence or in its context.
We say that a mention is from the context if it is mentioned in the previous sentence or is an attribute implied in the current sentence. An instance of a binary relation in an MRG takes the form of a binary edge $e_l = (m_i, m_a)$, where $m_i$ and $m_a$ denote an entity mention and an attribute mention respectively, and the type $l \in \{positive, negative, other\}$. A ternary edge $e_l$ indicating a comparative relation is represented as $e_l = (m_i, m_j, m_a)$, where two entity mentions $m_i$ and $m_j$ are compared with respect to the attribute mention $m_a$. We define the type $l \in \{better, worse\}$ to indicate the two possible directions of the relation and assume that $m_i$ occurs before $m_j$. As a result, we have a set $L$ of five relation types: positive, negative, better, worse and other. According to these definitions, the annotations in the SRG corpus are actually MRGs and disambiguated entities. If there are multiple mentions referring to the same entity, annotators are asked to choose the most obvious one because it saves annotation time and is less demanding for the entity recognition and disambiguation modules.
Evidentiary Mention-Based Relation Graph. An evidentiary mention-based relation graph, coined eMRG, extends an MRG by associating each edge with a textual evidence to support the corresponding relation assertions (see Figure 2). Consequently, an edge in an eMRG is denoted by a pair $(a, c)$, where $a$ represents a mention-based relation instance and $c$ is the associated textual evidence. It is also referred to as an evidentiary edge.

System Architecture
As illustrated by Figure 3, at the core of our system is the SENTI-LSSVM model, which extracts sets of mention-based relationships in the form of eMRGs from sentences. For a given sentence with known entity mentions, we select all possible mention sets as relation candidates, where each set includes at least one entity mention. Then we associate each relation candidate with a set of constituents or the whole sentence as the textual evidence candidates (cf. Section 6.1). Subsequently, the inference component aims to find the most likely eMRG from all possible combinations of mention-based relation instances and their textual evidences (cf. Section 6.2). The eMRG representation is chosen because it characterizes exactly the model outputs by letting each edge correspond to an instance of a mention-based relation and the associated textual evidence. Finally, the parameters of this model are learned by an online algorithm (cf. Section 7).
Since instance sets of sentiment-oriented relations (sSoRs) are the expected outputs, we obtain sSoRs from MRGs by using a simple rule-based algorithm. The algorithm essentially maps the mentions from an MRG into entities and attributes in an sSoR and labels the corresponding tuples with the relation types of the edges from the MRG. For instances of comparative relations, the label better or worse is mapped to the relation type preferred.
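The rule-based conversion from MRGs to sSoRs can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Edge` type and the `mention_to_entity` dictionary are hypothetical, and the direction convention (worse(m_i, m_j, m_a) becomes preferred(e_j, e_i, a)) is our reading of the Canon 7D example used in this paper.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    label: str       # one of: positive, negative, better, worse, other
    mentions: tuple  # (entity, attribute) or (entity_i, entity_j, attribute)

def mrg_to_ssor(edges, mention_to_entity):
    """Map labeled MRG edges to a set of sentiment-oriented relation instances."""
    ssor = set()
    for edge in edges:
        if edge.label == "other":
            continue  # 'other' is not a sentiment-oriented relation
        # Replace each mention by its disambiguated entity or attribute name.
        resolved = tuple(mention_to_entity.get(m, m) for m in edge.mentions)
        if edge.label in ("positive", "negative"):
            ssor.add((edge.label,) + resolved)
        elif edge.label == "better":
            e_i, e_j, attr = resolved
            ssor.add(("preferred", e_i, e_j, attr))
        elif edge.label == "worse":
            e_i, e_j, attr = resolved
            ssor.add(("preferred", e_j, e_i, attr))  # assumed direction
    return ssor

edges = [
    Edge("positive", ("7D", "sensor")),
    Edge("worse", ("7D", "Nikon D7000", "price")),
    Edge("other", ("7D", "lens")),
]
ssor = mrg_to_ssor(edges, {"7D": "Canon 7D"})
```

Because both better and worse collapse into preferred, several MRGs can map to the same sSoR, which is why the evaluation is carried out on sSoRs.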

SENTI-LSSVM Model
The task of sentiment-oriented relation extraction is to determine the most likely sSoR in a sentence. Since sSoRs are derived from the corresponding MRGs as described in Section 3, the task reduces to finding the most likely MRG for each sentence. Since an MRG is created by assigning relation types to a subset of all relation candidates, which are possible tuples of mentions with unknown relation types, the number of MRGs can be extremely high.
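To give a feeling for the size of this search space, the sketch below enumerates relation candidates for a toy sentence and computes a loose upper bound on the number of labelings. It rests on assumptions that are not spelled out here: binary candidates pair an entity mention with an attribute mention or with another entity mention, ternary candidates combine two entity mentions with an attribute mention, and the bound ignores all validity constraints.

```python
from itertools import combinations

def relation_candidates(entity_mentions, attribute_mentions):
    """Enumerate binary and ternary mention tuples containing at least one entity mention."""
    binary = [(e, a) for e in entity_mentions for a in attribute_mentions]
    binary += list(combinations(entity_mentions, 2))  # entity-entity edges
    ternary = [
        (e_i, e_j, a)
        for e_i, e_j in combinations(entity_mentions, 2)
        for a in attribute_mentions
    ]
    return binary, ternary

binary, ternary = relation_candidates(["Canon 7D", "Nikon D7000"], ["sensor", "price"])

# A binary candidate is unlabeled or positive/negative/other (4 options);
# a ternary candidate is unlabeled or better/worse (3 options).
upper_bound = 4 ** len(binary) * 3 ** len(ternary)
print(len(binary), len(ternary), upper_bound)
```

Even with two entity mentions and two attribute mentions the bound already reaches several thousand labelings, and it grows exponentially with the number of mentions.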
To tackle the task, one solution is to employ an edge-factored linear model in the framework of structural SVM (Martins et al., 2009; Tsochantaridis et al., 2004). The model suggests that a bag of features should be specified for each relation candidate, and then the model predicts the most likely candidate sets along with their relation types to form the optimal MRGs. As we observed, for a relation candidate, the most informative features are the words near its entity mentions in the original text. However, if we represent a candidate by all these words, it is very likely that the instances of different relation types share overly similar features, because a mention is often involved in more than one relation candidate, as shown in Figure 2. As a consequence, the instances of different relations represented by overly similar features can easily confuse the learning algorithm. Thus, it is critical to select proper constituents or sentences as textual evidences for each relation candidate in both training and testing.
Consequently, we divide the task of sentiment-oriented relation extraction into two subtasks: i) identifying the most likely MRGs; and ii) assigning proper textual evidences to each edge of the MRGs to support their relation assertions. It is desirable to carry out the two subtasks jointly because they can enhance each other. First, the identification of relation types requires proper textual evidences; second, the soft and hard constraints imposed by the correlated relation instances facilitate the recognition of the corresponding textual evidences. Since eMRGs are created by attaching a set of textual evidences to every MRG, tackling the two subtasks simultaneously is equivalent to selecting the most likely eMRG from a set of eMRG candidates. This is challenging because our SRG corpus does not contain any annotation of textual evidences.
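Formally, let $\mathcal{X}$ denote the set of all available sentences. For $x \in \mathcal{X}$, we define $y \in \mathcal{Y}(x)$ as the set of labeled edges of an MRG, with $\mathcal{Y} = \cup_{x \in \mathcal{X}} \mathcal{Y}(x)$. Since the assignments of textual evidences are not observed, an assignment of evidences to $y$ is denoted by a latent variable $h \in \mathcal{H}(x)$, with $\mathcal{H} = \cup_{x \in \mathcal{X}} \mathcal{H}(x)$. Then $(y, h)$ corresponds to an eMRG, and $(a, c) \in (y, h)$ is a labeled edge $a$ attached with a textual evidence $c$. Given a labeled dataset $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$, we aim to learn a discriminant function that maps a sentence $x$ to its most likely eMRG $(y, h)$. Due to the introduction of latent variables, we adopt the latent structural SVM (Yu and Joachims, 2009) for structural classification. Our discriminant function is defined as
$$f(x) = \arg\max_{(y, h) \in \mathcal{Y}(x) \times \mathcal{H}(x)} \beta^\top \Phi(x, y, h) \qquad (1)$$
where $\Phi(x, y, h)$ is the feature function of an eMRG $(y, h)$ and $\beta$ is the corresponding weight vector.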
To ensure tractability, we also employ edge-based factorization for our model. Let $M_p$ denote a set of entity mentions and $y_r(m_i)$ be the set of edges labeled with sentiment-oriented relations incident to $m_i$. The factorization of $\Phi(x, y, h)$ is given as
$$\Phi(x, y, h) = \sum_{(a, c) \in (y, h)} \Phi_e(x, a, c) + \sum_{m_i \in M_p} \; \sum_{a_{m_i}, a'_{m_i} \in y_r(m_i)} \Phi_c(a_{m_i}, a'_{m_i}) \qquad (2)$$
where $\Phi_e(x, a, c)$ is a local edge feature function for a labeled edge $a$ attached with a textual evidence $c$, and $\Phi_c(a, a')$ is a feature function capturing the co-occurrence of two labeled edges $a_{m_i}$ and $a'_{m_i}$ incident to an entity mention $m_i$.
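As an illustration of the edge-based factorization, the score of an eMRG under a linear model can be computed as a sum of local edge scores and pairwise co-occurrence scores. The sketch below uses sparse dictionaries for features and weights; the feature templates (`edge_features`, `cooc_features`) and the weight values are hypothetical placeholders, not the features or parameters of SENTI-LSSVM.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def edge_features(edge, evidence):
    """Toy local features: evidence tokens conjoined with the edge label."""
    label = edge[0]
    return {f"{label}|uni={tok}": 1.0 for tok in evidence}

def cooc_features(edge_a, edge_b):
    """Toy co-occurrence feature: the pair of labels of two edges sharing a mention."""
    return {f"cooc={edge_a[0]}+{edge_b[0]}": 1.0}

def emrg_score(weights, evidentiary_edges, cooccurring_pairs):
    """Edge-factored score: local edge terms plus co-occurrence terms."""
    local = sum(dot(weights, edge_features(a, c)) for a, c in evidentiary_edges)
    pairwise = sum(dot(weights, cooc_features(a, b)) for a, b in cooccurring_pairs)
    return local + pairwise

pos = ("positive", "Canon 7D", "sensor")
worse = ("worse", "Canon 7D", "Nikon D7000", "price")
weights = {"positive|uni=great": 1.5, "worse|uni=higher": 0.5, "cooc=positive+worse": 0.25}
score = emrg_score(weights, [(pos, ("great",)), (worse, ("higher",))], [(pos, worse)])
print(score)
```

Because the score decomposes over edges and edge pairs, inference only has to decide which evidentiary edges and which co-occurring pairs are switched on.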

Feature Space
The following features are used in the feature functions (Equation 2):
Unigrams: As mentioned before, a textual evidence attached to an edge in an MRG is either a word, a phrase or a sentence. We consider all lemmatized unigrams in the textual evidence as unigram features.
Context: Since web users usually express related sentiments about the same entity across sentence boundaries, we describe the sentiment flow using a set of contextual binary features. For example, if entity A is mentioned in both the previous sentence and the current sentence, a set of contextual binary features is used to indicate all possible combinations of the current and the previously mentioned sentiment-oriented relations regarding entity A.
Co-occurrence: We have mentioned the co-occurrence feature in Equation 2, indicated by $\Phi_c(a, a')$. It captures the co-occurrence of two labeled edges incident to the same entity mention. Note that the co-occurrence feature function is considered only if there is a contrast conjunction such as "but" between the non-shared entity mentions incident to the two labeled edges.
Senti-predictors: Following the idea of Qu et al. (2012), we encode the prediction results from the rule-based phrase-level multi-relation predictor (Ding et al., 2009) and from the bag-of-opinions predictor (Qu et al., 2010) as features based on the textual evidence. The output of the first predictor is an integer value, while the output of the second predictor is a sentiment relation, such as "positive", "negative", "better" or "worse". We map the relational outputs into integer values and then encode the outputs from both predictors as senti-predictor features.
Others: The commonly used part-of-speech tags are also included as features. Moreover, for an edge candidate, a set of binary features is used to denote the types of the edge and its entity mentions. For instance, a binary feature indicates whether an edge is a binary edge related to an entity mentioned in context. To characterize the syntactic dependencies between two adjacent entity mentions, we use as features the path in the dependency tree between the heads of the corresponding constituents, the number of words in between, and the number of other mentions in between. Additionally, if the textual evidence is a constituent, its feature w.r.t. an edge is the dependency path to the closest mention of the edge that does not overlap with this constituent.
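Two of the feature families above, unigrams and the conjunction-gated co-occurrence feature, can be sketched as follows. This is an illustrative sketch only: the feature-name templates, the conjoining of unigrams with the edge label, and the representation of mentions by token positions are assumptions, and lemmatization is assumed to have been done beforehand.

```python
CONTRAST_CONJUNCTIONS = {"but"}  # assumption: only "but" is named as an example

def unigram_features(label, evidence_lemmas):
    """Binary unigram features of a textual evidence, conjoined with the edge label."""
    return {f"{label}|uni={lemma}": 1.0 for lemma in evidence_lemmas}

def cooccurrence_features(tokens, label_a, pos_a, label_b, pos_b):
    """Fire a label-pair feature only if a contrast conjunction lies between
    the two non-shared mentions, given by their token positions."""
    low, high = sorted((pos_a, pos_b))
    between = tokens[low + 1:high]
    if any(tok.lower() in CONTRAST_CONJUNCTIONS for tok in between):
        return {f"cooc={label_a}+{label_b}": 1.0}
    return {}

tokens = "The sensor is great , but the price is higher than Nikon D7000 .".split()
uni = unigram_features("positive", ["great"])
gated = cooccurrence_features(tokens, "positive", 1, "worse", 11)   # "but" in between
ungated = cooccurrence_features(tokens, "worse", 7, "worse", 11)    # no conjunction
print(uni, gated, ungated)
```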

Structural Inference
In order to find the best eMRG for a given sentence with a well-trained model, we need to determine the most likely relation type for each relation candidate and support the corresponding assertions with proper textual evidences. We formulate this task as an integer linear program (ILP). Instead of considering all constituents of a sentence, we empirically select a subset as textual evidences for each relation candidate.

Textual Evidence Candidates Selection
Textual evidences are selected based on the constituent trees of sentences parsed by the Stanford parser (Klein and Manning, 2003). For each mention in a sentence, we first locate the constituent in the tree with the maximal overlap by Jaccard similarity. Starting from this constituent, we consider two types of candidates: type I candidates are constituents at the highest level which contain neither any word of another mention nor any contrast conjunction such as "but"; type II candidates are constituents at the highest level which cover exactly the two mentions of an edge and do not overlap with any other mentions. For a binary edge connecting an entity mention and an attribute mention, we consider a type I candidate starting from the attribute mention. For a binary edge connecting two entity mentions, we consider type I candidates starting from both mentions. Moreover, for a comparative ternary edge, we consider both type I and type II candidates starting from the attribute mention. This strategy is based on our observation that these candidates often cover the most important information w.r.t. the covered entity mentions.
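The selection of a type I candidate can be sketched on token spans. The sketch assumes that constituents are given as half-open token spans `(start, end)`; the spans below are written by hand for the running example and are not actual Stanford parser output, and all function names are hypothetical.

```python
def jaccard(span_a, span_b):
    """Jaccard similarity of two half-open token spans (start, end)."""
    a, b = set(range(*span_a)), set(range(*span_b))
    return len(a & b) / len(a | b)

def starting_constituent(mention_span, constituent_spans):
    """Constituent with maximal Jaccard overlap with the mention."""
    return max(constituent_spans, key=lambda span: jaccard(mention_span, span))

def ancestors(start_span, constituent_spans):
    """Spans containing start_span, ordered from the span itself up to the root."""
    containing = [s for s in constituent_spans
                  if s[0] <= start_span[0] and start_span[1] <= s[1]]
    return sorted(containing, key=lambda s: s[1] - s[0])

def type_one_candidate(ancestor_spans, blocked_positions):
    """Highest ancestor that covers no token of another mention and no
    contrast conjunction (both given as blocked token positions)."""
    chosen = None
    for start, end in ancestor_spans:
        if any(start <= pos < end for pos in blocked_positions):
            break
        chosen = (start, end)
    return chosen

# Tokens: The sensor is great , but the price is higher than Nikon D7000 .
constituents = [(0, 14), (0, 4), (0, 2), (1, 2), (2, 4), (6, 13), (6, 8)]
start = starting_constituent((1, 2), constituents)      # mention "sensor"
blocked = {5, 7, 11, 12}                                 # "but", "price", "Nikon", "D7000"
evidence = type_one_candidate(ancestors(start, constituents), blocked)
print(start, evidence)
```

For the mention "sensor" the climb stops below the root, because the root span contains the conjunction "but"; the selected span (0, 4) covers "The sensor is great".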

ILP Formulation
We formulate the inference problem of finding the best eMRG as an ILP problem due to its convenient integration of both soft and hard constraints.
Given the model parameters $\beta$, we reformulate the score of an eMRG in the discriminant function (1) as follows,
$$\beta^\top \Phi(x, y, h) = \sum_{a, c} s_{ac} z_{ac} + \sum_{a, a'} s_{aa'} z_{aa'}$$
where $s_{ac} = \beta^\top \Phi_e(x, a, c)$ denotes the score of a labeled edge $a$ attached with a textual evidence $c$, $s_{aa'} = \beta^\top \Phi_c(a, a')$ is the edge co-occurrence score, the binary variable $z_{ac}$ indicates the presence or absence of the corresponding edge, and $z_{aa'}$ indicates whether two edges co-occur. As not every edge set can form an eMRG, we require that a valid eMRG satisfy a set of linear constraints, which form our constraint space. Then function (1) is equivalent to
$$\max_{z, \eta, \tau \in B} \; \sum_{a, c} s_{ac} z_{ac} + \sum_{a, a'} s_{aa'} z_{aa'} \quad \text{s.t. } (z, \eta, \tau) \text{ lies in the constraint space}$$
where $B = 2^S$ with $S = \{0, 1\}$, and $\eta$ and $\tau$ are auxiliary binary variables that help define the constraint space. The above optimization problem takes exactly the form of an ILP because both the constraints and the objective function are linear, and all variables take only integer values.
In the following, we consider two types of constraint space: 1) an eMRG with only binary edges and 2) an eMRG with both binary and ternary edges.
eMRG with only Binary Edges: An eMRG has only binary edges if a sentence contains no attribute mention or at most one entity mention. We expect that each edge has only one relation type and is supported by a single textual evidence. To facilitate the formulation of constraints, we introduce $\eta_{e_l}$ to denote the presence or absence of a labeled edge $e_l$, and $\eta_{ec}$ to indicate whether a textual evidence $c$ is assigned to an unlabeled edge $e$. Then the binary variable for the corresponding evidentiary edge is $z_{e_l c} = \eta_{ec} \wedge \eta_{e_l}$, where the ILP formulation of conjunction can be found in Martins et al. (2009).
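The conjunction of two binary variables is commonly linearized with three inequalities: z ≤ a, z ≤ b and z ≥ a + b − 1. The following check enumerates all binary assignments and confirms that these inequalities leave exactly z = a ∧ b feasible. We present this standard linearization as an assumption; the exact form used by Martins et al. (2009) is not reproduced in this paper.

```python
from itertools import product

def satisfies_conjunction(z, a, b):
    """Linear constraints encoding z = a AND b for binary variables."""
    return z <= a and z <= b and z >= a + b - 1

# For every (a, b), collect the values of z that satisfy the constraints.
feasible = {
    (a, b): [z for z in (0, 1) if satisfies_conjunction(z, a, b)]
    for a, b in product((0, 1), repeat=2)
}
print(feasible)
```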
Let $C_e$ denote the set of textual evidence candidates of an unlabeled edge $e$. The constraint of at most one textual evidence per edge is formulated as:
$$\sum_{c \in C_e} \eta_{ec} \le 1$$
Once a textual evidence is assigned to an edge, their relation labels should match and the number of labeled edges must agree with the number of attached textual evidences. Further, we assume that a textual evidence $c$ conveys at most one relation, so that an evidence will not be assigned to relations of different types, which is the main problem for the structural SVM based model. Let $\eta_{cl}$ indicate that the textual evidence $c$ is labeled by the relation type $l$. The corresponding constraints are expressed as
$$\sum_{l \in L_e} \eta_{e_l} = \sum_{c \in C_e} \eta_{ec}; \qquad z_{e_l c} \le \eta_{cl}; \qquad \sum_{l \in L} \eta_{cl} \le 1$$
where $L_e$ denotes the set of all possible labels for an unlabeled edge $e$, and $L$ is the set of all relation types of MRGs (cf. Section 3).
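The effect of these hard constraints can be illustrated on a toy instance by exhaustive enumeration instead of an ILP solver. In the sketch below every edge is either unlabeled or receives exactly one label and one evidence, and an evidence may carry at most one relation type across all edges that use it. The edge names, evidence names and scores are invented for illustration; a real system would hand the same constraints to an ILP solver.

```python
from itertools import product

def best_assignment(edges, labels, evidence, score):
    """Brute-force search over (label, evidence) choices per edge."""
    options = [[None] + [(l, c) for l in labels[e] for c in evidence[e]] for e in edges]
    best, best_score = None, float("-inf")
    for combo in product(*options):
        label_of_evidence, total, valid = {}, 0.0, True
        for e, choice in zip(edges, combo):
            if choice is None:
                continue  # edge stays unlabeled and gets no evidence
            l, c = choice
            # An evidence conveys at most one relation type.
            if label_of_evidence.setdefault(c, l) != l:
                valid = False
                break
            total += score.get((e, l, c), 0.0)
        if valid and total > best_score:
            best, best_score = dict(zip(edges, combo)), total
    return best, best_score

edges = ["e1", "e2"]
labels = {"e1": ["positive", "negative"], "e2": ["positive", "negative"]}
evidence = {"e1": ["sentence", "great"], "e2": ["sentence", "pricey"]}
score = {
    ("e1", "positive", "sentence"): 2.0, ("e1", "positive", "great"): 1.5,
    ("e2", "negative", "sentence"): 2.0, ("e2", "negative", "pricey"): 1.0,
}
best, best_score = best_assignment(edges, labels, evidence, score)
print(best, best_score)
```

Without the one-type-per-evidence constraint both edges would pick the whole sentence as evidence with conflicting labels (total 4.0); with it, the search has to move one edge to a more specific evidence.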
In order to avoid a textual evidence being overly reused by multiple relation candidates, we first penalize the assignment of a textual evidence $c$ to a labeled edge $a$ by associating the corresponding $z_{ac}$ with a fixed negative cost $-\mu$ in the objective function. Then the selection of one textual evidence per edge $a$ is encouraged by associating $\mu$ with $z^d_c$ in the objective function, where $z^d_c = \bigvee_{e \in S_c} \eta_{ec}$ and $S_c$ is the set of edges for which the textual evidence $c$ serves as a candidate. The disjunction $z^d_c$ is expressed as:
$$z^d_c \ge \eta_{ec} \;\; \forall e \in S_c; \qquad z^d_c \le \sum_{e \in S_c} \eta_{ec}$$
This soft constraint not only encourages one textual evidence per edge, but also keeps the evidence eligible for multiple assignments. For any two labeled edges $a$ and $a'$ incident to the same entity mention, the edge-to-edge co-occurrence is described by $z^c_{a,a'} = z_a \wedge z_{a'}$.
eMRG with both Binary and Ternary Edges: If there is more than one entity mention and at least one attribute mention in a sentence, an eMRG can potentially have both binary and ternary edges. In this case, we assume that each attribute mention can participate either in binary relations or in ternary relations, but not in both. The assumption holds in more than 99.9% of the sentences in our SRG corpus, thus we describe it as a set of hard constraints. Geometrically, the assumption can be visualized as the selection between two alternative structures incident to the same attribute mention, as shown in Figure 4. Note that, in the binary edge structure, we include not only the edges incident to the attribute mention but also the edge between the two entity mentions.
Let $S^b_{m_i}$ be the set of all possible labeled edges in a binary edge structure of an attribute mention $m_i$. The variable $\tau^b_{m_i} = \bigvee_{e_l \in S^b_{m_i}} \eta_{e_l}$ indicates whether the attribute mention is associated with a binary edge structure or not. In the same manner, we use $\tau^t_{m_i} = \bigvee_{e_l \in S^t_{m_i}} \eta_{e_l}$ to indicate the association of an attribute mention $m_i$ with a ternary edge structure from the set of all incident ternary edges $S^t_{m_i}$. The selection between the two alternative structures is formulated as $\tau^b_{m_i} + \tau^t_{m_i} = 1$. As this influences only the edges incident to an attribute mention, we keep all the constraints introduced in the previous section unchanged except for constraint (3), which is modified as [equation missing in the source]. Therefore, we can have either binary edges or ternary edges for an attribute mention.
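Two ingredients of the formulation in this section can be checked numerically. First, combining the cost −µ on every assigned evidence with the reward µ on the disjunction means that an evidence assigned to k ≥ 1 edges contributes −µ(k − 1) to the objective, so the first use is free and each reuse costs µ. Second, the structure selection requires exactly one of the two indicators τ to be active for an attribute mention. The sketch below is our own illustration under these readings; the function names and inputs are hypothetical.

```python
from collections import Counter

def reuse_penalty(evidence_of_edge, mu):
    """Net objective contribution of the soft reuse constraint.

    evidence_of_edge maps each edge to its assigned evidence (or None).
    Each assignment costs -mu; each evidence used at least once earns +mu.
    """
    counts = Counter(c for c in evidence_of_edge.values() if c is not None)
    return sum(-mu * k + mu for k in counts.values())

def structure_choice_is_valid(binary_edge_indicators, ternary_edge_indicators):
    """Check tau_b + tau_t == 1, where each tau is the disjunction of the
    labeled-edge indicators of the respective structure."""
    tau_b = int(any(binary_edge_indicators))
    tau_t = int(any(ternary_edge_indicators))
    return tau_b + tau_t == 1

penalty = reuse_penalty({"e1": "c1", "e2": "c1", "e3": "c2", "e4": None}, mu=0.5)
print(penalty)
print(structure_choice_is_valid([1, 0], [0]), structure_choice_is_valid([1], [1]))
```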

Learning Model Parameters
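Given a set of training sentences $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$, the best weight vector $\beta$ of the discriminant function (1) is found by solving the following optimization problem:
$$\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \max_{(\hat{y}, \hat{h})} \big( \beta^\top \Phi(x_i, \hat{y}, \hat{h}) + \delta(\hat{h}, \hat{y}, y_i) \big) - \max_{h} \beta^\top \Phi(x_i, y_i, h) \Big] + \rho \lVert \beta \rVert_1 \qquad (4)$$
where $\delta(\hat{h}, \hat{y}, y)$ is a loss function measuring the discrepancies between an eMRG $(y, h^*)$ with gold standard edge labels $y$ and an eMRG $(\hat{y}, \hat{h})$ with inferred labeled edges $\hat{y}$ and textual evidences $\hat{h}$. Due to the sparse nature of the lexical features, we apply an L1 regularizer to the weight vector $\beta$, and the degree of sparsity is controlled by the hyperparameter $\rho$.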
Since the L1 norm in the above optimization problem is not differentiable at zero, we apply the online forward-backward splitting (FOBOS) algorithm (Duchi and Singer, 2009). It requires two steps for updating the weight vector β by using a single training sentence x on each iteration t.
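A single FOBOS iteration with an L1 regularizer can be sketched as a subgradient step followed by coordinate-wise soft-thresholding. The sketch below makes several assumptions that are ours: features and weights are sparse dictionaries, the subgradient is the difference between the features of the loss-augmented prediction and those of the best gold-label completion, and the same step size is used in both steps.

```python
def fobos_l1_update(beta, feats_predicted, feats_gold, step_size, rho):
    """One FOBOS iteration for an L1-regularized latent structural SVM.

    Step 1 (forward): subgradient step on the structured hinge loss.
    Step 2 (backward): soft-thresholding, the closed-form proximal step for L1.
    """
    keys = set(beta) | set(feats_predicted) | set(feats_gold)
    updated = {}
    for k in keys:
        grad = feats_predicted.get(k, 0.0) - feats_gold.get(k, 0.0)
        half = beta.get(k, 0.0) - step_size * grad
        shrunk = max(abs(half) - step_size * rho, 0.0)
        if shrunk > 0.0:  # weights shrunk to zero are dropped (sparsity)
            updated[k] = shrunk if half > 0 else -shrunk
    return updated

beta = {"a": 1.0, "c": 0.0625}
new_beta = fobos_l1_update(
    beta, feats_predicted={"a": 1.0}, feats_gold={"b": 1.0}, step_size=0.5, rho=0.25
)
print(new_beta)
```

In the example the weight of feature "c" is small enough to be truncated to zero, which is how the L1 term produces sparse weight vectors.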
The loss-augmented inference problem is similar to the one we considered in the previous section except for the inclusion of the loss function. We incorporate the loss function into the ILP formulation by defining the loss between an MRG $(y, h)$ and a gold standard MRG as the sum of per-edge costs. In our experiments, we consider a positive cost $\varphi$ for each wrongly labeled edge $a$, so that if an edge $a$ has a different label from the gold standard, we add $\varphi$ to the coefficient $s_{ac}$ of the corresponding variable $z_{ac}$ in the objective function of the ILP formulation.
In addition, the non-positive weights of edge labels in the initial learning phase often lead to eMRGs with many unlabeled edges, which harms the learning performance. We address this by adding a constraint for the minimal number of labeled edges in an eMRG,
$$\sum_{a \in A} \sum_{c \in C_a} z_{ac} \ge \zeta \qquad (5)$$
where $A$ is the set of all labeled edge candidates and $\zeta$ denotes the minimal number of labeled edges. Empirically, the best way to determine $\zeta$ is to make it equal to the maximal number of labeled edges in an eMRG with the restriction that a textual evidence can be assigned to at most one edge. By considering all the edge candidates $A$ and all the textual evidence candidates $C$ as two vertex sets in a bipartite graph $\hat{G} = \langle V = (A, C), E \rangle$ (with edges in $E$ indicating which textual evidence can be assigned to which edge), $\zeta$ corresponds exactly to the size of a maximum matching of the bipartite graph.
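The size of a maximum bipartite matching can be computed with the classic augmenting-path method. The sketch below is one standard way to do so and is not necessarily the procedure used in our system; the input format (a dictionary from each edge candidate to its evidence candidates) is an assumption for illustration.

```python
def max_matching_size(candidates):
    """Size of a maximum matching between edges and textual evidences.

    candidates maps each edge to the list of evidences that may be assigned to it.
    """
    matched_edge = {}  # evidence -> edge currently matched to it

    def try_assign(edge, visited):
        for ev in candidates[edge]:
            if ev in visited:
                continue
            visited.add(ev)
            # Take a free evidence, or re-route the edge that currently holds it.
            if ev not in matched_edge or try_assign(matched_edge[ev], visited):
                matched_edge[ev] = edge
                return True
        return False

    return sum(try_assign(edge, set()) for edge in candidates)

zeta = max_matching_size({"e1": ["c1", "c2"], "e2": ["c1"], "e3": ["c1"]})
print(zeta)
```

Here only two of the three edges can receive distinct evidences, so the minimal number of labeled edges would be set to 2.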
To find the optimal eMRG $(y, h^*)$, for the gold label $k$ of each edge, we consider the following set of constraints for inference, since the labels of the edges are known for the training data: [constraints missing in the source]. We also include the soft constraints, which avoid a textual evidence being overly reused by multiple relations, and constraints similar to (5) to ensure a minimal number of labeled edges and a minimal number of sentiment-oriented relations.

SRG Corpus
For evaluation we constructed the SRG corpus, which in total consists of 1686 manually annotated online reviews and forum posts in the digital camera and movie domains. For each domain, we maintain a set of attributes and a list of entity names.
The annotation scheme for the sentiment representation requires minimal linguistic knowledge from our annotators. By focusing on the meanings of the sentences, the annotators make decisions based on their language intuition, not restricted by specific syntactic structures. Taking the example in Figure 2, the annotators only need to mark the mentions of entities and attributes from both the sentences and the context, disambiguate them, and label ("Canon 7D", "Nikon D7000", price) as worse and ("Canon 7D", "sensor") as positive, whereas in prior work, annotators also marked sentiment-bearing expressions such as "great" and linked them to the respective relation instances. This also enables them to annotate instances of both sentiment polarity and comparative relation, which are conveyed not only by explicit sentiment-bearing expressions like "excellent performance", but also by factual expressions implying evaluations such as "The 7V has 10x optical zoom and the 9V has 16x.".

[Table 1: corpus distribution over camera reviews, camera forums, movie reviews and movie forums; cell values not recoverable]
14 annotators participated in the annotation project. After a short training period, annotators worked on randomly assigned documents one at a time. For product reviews, the system lists all relevant information about the entity and the predefined attributes. For forum posts, the system shows only the attribute list. For each sentence in a document, the annotator first determines if it refers to an entity of interest. If not, the sentence is marked as off-topic. Otherwise, the annotator identifies the most obvious mentions, disambiguates them, and marks the MRGs. We evaluate the inter-annotator agreement on sSoRs in terms of Cohen's Kappa (κ) (Cohen, 1968). An average Kappa value of 0.698 was achieved on a randomly selected set consisting of 412 sentences. Table 1 shows the corpus distribution after normalizing the annotations into sSoRs. Camera forum posts contain the largest proportion of comparisons because they are mainly about the recommendation of digital cameras. In contrast, web users are much less interested in comparing movies, in both reviews and forums. In all subsets, positive relations play a dominant role since web users tend to express more positive attitudes online than negative ones (Pang and Lee, 2007).
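For reference, the basic Kappa statistic compares the observed agreement p_o with the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). The sketch below computes the unweighted form for two annotators over categorical labels. Both the unweighted variant and the per-item label representation are assumptions made for illustration; how agreement on sSoRs was aligned item by item is not detailed here.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the annotators' marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```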

Experiments
This section describes the empirical evaluation of SENTI-LSSVM together with two competitive baselines on the SRG corpus.

Experimental Setup
We implemented a rule-based baseline (DING-RULE) and a structural SVM (Tsochantaridis et al., 2004) baseline (SENTI-SSVM) for comparison. The former system extends the work of Ding et al. (2009), which designed several linguistically motivated rules based on a sentiment polarity lexicon for relation identification and assumes that there is only one type of sentiment relation in a sentence. In our implementation, we keep all the rules of Ding et al. (2009) and add one phrase-level rule for the case where there is more than one mention in a sentence. The additional rule assigns sentiment-bearing words and negators to their nearest relation candidates based on the absolute surface distance between the words and the corresponding mentions. In this case, the phrase-level sentiment-oriented relations depend only on the assigned sentiment words and negators. The latter system is based on a structural SVM and does not consider the assignment of textual evidences to relation instances during inference. The textual features of a relation candidate are all lexical and sentiment predictor features within a surface distance of four words from the mentions of the candidate.
Thus, this baseline does not need the inference constraints of SENTI-LSSVM for the selection of textual evidences. To gain more insights into the model, we also evaluate the contribution of individual features of SENTI-LSSVM. In addition, to show if identifying sentiment polarities and comparative relations jointly works better than tackling each task on its own, we train SENTI-LSSVM for each task separately and combine their predictions according to compatibility rules and the corresponding graph scores.
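The additional phrase-level rule of DING-RULE described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function name, the token-index representation of mentions, and the tie-breaking behavior (the first candidate in iteration order wins) are assumptions made for the example.

```python
def assign_to_nearest(evidence_positions, candidate_mentions):
    """Map each evidence token index to its nearest relation candidate.

    evidence_positions: token indices of sentiment-bearing words and negators.
    candidate_mentions: dict of candidate id -> token indices of its mentions.
    Distance is the absolute surface distance to the closest mention.
    Ties go to the first candidate encountered (an assumption).
    """
    assignment = {}
    for pos in evidence_positions:
        best_id, best_dist = None, None
        for cand_id, mention_positions in candidate_mentions.items():
            dist = min(abs(pos - m) for m in mention_positions)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = cand_id, dist
        assignment[pos] = best_id
    return assignment


# Hypothetical sentence with two mentions and two sentiment words.
tokens = "the lens is great but the battery is poor".split()
candidates = {"lens": [tokens.index("lens")], "battery": [tokens.index("battery")]}
evidence = [tokens.index("great"), tokens.index("poor")]
result = assign_to_nearest(evidence, candidates)
print({tokens[pos]: cand for pos, cand in result.items()})
```

Because the rule looks only at surface distance, it can attach a sentiment word to the wrong candidate when syntax and word order diverge.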
For each domain and text genre, we withheld 15% of the documents for development and used the remainder for cross-validation. The hyperparameters of all systems are tuned on the development datasets. For all experiments of SENTI-LSSVM, we use ρ = 0.0001 for the L1 regularizer in Eq. (4) and ϕ = 0.05 for the loss function; for SENTI-SSVM, we use ρ = 0.0001 and ϕ = 0.01. Since the relation type of off-topic sentences is certainly other, we evaluate all systems with 5-fold cross-validation only on the on-topic sentences in the evaluation dataset. Since the same sSoR can have several equivalent MRGs and the relation type other is not of interest, we evaluate the sSoRs in terms of precision, recall and F-measure. All reported numbers are averages over the 5 folds. Table 2 shows the complete results of all systems. Our model SENTI-LSSVM outperformed all baselines in terms of average F-measure and recall by a large margin. Its F-measure on movie reviews is about 14% higher than that of the best baseline. The rule-based system has higher precision than recall in most cases. However, simply increasing the coverage of the domain-independent sentiment polarity lexicon might lead to worse performance (Taboada et al., 2011) because many sentiment-oriented relations are conveyed by domain-dependent expressions and by factual expressions that imply evaluations, such as "This camera does not have manual control." Compared to DING-RULE, SENTI-SSVM performs better in the camera domain but worse in the movie domain, due to many misclassifications of negative relation instances as other. It also wrongly predicted more positive instances as other than SENTI-LSSVM did. We found that the recall for these instances is low because their features are often overly similar to those of the instances of type other that link to the same mentions.
The problem gets worse in the movie domain since i) many sentences contain no explicit sentiment-bearing words; ii) the prior polarity of the sentiment-bearing words does not agree with their contextual polarity in the sentences. Consider the following example from a forum post about the movie "Superman Returns": "Have a look at Superman: the Animated Series or Justice League Unlimited . . . that is how the characters of Superman and Lex Luthor should be." In contrast, our model minimizes the overlapping features by assigning them to the most likely relation candidates. This leads to significantly better performance. Although SENTI-SSVM has low recall for both positive and negative relations, it achieves the highest recall for the comparative relation among all systems in the movie domain and on camera reviews. Since less than 1% of all instances in these document sets are comparative relations and all models are trained to optimize the overall accuracy, SENTI-LSSVM tends to trade off the minority class for better overall performance. This advantage of SENTI-SSVM disappears on the camera forum posts, where the number of comparative relation instances is 12 times higher than in the other datasets.
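The evaluation measure used in this section (precision, recall and F-measure over sSoRs, with the relation type other excluded) can be sketched as follows. The tuple representation of an sSoR, the label strings, and the micro-averaging are assumptions made for this illustration; the sketch covers a single fold, whereas the reported numbers are averages over 5 folds.

```python
def prf(gold, predicted, ignore_label="other"):
    """Micro precision, recall and F1 over relation tuples.

    gold, predicted: sets of (sentence_id, arguments, label) tuples.
    Tuples labeled ignore_label are dropped from both sets before counting,
    so predicting "other" for a real relation costs recall only.
    """
    gold = {r for r in gold if r[-1] != ignore_label}
    predicted = {r for r in predicted if r[-1] != ignore_label}
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and system output for three sentences.
gold = {
    (1, ("camA",), "positive"),
    (1, ("camB",), "negative"),
    (2, ("camA", "camB"), "comparative"),
    (3, ("camA",), "other"),
}
predicted = {
    (1, ("camA",), "positive"),
    (1, ("camB",), "other"),        # missed negative relation
    (2, ("camA", "camB"), "comparative"),
    (3, ("camA",), "positive"),     # spurious positive relation
}
precision, recall, f1 = prf(gold, predicted)
print(precision, recall, f1)
```

In this toy case the missed negative relation lowers recall and the spurious positive relation lowers precision, mirroring the two error types discussed above.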

Results
All systems perform better in predicting positive relations than negative ones. This corresponds well to the empirical finding of Wilson (2008) that people tend to use more complex expressions for negative sentiments than for their affirmative counterparts. It is also in accordance with the distribution of these relations in our SRG corpus, which is randomly sampled from online documents. For learning systems, it can also be explained by the fact that considerably more training data is available for positive relations than for negative ones. The comparative relation is the hardest one to process since we found that many corresponding expressions do not contain explicit keywords for comparison.
To better understand the contribution of the key feature groups in our model, we remove each group from the full SENTI-LSSVM system and evaluate the variants on movie reviews and camera forum posts, which have a relatively balanced distribution of relation types. As shown in Table 3, the features from the sentiment predictors make significant contributions on both datasets. The different drops in performance indicate that the polarities predicted by rules are more consistent in camera forum posts than in movie reviews. Due to the complexity of expressions in the movie reviews, our model cannot benefit from the unigram features, but these features complement the sentiment predictor features well in camera forum posts. The sharp drop caused by removing the context features from our model on movie reviews indicates that the sentiments in movie reviews depend highly on the relations of the previous sentences. In contrast, the sentiment-oriented relations of the previous sentences could be a cause of overfitting on the camera forum data. The edge co-occurrence features do not play an important role in our model since the number of co-occurring sentiment-oriented relations in sentences with contrast conjunctions like "but" is small. However, we found that allowing the co-occurrence of any sentiment-oriented relations would harm the performance of the model. In addition, our experiments showed that the separated approach, which trains one model for sentiment polarities and another for comparative relations, leads to a decrease of almost 1% in the F-measure averaged over all four datasets. The largest drop in F-measure is 3% on camera forum posts, since this dataset contains the largest proportion of comparative relations. We found that errors increase when the separately trained models make conflicting predictions. In this case, the joint approach can take all factors into account and make more consistent decisions than the separated approach.

Table 3: Micro-average F-measure of SENTI-LSSVM with different feature models

Conclusion
We proposed the SENTI-LSSVM model for extracting instances of both sentiment polarities and comparative relations. For training and evaluating the model, we created the SRG corpus using a lightweight annotation scheme. We showed that our model can automatically find textual evidence to support its relation predictions and that it achieves significantly better F-measure scores than alternative state-of-the-art methods.