The data we used were extracted from the Chinese Treebank (CTB). We post the files that include the label and position information for each comma. Each file consists of four columns. The first column is the gold-standard label extracted from CTB: “+1” means that the clauses connected by the current comma are independent and the sentence can be split at this point, while “-1” means that the sentence cannot be split at the current comma. The second column is the file name. The third column is the sentence index (starting from 0). The last column is the index of the comma within the sentence (starting from 0).
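
The four-column layout described above could be read with a small parser like the following. This is only a minimal sketch: it assumes the columns are separated by whitespace, which the description does not state, and the names `CommaRecord`, `parse_line`, and `parse_lines`, as well as the sample file name, are hypothetical and chosen for illustration.

```python
from typing import Iterable, List, NamedTuple


class CommaRecord(NamedTuple):
    """One row of the comma file (hypothetical container, not part of the data release)."""
    label: int           # +1: sentence can be split at this comma, -1: it cannot
    file_name: str       # file name given in the second column
    sentence_index: int  # 0-based sentence index
    comma_index: int     # 0-based index of the comma within the sentence


def parse_line(line: str) -> CommaRecord:
    # Assumption: columns are whitespace-separated (tabs or spaces).
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    label, file_name, sentence_index, comma_index = fields
    if label not in ("+1", "-1"):
        raise ValueError(f"unexpected label {label!r}")
    return CommaRecord(int(label), file_name, int(sentence_index), int(comma_index))


def parse_lines(lines: Iterable[str]) -> List[CommaRecord]:
    # Blank lines are skipped; every other line must be a valid row.
    return [parse_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not taken from the real files.
    sample = ["+1 example_file 0 2", "-1 example_file 1 0"]
    for record in parse_lines(sample):
        print(record)
```

To read one of the posted files, the same function can be applied to an open file handle, for example `parse_lines(open(path, encoding="utf-8"))`, where `path` is wherever the downloaded file was saved.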


Training Data

Dev Data

Test Data