UD

From the ITU NLP Wiki

This page details the conversion of our treebanks to the UD framework.

Syntax Validation Issues

Page on paper to be written

Maintenance

Click here for the Turkish page

Parts of Speech

Coarse Parts of Speech

ITU UD Notes
Adj ADJ "Var" and "yok" will have their POS tag changed to ADJ.
Adverb ADV -
Det DET -
Dup X Has Echo=Rdp feature
Conj CONJ -
- NUM If the token has the fine POS tag "Num".
Interj INTJ -
Noun NOUN -
Postp ADP -
Pron PRON -
Punc PUNCT -
Verb VERB -
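
The coarse mapping above can be sketched as a lookup table plus three special cases. This is a minimal sketch, not the conversion script itself: the function name `convert_coarse_pos`, its arguments, and the order in which the special cases are checked are assumptions made for illustration.

```python
# Coarse POS mapping copied from the table above.
COARSE_POS = {
    "Adj": "ADJ", "Adverb": "ADV", "Det": "DET", "Dup": "X",
    "Conj": "CONJ", "Interj": "INTJ", "Noun": "NOUN", "Postp": "ADP",
    "Pron": "PRON", "Punc": "PUNCT", "Verb": "VERB",
}


def convert_coarse_pos(lemma, cpos, fpos):
    """Return (UD tag, extra features) for one token (hypothetical helper)."""
    if fpos == "Num":              # fine tag "Num" wins over the coarse tag
        return "NUM", {}
    if lemma in ("var", "yok"):    # "var" and "yok" become ADJ
        return "ADJ", {}
    if cpos == "Dup":              # reduplications are X with Echo=Rdp
        return "X", {"Echo": "Rdp"}
    return COARSE_POS[cpos], {}
```

For example, `convert_coarse_pos("yok", "Noun", "Noun")` yields the tag `ADJ` with no extra features.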

Fine Parts of Speech

These will remain, since CoNLL-U allows language-specific FPOS entries. However, we will extract whatever morphological information we can from these tags.

Participles

When modifying a nominal, these are connected with the relation "acl", and do NOT create new IGs.

NOTE: Participles that have possession and a case dependent should have VerbForm=Ger instead of Part.

ITU UD Notes
PresPart Tense=Pres|VerbForm=Part -
AorPart Tense=Aor|VerbForm=Part These are overwhelmingly wrong annotations in our treebanks.
FutPart Tense=Fut|VerbForm=Part Should probably verify
NarrPart Aspect=Perf|Tense=Past|VerbForm=Part Used very often with light verbs. Evidentiality=Nfh or not?
PastPart Tense=Past|VerbForm=Part -

Transgressives

  • NOT partitioned!
  • Lemmas very odd in GK-Treebank, discuss.
  • I would like to add more tense information to all of these; at present they are marked only as transgressive.
  • Use conj when subjects must be the same, advcl otherwise.
ITU UD Notes
AfterDoingSo Features: VerbForm=Trans Connected with conj relation.
SinceDoingSo Features: VerbForm=Trans Connected with advcl relation.
When Features: VerbForm=Trans Connected with advcl relation.
WithoutHavingDoneSo Features: VerbForm=Trans|Negative=Neg Connected with advcl relation.
WithoutBeingAbleToHaveDoneSo Features: VerbForm=Trans|Mood=Abil|Negative=Neg Not in gk-treebank, suggestion. Relation is advcl.
While Features: VerbForm=Trans Connected with advcl relation.
ByDoingSo Features: VerbForm=Trans conj
AsIf Features: VerbForm=Trans -
Adamantly Features: VerbForm=Trans -
AsLongAs Features: VerbForm=Trans -
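
The transgressive table can be held as data, with the relation left empty where the table gives none. The sketch below is illustrative only: the names `TRANSGRESSIVES`, `sort_features` and `convert_transgressive` are hypothetical. The sorting step reflects the CoNLL-U convention that features are listed alphabetically by name, which the table rows do not follow as written.

```python
# Fine POS tag -> (features as written in the table, relation or None).
TRANSGRESSIVES = {
    "AfterDoingSo": ("VerbForm=Trans", "conj"),
    "SinceDoingSo": ("VerbForm=Trans", "advcl"),
    "When": ("VerbForm=Trans", "advcl"),
    "WithoutHavingDoneSo": ("VerbForm=Trans|Negative=Neg", "advcl"),
    "WithoutBeingAbleToHaveDoneSo":
        ("VerbForm=Trans|Mood=Abil|Negative=Neg", "advcl"),
    "While": ("VerbForm=Trans", "advcl"),
    "ByDoingSo": ("VerbForm=Trans", "conj"),
    "AsIf": ("VerbForm=Trans", None),
    "Adamantly": ("VerbForm=Trans", None),
    "AsLongAs": ("VerbForm=Trans", None),
}


def sort_features(feats):
    """Order features alphabetically by name, ignoring case."""
    return "|".join(sorted(feats.split("|"), key=str.lower))


def convert_transgressive(fpos):
    """Return (sorted feature string, relation or None) for a fine tag."""
    feats, rel = TRANSGRESSIVES[fpos]
    return sort_features(feats), rel
```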

Infinitive

  • Inf1
  • Inf2
  • Inf3

All marked with just VerbForm=Ger for now.

Adj. from Noun

More misc

ITU Surface Form Notes
With li separate. relation: case
Without siz separate. relation: case
Acquire lan (mak) merge/lexicalise.
Become leş (mek) merge/lexicalise.
InBetween (uluslar)-arası merge/lexicalise.
Agt -ci merge/lexicalise
Dim -cik merge/lexicalise
Reflex - kendi
Rel -ki segmented, relation case.
Related -sel just lexicalise



Misc

ITU UD Notes
JustLike Only 3 examples in IMST. Just lexicalize them. -
Adverb:Ly ADV:Ly -
Noun:NAdj ADJ:NAdj -
Postp:Neg VERB:Neg, Negative=Neg, cop -
Postp:Ques, dependent with the ARGUMENT relation AUX:Ques, no feature, aux:q -
Postp:Ques, has depRel=CONJUNCTION CONJ, no feature, CONJUNCTION/cc -
Postp ADP -
Noun:Ness NOUN:Ness -
Pron:Pers PronType=Prs -
Pron:Reflex Reflex=Yes -
Pron:Quant PronType=Ind -
Pron:Demons PronType=Dem -
Pron:Ques PRON:Ques -
Adverb:Since Segment, cpostag=ADP, depRel=case -
Adj:Num, Noun:Num NUM:ANum,NNum -
Noun:Abr These will all be mapped to NOUN:Abr for now, but we may need to work out the cpostag word by word later. -
Hashtag, Email, URL etc. Change POS to SYM. If they have been used as ordinary constituents, keep the head and depRel. Else, head=clausal head, or sentential head if too difficult, depRel=discourse. -

POS Change

ITU UD Notes
Email Change POS to SYM -
Smiley Change POS to SYM -
URL Change POS to SYM -
Keyword Change POS to SYM -
Mention Change POS to SYM -
Prop Change POS to PROPN -

PCs

We will not convert the following:

  • PCAbl
  • PCAcc
  • PCDat
  • PCGen
  • PCIns
  • PCNom

Morphological Features

This section covers mostly inflectional morphology, as derivational morphological tags are usually fine parts of speech in our annotation scheme.

Personal and Possessor Agreement

ITU UD Notes
A1sg Number=Sing|Person=1 -
A2sg Number=Sing|Person=2 -
A3sg Number=Sing|Person=3 -
A1pl Number=Plur|Person=1 -
A2pl Number=Plur|Person=2 -
A3pl Number=Plur|Person=3 -
P1sg Number[psor]=Sing|Person[psor]=1 -
P2sg Number[psor]=Sing|Person[psor]=2 -
P3sg Number[psor]=Sing|Person[psor]=3 -
P1pl Number[psor]=Plur|Person[psor]=1 -
P2pl Number[psor]=Plur|Person[psor]=2 -
P3pl Number[psor]=Plur|Person[psor]=3 -
Pnon - Not used in this conversion, strike out.
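
Because the agreement tags are regular, the twelve rows above can be generated from the tag shape rather than listed one by one. A minimal sketch, assuming a hypothetical helper `agreement_features` that receives one ITU tag at a time:

```python
import re

# A = personal agreement, P = possessor agreement; then person and number.
AGREEMENT_TAG = re.compile(r"^([AP])([123])(sg|pl)$")


def agreement_features(tag):
    """Map an ITU agreement tag to UD features; unknown tags give {}."""
    match = AGREEMENT_TAG.match(tag)
    if match is None:          # e.g. Pnon, which is not used in the conversion
        return {}
    kind, person, number = match.groups()
    layer = "[psor]" if kind == "P" else ""
    return {
        "Number" + layer: "Sing" if number == "sg" else "Plur",
        "Person" + layer: person,
    }
```

With this, `agreement_features("P3pl")` gives `Number[psor]=Plur` and `Person[psor]=3`, while `Pnon` contributes nothing.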

Case and NumType

ITU UD Notes
Acc Case=Acc -
Dat Case=Dat -
Gen Case=Gen -
Loc Case=Loc -
Ins Case=Ins -
Nom Case=Nom -
Abl Case=Abl
Dist NumType=Dist -
Ord NumType=Ord -

Tense, Aspect, Modality

  • Initialize a default value for each category if none is present: Tense=Pres, Mood=Indicative, Aspect=Perf (Imp for the aorist).
  • Concatenate conflicting aspects or moods with hyphen. Alphabetically ordered.
  • for later: think of EverSince
ITU UD Notes
Narr|Past Tense=Pqp Run this before Narr and Past.
Past|Past Tense=Pqp|Register=Inf Run this before Narr and Past.
Narr|Narr Tense=Pqp|Evidentiality=Nfh Run this before Narr and Past.
Past Tense=Past -
Pres Tense=Pres Adj (only when the coarse postag is Verb)
Fut Tense=Fut -
Narr Tense=Past|Evidentiality=Nfh -
Aor Tense=Aor -
Prog1 Aspect=Prog|Register=Inform If no other tense tag in our annotation, Tense=Pres
Prog2 Aspect=Prog|Register=Form If no other tense tag in our annotation, Tense=Pres
Hastily Aspect=Rapid
Stay Aspect=Dur-Perf
Repeat Aspect=Dur -
Cop Mood=Gen
Imp Mood=Imp OR Mood=Prs Check surface form suffixes (-sene,-senize)
Cond Mood=Cnd -
Opt Mood=Opt -
Desr Mood=Des -
Neces Mood=Nec -
Able Mood=Abil -
Almost Mood=Pro -
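
The ordering and default rules above can be sketched as follows. This is an illustration under stated assumptions, not the actual converter: `tam_features` is a hypothetical name, only a subset of the single-tag rows is included, the default mood is written with the UD abbreviation `Ind`, and hyphen-joining is applied to every feature although the notes prescribe it only for aspect and mood.

```python
# Two-tag sequences are matched first, as the notes column requires.
COMPOUND = {
    ("Narr", "Past"): {"Tense": "Pqp"},
    ("Past", "Past"): {"Tense": "Pqp", "Register": "Inf"},
    ("Narr", "Narr"): {"Tense": "Pqp", "Evidentiality": "Nfh"},
}

# Subset of the single-tag rows; the remaining rows follow the same pattern.
SINGLE = {
    "Past": {"Tense": "Past"},
    "Fut": {"Tense": "Fut"},
    "Narr": {"Tense": "Past", "Evidentiality": "Nfh"},
    "Aor": {"Tense": "Aor"},
    "Repeat": {"Aspect": "Dur"},
    "Cond": {"Mood": "Cnd"},
    "Opt": {"Mood": "Opt"},
    "Neces": {"Mood": "Nec"},
    "Able": {"Mood": "Abil"},
}


def tam_features(tags):
    """Convert an ordered list of ITU tags into UD feature values."""
    values = {}
    i = 0
    while i < len(tags):
        pair = tuple(tags[i:i + 2])
        if pair in COMPOUND:
            mapping, step = COMPOUND[pair], 2
        else:
            mapping, step = SINGLE.get(tags[i], {}), 1
        for name, value in mapping.items():
            values.setdefault(name, set()).add(value)
        i += step
    # Defaults when a category is missing.
    values.setdefault("Tense", {"Pres"})
    values.setdefault("Mood", {"Ind"})
    values.setdefault("Aspect", {"Imp"} if "Aor" in tags else {"Perf"})
    # Conflicting values are joined with a hyphen in alphabetical order.
    return {name: "-".join(sorted(vals)) for name, vals in values.items()}
```

For instance, the tag sequence `Able`, `Cond` produces `Mood=Abil-Cnd` together with the default tense and aspect.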

Negative

ITU UD Notes
Pos Negative=Pos -
Neg Negative=Neg -

Voice

ITU UD Notes
Caus|Pass Voice=Cau-Pass -
Pass Voice=Pass -
Caus or Caus|Caus Voice=Cau -

Miscellaneous

ITU UD Notes
Abr - Change POS to PROPN
Equ Case=Equ Rethink long term
Fitfor Keep segmented, adposition, case relation Also derivational. Adjectives from nominals.
Prop - Change POS to PROPN

There are no examples of the durative aspect alone (e.g. yürü-yedur) in our treebanks. We should check analysis/generation to see whether this is something we need to cover.

Syntax

One of the most obvious differences between our framework and that of UD, in terms of syntax, is the segmentation of clitics.

Direct Mappings

ITU UD Notes
PREDICATE root for copulae: dependent: function word --> content word
PUNCTUATION punct (discourse for smileys) head: clausal head, sentential head if stuck
DETERMINER det -
COORDINATION conj head: should be the first conjunct for all other conjuncts and coordinating conjunctions / punctuation
INTENSIFIER advmod:emph -
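
The COORDINATION row implies a re-heading step: every later conjunct attaches to the first one. The sketch below shows one way this could look. It assumes that the conjuncts of one coordination have already been collected in surface order, and that in the source tree the last conjunct carries the attachment to the rest of the sentence; both are assumptions, as are the function name `promote_first_conjunct` and the token representation.

```python
def promote_first_conjunct(tokens, chain):
    """Make the first conjunct the head of a coordination.

    tokens: dict mapping token id -> {"head": int, "deprel": str}
    chain:  conjunct ids of one coordination, in surface order
    """
    first, last = chain[0], chain[-1]
    # The first conjunct inherits the outside attachment of the old head.
    tokens[first]["head"] = tokens[last]["head"]
    tokens[first]["deprel"] = tokens[last]["deprel"]
    # All other conjuncts attach to the first conjunct with conj.
    for tid in chain[1:]:
        tokens[tid]["head"] = first
        tokens[tid]["deprel"] = "conj"
    return tokens
```

Coordinating conjunctions and punctuation inside the coordination would need the same re-attachment to the first conjunct; that step is left out here.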

MWE

Head-final, don't reorder. We will not be defeated!

ITU UD Notes
MWE compound number compounds, bare noun compounds e.g. yün çorap
MWE compound:lvc Light verb constructions
MWE name Proper name compounds
MWE compound:redup Between duplications
MWE syntactic type Everything else

Apposition

ITU UD Notes
APPOSITION appos between NPs
APPOSITION parataxis between VPs

Conjunction, Relativizer

ITU UD Notes
CONJUNCTION cc head: previous conjunct --> first conjunct
CONJUNCTION discourse dependent: ise
CONJUNCTION mark
  • <phrase1> that <phrase2>
  • head: becomes the head of the dependent clause (read: second/final (for (iyi etc.) ki only?))
RELATIVIZER ccomp
  • iyi ki, demek ki, tabii ki, ...
  • reverse the head clause and the dependent clause
RELATIVIZER acl
  • ordinary usage of the relation
  • reverse the head clause and the dependent clause
POSSESSOR nmod:poss
  • Head: N.P3sg
  • Dependent: N.PNON
POSSESSOR nmod
  • Head: N.ABL
  • Dep: N.PNON
POSSESSOR compound
  • Head: N.PNON
  • Dependent: N.NOM
ARGUMENT case Dependent: postpositional argument

Note: This will require reordering too, with postpositions becoming dependent and their arguments becoming head.

ARGUMENT cop:neg Dependent: değil
ARGUMENT aux:q Dependent: mı/mi/mu/mü (only copular)
ARGUMENT cop/aux
  • Dependent: idi/imiş/ise (inflected!)/iken
  • fix iken! it marks cop
  • Dependent: mı/mi/mu/mü (only verbal)

Note: AFAIK Çağrı marks "ise" as discourse
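
The reordering required by the ARGUMENT to case mapping can be sketched as a head swap: the argument takes over the postposition's attachment, and the postposition becomes its dependent. The function name `swap_postposition` and the token representation are hypothetical, and other dependents of the postposition (which may also need moving) are not handled.

```python
def swap_postposition(tokens, arg_id):
    """Swap head and dependent for one postpositional phrase.

    tokens: dict mapping token id -> {"head": int, "deprel": str}
    arg_id: id of the token attached to a postposition with ARGUMENT
    """
    postp_id = tokens[arg_id]["head"]
    arg, postp = tokens[arg_id], tokens[postp_id]
    # The argument inherits the postposition's head and relation.
    arg["head"], arg["deprel"] = postp["head"], postp["deprel"]
    # The postposition now depends on its former argument.
    postp["head"], postp["deprel"] = arg_id, "case"
    return tokens
```

The relation inherited by the argument is still an ITU label at this point and is converted by the other rules on this page.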

Modifiers

ITU UD Notes
MODIFIER acl VerbForm=Part
MODIFIER amod (dep=ADJ) (head=NOUN)

Dependent: Adj CPOS or NAdj FPOS

Also: Num-Ord or Num-Dist but can also be advmod by reduplication (or other methods?), look into it later

catch-all

MODIFIER advcl VerbForm=Trans or ( VerbForm=Ger and dependent with depRel = case, e.g. bulmak için )
MODIFIER advmod (dep=ADV) (head)

Dependent: Adverb CPOS

catch-all

MODIFIER det Interrogative adjectives

(indefinites as well, but they're supposed to have the DepRel DETERMINER anyway, so maybe we'll automatically correct them in the original treebank later)

MODIFIER nummod Num-Card
MODIFIER nmod Nominals in Dat, Loc, Abl

(might split iobj later) Catch-all case

Objects

ITU UD Notes
OBJECT ccomp VerbForm=Ger OR has a dependent with the relation cop
OBJECT dobj (dep=OBJECT) (head) catch-all

Subjects

ITU UD Notes
Subject csubj VerbForm=Ger OR has a dependent with the relation cop
Subject nsubj (dep=SUBJECT) (head) catch-all
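
The object and subject tables share one test: a dependent that is a gerund, or that has a cop dependent of its own, gets the clausal relation, and everything else falls through to the nominal one. A minimal sketch, with the hypothetical helper `convert_core_argument` taking the ITU relation in upper case, the dependent's features as a dict, and the relations of its own children:

```python
CLAUSAL = {"SUBJECT": "csubj", "OBJECT": "ccomp"}
NOMINAL = {"SUBJECT": "nsubj", "OBJECT": "dobj"}   # catch-all rows


def convert_core_argument(itu_rel, feats, child_rels):
    """Choose the UD relation for a SUBJECT or OBJECT dependent."""
    is_clause = feats.get("VerbForm") == "Ger" or "cop" in child_rels
    return (CLAUSAL if is_clause else NOMINAL)[itu_rel]
```

This check has to run after the features and the cop relations have been converted, since it reads both.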

Vocatives

ITU UD Notes
Vocative ??? Hashtags, Emails, URL etc?

Head=sentence root --> head of phrase (not necessarily sentence!)

Vocative discourse CPOS=Interj

Head=sentence root --> head of phrase (not necessarily sentence!)

Vocative vocative CPOS=Noun,Pron

Head=sentence root --> head of phrase (not necessarily sentence!)

catch-all