Fascinated by many aspects of Linguistics, I have explored a number of areas for potential research projects and have plans to return to some of these ideas in the future, as well as of course exploring others.
There are, on the surface, three different clause types in Arabic: VSO, SVO and Topic-Comment (TC). The first two are typically considered variants of a basic verbal clause structure (e.g., one is generated from the other by movement). The third is a verbless clause (which would be translated with a form of “to be” in English) and has sometimes been considered a subtype of verbal clause with a phonologically null verb.
My analysis (inspired, in fact, by the traditional Arab grammarians) challenges this position. First, I demonstrate that there is no verb in verbless sentences. Second, I argue that despite certain similarities, the VSO and SVO orders are not derivationally related; instead, SVO is actually a special subtype of verbless TC clause, where the subject is the topic and the predicate, a full clause, is the comment. This is supported by distributional evidence from the three surface clause types and has significant implications for the syntactic analysis of Arabic, including suggested explanations for otherwise anomalous, and apparently unrelated, phenomena like “multiple subject” constructions and “resumptive” pronouns.
See my November 2012 ISSL presentation.
Early linguistic attitudes were racist and Eurocentric, with the underlying assumption that variation from the norm of Latin and Greek must indicate that a language, along with the culture that created it, was primitive. Subsequently, it became the politically correct norm in Linguistics to assume that all languages are equally complex: if one language appears simpler than another in one way, the system balances out somewhere else. This has been accepted as an axiom, with no specific evidence, for about fifty years, becoming a sort of linguistic myth. Recently, however, it has been questioned, not by returning to the socially problematic ideas of primitiveness, but by approaching the question of linguistic complexity scientifically through the study of variation in grammatical systems. Many different ways of measuring complexity have been proposed. For discussion of these approaches, see the webpage for the workshop on Formal Linguistics and the Measurement of Grammatical Complexity and the references mentioned there, as well as the forthcoming book from Oxford University Press (in press) by the conference participants.
My research on Linguistic Complexity overlaps with my research on Pseudocoordination. I believe that, regardless of the specific methodology for comparing languages with regard to complexity, an exhaustive description of the languages in question is a prerequisite: a surface-level comparison of the most salient features is extremely likely to miss other less salient features that are no less relevant in measuring complexity, given that native speakers have knowledge of these properties too. See Ross (2014).
While I do not specifically consider myself to be a computational linguist, I do find computational implementation to yield interesting results and to be a relevant way to test linguistic theory. My current interest in Computational Linguistics is the possibility of implementing structurally-oriented Semantic parsing visually. The theory being developed is called Object-Oriented Semantics, and it is relational rather than truth-conditional. Eventually, I intend to tie this in with the work on my dissertation about pseudocoordination and the architecture of grammar.
The grammatical theory and computational approach both focus on representing linguistically encoded ideas as objects. From a grammatical perspective, syntactic/semantic trees are not merely concatenations of linguistic form-meaning pairs, but structured objects: one can refer to Noun Phrases or modify a Verb Phrase with an adverbial, and this is what our minds must do in parsing linguistic input, building up worlds of conceptual representation from utterances. (Looking ahead to how such a theory would interact with discourse is also encouraging, with convenient parallels to theories like File-Change Semantics and Discourse Representation Theory.) From a programming perspective, it is useful to conceive of grammar in terms of object-oriented programming rather than the older approaches that heavily influenced the development of modern linguistic theory. Parsing is not a sequence of top-down or bottom-up operations, but rather a set of modifications applied to individual objects in the representation.
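To make the idea of parsing as modifications to objects a little more concrete, here is a minimal Python sketch. It is not the implementation of Object-Oriented Semantics; the class names (Phrase, Representation), their fields and the example words are all hypothetical, chosen only to show an adverbial updating an existing Verb Phrase object in place.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    """Hypothetical object standing for one linguistically encoded idea."""
    category: str                      # e.g. "NP" or "VP"
    head: str
    modifiers: list = field(default_factory=list)

    def modify(self, modifier):
        """Update this object in place instead of rebuilding a tree."""
        self.modifiers.append(modifier)
        return self


@dataclass
class Representation:
    """Hypothetical store of the objects built up from an utterance."""
    objects: dict = field(default_factory=dict)

    def add(self, name, phrase):
        """Register a phrase so later input can refer back to it by name."""
        self.objects[name] = phrase
        return phrase


# Illustrative input only: "the brown dog runs quickly".
rep = Representation()
dog = rep.add("dog", Phrase("NP", "dog"))
run = rep.add("run", Phrase("VP", "run"))
dog.modify("brown")      # an adjective modifies the existing NP object
run.modify("quickly")    # an adverbial modifies the existing VP object
```

Because each phrase remains an addressable object after it is created, later material can refer back to it or modify it, which is the property the paragraph above attributes to the representation.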
For more information, see the presentation from my November 2013 talk with Ed Schade.
Semantics of Tense
Tense is traditionally considered an operator in syntactic theory (e.g., Prior 1967; Reichenbach 1947), although more recent explanations have proposed that tense is anaphoric (Partee 1973), parallel to pronouns. Departing from this, I argue that tense could also be considered a type of agreement (to times), parallel to gender agreement in nominals.
Due to certain difficulties with this argument and the potential for fundamental equivalence with anaphor-based theories, this is a topic I will return to in the future.
One underemphasized aspect of reconstruction is the need to determine subgrouping within a language family. It is implausible to propose that a proto-language split in multiple directions at once: for example, the Indo-Europeans most likely did not one day invent the wheel and decide to go in separate directions, splitting into the Anatolian, Germanic, Slavic, Italic, Indic, Iranian, Hellenic and other daughters immediately. Rather, the current diversity represents many different migrations into successively smaller subgroupings. While this fact is not generally disputed, it is not typically emphasized in work on reconstruction and historical comparison. This is, however, crucial, in that the sequence of divisions in the development of the daughter languages entirely changes the statistical assumptions. As a simple illustration, consider the “majority” heuristic in reconstruction: if a form appears in the majority of the daughter languages, then it probably is the form that was found in the proto-language. Given the potential variability in subgrouping distributions, there is no statistical reason whatsoever to assume that this heuristic is reliable. (The solution, of course, involves simultaneously considering and revising hypothesized reconstructions, subgroupings and other factors.)
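The sensitivity of the “majority” heuristic to subgrouping can be illustrated with a small Python sketch. The family tree and the forms below are invented for illustration, and counting each primary branch once is only a simplified stand-in for subgroup-aware reasoning, not a proposed reconstruction method.

```python
from collections import Counter

# Hypothetical family: the proto-language first splits into branch A (one
# surviving daughter) and branch B (four surviving daughters).
tree = {"A": ["a1"], "B": ["b1", "b2", "b3", "b4"]}

# Hypothetical reflexes of one proto-sound in each daughter language.
forms = {"a1": "p", "b1": "f", "b2": "f", "b3": "f", "b4": "f"}


def flat_majority(forms):
    """Treat every daughter language as an independent witness."""
    return Counter(forms.values()).most_common(1)[0][0]


def branch_votes(tree, forms):
    """Count each primary branch once, using its internal majority form."""
    votes = Counter()
    for daughters in tree.values():
        votes[flat_majority({d: forms[d] for d in daughters})] += 1
    return votes


flat_result = flat_majority(forms)         # four of five daughters agree
branch_result = branch_votes(tree, forms)  # one vote per primary branch
```

Under these assumptions the flat count favors the form shared by four of the five daughters, while the branch-level count is an even split, because the four agreeing daughters may simply share one innovation inherited from their common intermediate ancestor.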
From a methodological perspective, I am interested in bringing together various approaches and the best data to enhance reliability in historical analysis. Current research in Historical Linguistics often focuses on complex computational models to determine how languages are related within a family and to establish the time depth at which the languages split. However, these techniques tend to rely on cognate statistics, which, though easy to obtain, are unreliable. By instead using these techniques with data consistent with the comparative method, I believe we can obtain much more accurate representations. Moreover, the field would benefit from the powerful statistical methods applied in conjunction with linguistically sound assumptions and empirical data, as well as the combination of evidence available from various fields including Linguistics, Anthropology, Biology and Computer Science.
So far this has been more of an interest than an active research agenda, although I have written about it for several term papers.