Tuesday, September 29, 2009

NLP and Gender


A number of papers have been written on using Natural Language Processing (NLP) of written text to determine the author's gender. One of the better known is "Gender, Genre, and Writing Style in Formal Written Texts" by Moshe Koppel and Shlomo Argamon. These works presume that word choice varies with gender. In their paper they cite findings that women are far more likely than men to use personal pronouns ("I", "you", "she", etc.), whereas men prefer words that identify or determine nouns ("a", "the", "that") or that quantify them ("one", "two", "more"). According to Moshe Koppel, one of the authors of the project, this is because women are more comfortable thinking about people and relationships, whereas men prefer thinking about things. (For speech there are all sorts of additional cues that can be picked up on, such as pitch and intonation.)
You can test this idea with the Gender Genie: http://bookblog.net/gender/genie.php
The same approach can be extended from gender to dominance, for example alpha-male versus submissive behavior.
It can also be extended to other types of roles, such as leader or follower, teacher or student.
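
The word-class counts described above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not the method used in the paper or by the Gender Genie: the function names are hypothetical, and the word lists contain the examples quoted above plus a few extra words added as assumptions.

```python
import re

# Illustrative word lists. Only "i", "you", "she", "a", "the", "that",
# "one", "two" and "more" come from the post; the rest are assumptions.
PRONOUNS = {"i", "you", "she", "he", "we", "they", "me", "her", "him", "us", "them"}
DETERMINERS = {"a", "an", "the", "that", "this", "these", "those"}
QUANTIFIERS = {"one", "two", "more", "some", "many", "few"}


def word_class_rates(text):
    """Return the share of tokens that fall in each word class."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text

    def rate(vocab):
        return sum(1 for t in tokens if t in vocab) / total

    return {
        "pronouns": rate(PRONOUNS),
        "determiners": rate(DETERMINERS),
        "quantifiers": rate(QUANTIFIERS),
    }


def pronoun_lean(text):
    """Positive when pronouns outweigh determiners plus quantifiers."""
    r = word_class_rates(text)
    return r["pronouns"] - (r["determiners"] + r["quantifiers"])


print(pronoun_lean("She told me that you were right"))
print(pronoun_lean("The report lists two of the more common errors"))
```

A positive score only says that a text leans toward personal pronouns. Turning such counts into a prediction would need weights learned from labeled text, which this sketch does not attempt.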

Assuming that there are linguistic correlations with such roles, there should also be other linguistic correlations, for example with veracity (truthfulness) or with the strength of negations. A lie detector, for example, relies on the correlation between biological responses and veracity. A gambler's ability to read (detect and correlate) the changes in an opponent's behavior during a bluff can have a dramatic effect on the outcome of a poker game.

There is a difference between causation and correlation that should be kept in mind when evaluating the usefulness of such tools. Consider a sociopath on a lie detector: if lying does not trigger the biological responses the machine measures, the correlation it relies on breaks down. The measure of a correlation is not a measure of a causal relationship. The causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations, where no causal process exists. For example, the cause of the gender correlation may be more related to a measure of alpha behavior than to sex. Even if the cause is not understood, a strong correlation can still make the information obtained from the analysis useful and trustworthy.
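
The point about indirect causes can be illustrated with a small Python sketch. All numbers here are synthetic and chosen by hand, and the variable names are hypothetical: two measurements are each built from the same hidden factor (standing in for something like alpha behavior) plus unrelated noise. They correlate strongly with each other even though neither one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic, hand-picked values (an assumption for illustration only).
hidden = [1, 2, 3, 4, 5, 6, 7, 8]  # the unobserved common cause
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]

# Each observed signal depends on the hidden factor, not on the other signal.
signal_a = [h + n for h, n in zip(hidden, noise_a)]
signal_b = [h + n for h, n in zip(hidden, noise_b)]

raw = pearson(signal_a, signal_b)

# Remove the hidden factor. Plain subtraction works here only because the
# signals were constructed as hidden + noise.
resid_a = [a - h for a, h in zip(signal_a, hidden)]
resid_b = [b - h for b, h in zip(signal_b, hidden)]
controlled = pearson(resid_a, resid_b)

print(f"raw correlation:        {raw:.2f}")
print(f"with hidden factor out: {controlled:.2f}")
```

In this toy setup the raw correlation is about 0.95 and drops to zero once the hidden factor is taken out. With real data the hidden factor is usually not observed, which is why a strong correlation by itself says little about the cause.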

