Data collection and analysis

Data were collected in spontaneous lingua franca communication. Participants were 13 adult individuals in two groups with the following first languages: Spanish, Chinese, Polish, Portuguese, Czech, Telugu, Korean and Russian. All subjects had spent a minimum of six months in the U.S. and had at least intermediate knowledge of English before arriving. Both Group 1 (7 students) and Group 2 (6 students) participated in a 30-minute discussion about the following topics: housing in the area, jobs, and local customs. The conversations were undirected and uncoached. Subjects said what they wanted. No native speaker was present. Conversations were recorded and then transcribed, which resulted in a 13,726-word database.

After a week participants were given the chance to listen to their conversations and were asked to discuss their thought processes using a “think aloud” technique.

Data analysis focused on the types of formulaic units. The questions to answer can be summarized as follows:

How does the use of formulas relate to the ad hoc generated expressions in the data?

What type of fixed expressions did the subjects prefer?

What formulas did speakers create on their own?


The database consists of 13,726 words. The analysis focused on six types of formulaic units in the database. Words were counted in each type of formulaic chunk in the transcripts. Following are samples for each unit:

Grammatical units: I am going to stay here; you have to do that.

Fixed semantic units: after a while, for the time being, once a month, for a long time.

Phrasal verbs: They were worried about me; Take care of the kids; I am trying to remember.

Speech formulas: not bad; that’s why; you know; I mean.

Situation-bound utterances: How are you?; How about you?; That’s fine.

Idioms: gives me a ride; that makes sense; figure out what I want.
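
As a minimal sketch of the counting step described above, and not the procedure actually used in the study, the following Python snippet assumes that formulaic chunks have already been annotated by hand with one of the six categories. The names ANNOTATED_CHUNKS, words_per_category and share are hypothetical, and the toy data consists only of samples quoted in the list above.

```python
from collections import Counter

# Hypothetical hand-annotated chunks: (expression, category).
# The expressions are samples quoted in the text; the annotation
# format itself is an assumption made for illustration.
ANNOTATED_CHUNKS = [
    ("I am going to stay here", "grammatical unit"),
    ("for the time being", "fixed semantic unit"),
    ("once a month", "fixed semantic unit"),
    ("take care of the kids", "phrasal verb"),
    ("you know", "speech formula"),
    ("I mean", "speech formula"),
    ("How are you?", "situation-bound utterance"),
    ("that makes sense", "idiom"),
]


def words_per_category(chunks):
    """Sum the whitespace-separated words of the chunks in each category."""
    counts = Counter()
    for expression, category in chunks:
        counts[category] += len(expression.split())
    return counts


def share(part, total):
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


counts = words_per_category(ANNOTATED_CHUNKS)
for category, n_words in counts.most_common():
    print(category, n_words, share(n_words, sum(counts.values())))
```

The same share function can be applied to any pair of counts, for example the words in one category against the 13,726 words of the whole database.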

What is striking is the relatively low occurrence of formulaic expressions in the database. They account for only 7.6 percent of the total words. Even if we know that this low percentage refers only to one particular database, and the results may change significantly if our focus is on other databases, it is still much less than what linguists report when they address the issue of formulaicity in native speaker conversation.

We can still say that native speakers use fixed expressions to a great extent. Formulas are natural consequences of everyday language use, and language users feel comfortable using them because fixed expressions usually keep them out of trouble since they mean similar things to members of a particular speech community.

Even if our database is very limited and does not let us make generalizations about lingua franca communication, one thing seems to be obvious. As far as formulaic language use is concerned, there seems to be a significant difference between native speaker communication and lingua franca communication. Non-native speakers appear to rely on prefabricated expressions in their lingua franca language production to a much smaller extent than native speakers. The question is why this is so. But before making an attempt to answer the question, we should look at the distribution of formula types in the database.

The most frequent occurrences are registered in three groups: fixed semantic units, phrasal verbs and speech formulas. However, we have to be careful with speech formulas, which constitute a unique group: if we examine the different types of expressions within the group, we can see that three expressions (you know; I / you mean; you’re right) account for 66.8 percent (167 out of 250) of all words counted in this group. The kind of frequency that we see in the use of these three expressions is not comparable to that of any other expression in the database. This seems to make sense because these particular speech formulas may fulfill different functions such as back-channeling, filling a gap, and the like. They are also used very frequently by native speakers, so it is easy for non-native speakers to pick them up.

If we disregard speech formulas for the reason explained above, the formulas that occur with higher frequency than any other expressions are fixed semantic units and phrasal verbs. We did not have a native speaker control group, but we can speculate that this might not be so in native speaker communication. It can be hypothesized that native speakers use the groups of formulas in a relatively balanced way, or at least that in their speech production fixed semantic units and phrasal verbs do not show priority to the extent shown in lingua franca communication.

How can this preference for fixed semantic units and phrasal verbs by non-native speakers be explained? How does this issue relate to the first observation about the amount of formulas in native speaker communication and lingua franca communication?

ELF speakers usually avoid the use of formulaic expressions not necessarily because they do not know these phrases but because they are worried that their interlocutors will not understand them properly. They are reluctant to use language that they know or perceive to be figurative or semantically less transparent. ELF speakers try to come as close to the compositional meaning of expressions as possible because they think that if there is no figurative and/or metaphorical meaning involved, their interlocutors will process the English words and expressions the way they meant them. Since lingua franca speakers come from different socio-cultural backgrounds and represent different cultures, the mutual knowledge they may share is the knowledge of the linguistic code. Consequently, semantic analyzability plays a decisive role in ELF speech production. This assumption is supported by the fact that the most frequently used formulaic expressions are the fixed semantic units and phrasal verbs, in which there is semantic transparency to a much greater extent than in idioms, situation-bound utterances or speech formulas. Of course, one can argue that phrasal verbs may frequently express figurative meaning and function like idioms, such as I never hang out…; they will kick me out from my home... However, when I found cases like this in the database, I listed the phrasal verb under the category “idioms” rather than “phrasal verbs”. So the group of phrasal verbs above contains expressions in which there is usually clear semantic transparency.

Another example of this interesting phenomenon in the database is the endeavor of speakers to create their own formulas, which can be split into two categories. In the first category we can find expressions that are used only once and demonstrate an effort to sound metaphorical. However, this endeavor is usually driven by the L1, in which there may be an equivalent expression for the given idea. For instance: it is almost skips from my thoughts; you are not very rich in communication; take a school.

The other category comprises expressions that are created on the spot during the conversations and are picked up by the members of the ad hoc speech community. One of the participants creates or coins an expression that is needed in the discussion of a given topic. This unit functions like a target formula, the use of which is accepted by the participants in the given conversation; this acceptance is demonstrated by the fact that other participants also pick it up and use it. However, this is just a temporary formula that may be entirely forgotten when the conversation is over. For instance: we connect each other very often; native American.

Lingua franca speakers frequently coin or create their own ways of expressing themselves effectively, and the mistakes they may make will carry on in their speech even though the correct form is there for them to imitate. For instance, several participants adopted the phrase native Americans to refer to native speakers of English. They even joked about it and said that the use of target formulas coined by them in their temporary speech community was like a “joint venture” and created a special feeling of camaraderie in the group.

The avoidance of genuine formulaic language and preference for semantically transparent expressions can be explained by another factor. The analysis of the database and the “think aloud” sessions shed light on something that is hardly discussed in the literature. It seems that multiword chunks might not help L2 processing in the same way they help L1 processing.

Lingua franca speakers usually do not know how flexible the formulas are linguistically, i.e., what structural changes they allow without losing their original function and/or meaning. Linguistic form is a semantic scaffold; if it is defective, the meaning will inevitably fall apart. This is what lingua franca speakers worry about, as was revealed in the “think aloud” sessions. The “unnaturalness” of their language production from a native speaker perspective is caused more by imperfect phraseology than by inadequate conceptual awareness. These imperfections differ from the kind of alteration and elaboration of conventional phrases that native speakers produce, because there is a flawlessness to native-speaker variation that ELF speakers usually fail to imitate. If native speakers do alter conventional expressions, they make any necessary changes to the grammar and syntax as a matter of course. This way they ensure that the expression flows uninterruptedly from word to word and expression to expression, and this really helps processing. However, this does not appear to work the same way for lingua franca speakers, who may not be able to continue the expression if they break down somewhere in the middle of its use.

We can say that formulaic language use in ELF communication points to the fact that, with no native speakers participating in the language game, the lingua franca interlocutors still make an effort in their own way to keep the original rules of the game. This means that they try to use formulas that appear to be the best means to express their immediate communicative goals. The fixed expressions they use most frequently are the ones that have clear compositional meaning, which makes their interpretation easy. As the examples demonstrate, lingua franca communicators may also create new formulas if the need arises.