Presentation given by Josh Goldsmith at "Interdependence and Innovation – 2nd Cologne Conference on Translation, Interpreting and Technical Documentation" on November 30, 2018
3. DEFINITIONS
TERM
“lexical items belonging to
specialized areas of
usage”
Sager (1990: 2)
TERMINOLOGY
EXTRACTION
“Automatically isolating
terminology from texts”
Cabré, Estopà & Vivaldi
(2001:53)
4. WHY TERMINOLOGY MATTERS
FOR INTERPRETERS
To be accepted as “insiders” and perceived as “competent,”
interpreters must:
● Have sufficient specialized knowledge of the domain
● Know and use domain-specific terminology
● Master phraseology of specialized language
Fantinuoli (2012:41)
5. WHY EXTRACT
TERMINOLOGY?
● Limited preparation time: materials made available last
minute (Pignataro 2012)
● Preparation is time-intensive; generally entails collecting
parallel texts and extracting relevant terminology
(Fantinuoli 2017)
● Collecting terminology is a regular part of preparing for an
assignment (Bilgen 2009:91)
● Preparation front-loads cognitively challenging tasks and
can decrease cognitive load while interpreting (Stoll 2009)
● Terminological preparation may improve performance and
processing, leading to target language renditions featuring
more specialized terminology (Diaz Galaz, 2015)
6. TERMINOLOGY
MANAGEMENT SYSTEMS
● Early studies survey professionals about terminology-
related needs and practices to develop terminology
management tools for interpreters
Rütten (2003), Bilgen (2009)
● Researchers analyze tools to see if they meet interpreters’ needs
Costa, Corpas Pastor & Durán
Muñoz (2014, 2017); Will (2015)
● These studies tend to be based on researchers’ subjective
assessments of interpreters’ needs rather than on
objective criteria
Goldsmith (2017)
7. COULD TERMINOLOGY EXTRACTION
STREAMLINE PREPARATION?
● Tools could decrease preparation time and allow
interpreters to focus on the most relevant terms during
preparation
Rütten (2003)
● Corpus-based preparation gave rise to better terminology-
related performance in simultaneous interpretation
Xu (2015)
8. LACK OF
TECHNOLOGY AND RESEARCH
● “No tool has been specifically developed to satisfy the
needs of interpreters during the preparatory phase”
Fantinuoli (2017:24)
● Research has considered key features of terminology
extraction tools for translators, but not interpreters
Costa, Zaretskaya, Corpas Pastor
& Seghiri (2016)
9. TYPES OF AUTOMATIC
TERMINOLOGY EXTRACTION
SYSTEMS
LINGUISTIC
Use linguistic
knowledge
(morphology, etc.)
to detect lexical
units
▸ Noise tends to
be high
STATISTICAL
Use relative
frequencies to
identify high-
frequency lexical
units
▸ Hard to find
low-frequency
terms
HYBRID
Combine
statistical and
linguistic
measures
Cabré, Estopà &
Vivaldi (2001:53);
Fantinuoli (2012)
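As a rough illustration of the statistical approach, the sketch below ranks words by how much more frequent they are in a domain text than in a general reference text. It is a minimal sketch, not the method of any cited system: the function names, the tiny stop-word list and the smoothing constant are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequencies(tokens):
    """Map each non-stop-word token to its share of all such tokens."""
    content = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(content)
    total = len(content)
    return {word: n / total for word, n in counts.items()} if total else {}

def rank_candidates(domain_text, reference_text, top_n=5, smoothing=1e-6):
    """Rank words by the ratio of domain to reference relative frequency.

    The smoothing constant (an assumption) avoids division by zero for
    words that never occur in the reference text.
    """
    domain = relative_frequencies(tokenize(domain_text))
    reference = relative_frequencies(tokenize(reference_text))
    scores = {w: f / (reference.get(w, 0.0) + smoothing) for w, f in domain.items()}
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

domain_text = "The turbine blade and the turbine rotor drive the generator."
reference_text = "The weather is nice and the generator of ideas is the mind."
print(rank_candidates(domain_text, reference_text, top_n=3))
```

In this toy input, every word that occurs once in the domain text and never in the reference text receives the same score, which hints at why purely frequency-based systems struggle to single out low-frequency terms.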
10. ASSESSING TERMINOLOGY
EXTRACTION SYSTEMS
RECALL
“Capacity of the detection
system to extract all terms
from a document”
SILENCE
“Terms contained in an
analysed text that are not
detected by the system”
PRECISION
“Capacity to discriminate
between those units
detected by the system
which are terms and those
which are not”
NOISE
“The rate between discarded
candidates and accepted
ones”
Cabré, Estopà & Vivaldi
(2001:53-56)
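The four measures can be made concrete with a small calculation. The sketch below compares a system's candidate list against a manually validated reference list. It rests on assumptions: the function name and sample terms are invented, noise is computed literally as discarded candidates divided by accepted ones (one reading of the quoted definition), and silence is additionally reported as a rate, which the quoted definition does not require.

```python
def evaluate_extraction(candidates, reference_terms):
    """Compare extracted candidates with a validated list of true terms."""
    candidates = set(candidates)
    reference = set(reference_terms)

    accepted = candidates & reference    # candidates that really are terms
    discarded = candidates - reference   # candidates that are not terms
    missed = reference - candidates      # silence: true terms not detected

    return {
        "recall": len(accepted) / len(reference) if reference else 0.0,
        "precision": len(accepted) / len(candidates) if candidates else 0.0,
        "silence": sorted(missed),
        # Assumption: silence expressed as a share of all true terms.
        "silence_rate": len(missed) / len(reference) if reference else 0.0,
        # Ratio of discarded to accepted candidates; infinite if none accepted.
        "noise": len(discarded) / len(accepted) if accepted else float("inf"),
    }

# Hypothetical example data.
extracted = ["turbine", "rotor", "meeting", "agenda"]
validated = ["turbine", "rotor", "stator"]
for name, value in evaluate_extraction(extracted, validated).items():
    print(name, value)
```

Under this reading, recall and the silence rate always sum to 1, so reducing silence and raising recall are the same goal seen from two sides.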
11. AIMS OF AUTOMATIC
TERMINOLOGY EXTRACTION
● Reduce noise (be accurate)
● Reduce silence (be complete)
● Allow for manual selection of terms and validation of
candidate terms
Heid (2001)
● “As usability is regarded as being fundamental for the
acceptability of an interpreter-oriented tool, a terminology
extraction system for interpreters must give priority to
precision over recall.”
Fantinuoli (2012: 49)
13. RESEARCH QUESTIONS
1. What tools are interpreters using for terminology
extraction?
2. What are the strengths and weaknesses of these tools?
3. In which settings are terminology extraction tools useful?
In which settings should they be avoided?
4. What does the terminology extraction process look like?
5. How does terminology extraction compare to other types of
preparation?
6. In addition to the term itself, what should these tools
extract?
7. What features would an ideal terminology extraction tool
offer?
14. RESEARCH DESIGN
EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO
▸ Map the field of terminology extraction tools for interpreters
▸ Develop an instrument to assess tools (Creswell & Clark 2006)
SEMI-STRUCTURED IN-DEPTH INTERVIEWS
▸ Develop detailed descriptions, present multiple perspectives, describe
process, understand a situation from the inside (Weiss, 1994).
▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756).
▸ Zoom™, Speechmatics™
▸ Informed consent
▸ Anonymous
INDUCTIVE THEMATIC ANALYSIS
▸ Transcribe interviews and inductively derive categories (Kvale, 1996)
▸ Coded with NVivo™ (CAQDAS program)
15. PARTICIPANTS
▸ 10 respondents, all professional interpreters (2 women)
▸ Age 29 – 57 (μ = 42.2)
▸ Domiciled in Europe and North America
▸ 6 members of professional associations (60%)
▸ 2 staff interpreters (20%)
▸ Conference (100%), Media (10%), Court (10%) and
Community (10%) interpreting
▸ Experience: 3 – 30 years (μ = 17.7)
▸ Experience using terminology extraction tools: 1 – 17
years (μ = 8.9)
▸ Translation, training, research, administration,
voiceovers
17. THIS IS A PILOT STUDY
RESULTS CANNOT BE
GENERALIZED, BUT DO AIM
TO GIVE A GENERAL
OVERVIEW OF TOOLS,
EXPERIENCES AND
EXPECTATIONS.
PERCENTAGES ARE NOT
STATISTICALLY SIGNIFICANT
OR GENERALIZABLE.
20. TERMINOLOGY EXTRACTION
SOFTWARE USED
InterpretBank (60%)
Interpreters’ Help (40%)
SketchEngine (20%; 30% used or tested)
Intragloss (10%; 40% used or tested)
Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T,
dtSearch, Thermostat, as well as an in-house tool at an
international organization (10% each)
▸ Users worked with or had tested multiple types of
terminology extraction software
21. OTHER
SOFTWARE USED
Terminology management tools (InterpretBank, Interpreters’
Help, Interplex, MS Access): 100%
Annotation tools (Readdle Documents, GoodReader, PDF
Exchange Editor, Skim): 50%
Terminology database (e.g. IATE): 50%
Wikipedia: 40%
Linguee: 40%
Search Engines: 30%
23. TYPES OF
TECHNOLOGY-ASSISTED
TERMINOLOGY EXTRACTION
MANUAL
User selects terms
manually.
Tool provides
support, e.g., to:
▸ add terms to
glossary
▸ look up
translation
▸ help manage
terms
SEMI-AUTOMATIC
User provides
document(s).
Tool suggests terms.
User reviews and
accepts them.
AUTOMATIC
User provides
document(s).
Tool suggests term
candidates.
Goldsmith (2018)
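The semi-automatic workflow (the tool suggests terms, the user reviews and accepts them) can be sketched as a simple filter over the tool's suggestions. This is only an illustrative sketch: the function name, the `accept` callback and the sample data are assumptions, and real tools present candidates interactively rather than through a callback.

```python
from typing import Callable, Iterable, List

def semi_automatic_extraction(
    candidates: Iterable[str],
    accept: Callable[[str], bool],
) -> List[str]:
    """Build a glossary from tool suggestions that the user accepts.

    `candidates` stands in for the tool's suggested terms and `accept`
    stands in for the user's review decision (both hypothetical).
    """
    glossary: List[str] = []
    for term in candidates:
        # Keep accepted terms once, in the order the tool suggested them.
        if accept(term) and term not in glossary:
            glossary.append(term)
    return glossary

# Hypothetical review rule: the user rejects words they already know.
suggestions = ["turbine", "meeting", "rotor", "turbine"]
already_known = {"meeting"}
print(semi_automatic_extraction(suggestions, lambda t: t not in already_known))
```

In the fully automatic case the review step would be skipped and the suggestions used as delivered, while in the manual case the user would supply the candidate list themselves.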
38. STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (1)
FACILITATES PREPARATION
Saves time (100%)
Provides terminology despite time pressure (90%)
Quick extraction from lengthy documents (60%)
Less hassle / menial copying and pasting (30%)
Automatic annotation of term (and translation) (20%)
Better preparation (10%)
CONSISTENCY/RELIABILITY
Accurate/reliable results from automatic extraction (50%)
Consistent preparation (20%)
39. STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (2)
TERMINOLOGICAL PRECISION
Automatically extract most important / “right” terms (50%)
Automatically look up translations on other sites (40%)
Automatically extract named entities (10%)
Add stop words (10%)
Search function (10%)
ERGONOMICS
Lightweight, portable, small footprint (30%)
40. STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY/INTERFACE
Parallel scrolling (50%)
Easy comparison of bilingual/multilingual texts (30%)
Manual highlighting/annotation/bookmarking (20%)
Easy to use; easy input of terms; visually appealing; filter/edit
results (10% each)
EXPORT/STORAGE
Export candidates to database (40%)
Back up/digitize glossaries (30%)
Export in shareable format (20%)
Reuse for later assignments (10%)
41. WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (1)
PREPARATION
Incomplete preparation if only use term extraction (40%)
Time-intensive (manual, copy/paste) (20%)
Slow with large glossaries (10%)
IMPORT/EXPORT/STORAGE
Poor export/formatting of exported text (20%)
Tool doesn’t recognize format (e.g. line breaks, images) (20%)
Compatibility (Mac/PC, etc.) (10%)
Poor import of documents/glossaries (10%)
Export not provided (10%)
42. WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (2)
EXTRACTION
Multilingual extraction not supported (50%)
Too many terms extracted (50%)
Results need cleaning up (30%)
Too few/many words in term (20%)
Noise (20%)
Too few terms extracted (20%)
Incomplete extraction (e.g. context missing) (10%)
Tool reorders words (10%)
43. WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY
Poor/incomplete presentation of results (30%)
Terminology entry lacks relevant fields (10%)
Small screen size (tablet) (10%)
CUSTOMIZATION
Tools not designed for interpreters (10%)
Software doesn’t know user’s individual needs (10%)
COST
Cost/subscription (20%)
44. SETTINGS WHERE
EXTRACTION TOOLS PREFERRED
80% used extraction when documents available
MANUAL
Parallel texts (40%)
New topic (30%)
Few documents (30%)
Time permitting (30%)
Focus on collocations (10%)
Only monolingual
documents available (10%)
AUTOMATIC
Numerous/long documents (40%)
For institutions (40%)
Time pressure (40%)
For hearings (20%)
For automatic annotation when
glossaries available (20%)
Familiar subject matter (10%)
All assignments (10%)
When onsite (10%)
45. SETTINGS WHERE
EXTRACTION TOOLS AVOIDED
Limited / no materials available (50%)
Documents not available in digital format (30%)
Need to understand content (30%)
Text too general (20%)
PowerPoint (20%)
Faster to read than extract (20%)
Recurring meeting/familiar with terminology (10%)
Confidentiality (10%)
Multilingual documents not available (10%)
Vague subject matter (10%)
Very large / small glossary available (10%)
46. 70% of respondents felt terminology extraction was more effective
than other types of preparation
62.5% of respondents preferred terminology extraction over other
types of preparation
BUT ONLY 40% of respondents felt terminology extraction tools meet their
needs
47. 90% of respondents felt clients were not aware they used
terminology extraction tools. Those who were aware reacted
positively (20%) and found it professional (10%)
80% of respondents felt colleagues were curious about
terminology extraction tools, although some mentioned
uninterested colleagues (40%) who were averse to new
approaches (20%) or unwilling to change their habits (20%)
49. THE IDEAL TOOL
SHOULD EXTRACT
Term (100%)
Single and multi-word terms (100%)
Context/examples (90%)
Equivalents in other languages (70%)
Source / source document (50%)
Definition (40%)
Frequencies (40%)
Subject matter overview (40%)
Collocations / phraseology (30%)
Named entities; figures; domain; link to source (20%)
Graphical information; images; hyponyms; semantic
groupings (10%)
50. THE IDEAL TOOL
ANNOTATION
Allow manual annotation (70%)
Highlight terms (60%)
Highlight phraseology (60%)
Print translations above extracted term (40%)
Automatically annotate term occurrences from glossary (30%)
Manually add sticky notes (30%)
Highlight relevant content (20%)
Annotations overview pane (20%)
Bookmarks; Highlight phraseology; Highlight named entities (10%
each)
51. THE IDEAL TOOL
EXTRACTION/TRANSLATION
Extract unknown terms (80%)
Multilingual extraction available (80%)
Statistical extraction/show frequencies (70%)
Filter results (manually, chronologically, thematically, by frequency, by
agenda item, etc.) (60%)
Extract from multiple files (60%)
Access external resources from within program (60%)
Ignore stop words / decrease noise (60%)
View parallel texts & manually extract equivalents (50%)
Automatically rank most relevant terms (40%)
No clean up necessary; access multiple termbases/dictionaries; search
glossaries for extracted terms; tablet and/or stylus interface (30%) ...
52. THE IDEAL TOOL
IMPORT
Limited preprocessing / automatic conversion regardless of
source file format (40%)
Batch upload (30%)
Import from parallel resources / in multiple languages (20%)
Built-in webcrawler (10%)
Import from your institutional calendar (10%)
Flawless import (no errors with line breaks, etc.) (10%)
Imports pre-existing glossaries (10%)
53. THE IDEAL TOOL
EXPORT
Multilingual export (60%)
One-click import into database (50%)
Export into widely used/compatible formats (30%)
Export annotated text (20%)
Print from tool (10%)
54. THE IDEAL TOOL
FORMAT AND STORAGE
FORMAT
Cross-platform (50%)
Software suite / integration with terminology management tool
(50%)
Compatible with mobile devices (30%)
“Available on my operating system” (30%)
Compatible with translation tools/databases (20%)
Checks pre-existing glossaries to avoid duplicates (20%)
STORAGE
Local storage (40%)
Offline to maintain confidentiality (30%)
Cloud storage (30%)
55. THE IDEAL TOOL
INTERFACE
Link term to context (90%)
View parallel texts side by side with synchronous scrolling (70%)
Bilingual/multilingual term list (50%)
Reliability marker/index (50%)
Simple, uncluttered display (40%)
Search within source documents (30%)
Customize display (30%)
Speech recognition interface (20%)
Can manually annotate with stylus (20%)
Clear color code (20%)/color code for fuzzy matches (20%)
Search within/filter exported terms (20%)
Extensive information available (20%) ...
56. THE IDEAL TOOL
CUSTOMIZATION
Configure number of terms extracted (50%)
Configure working languages (40%)
Customize external resources (40%)
Custom results based on audience/domain/client (40%)
Configure term length (n-gram) (30%)
Customize display/user interface (30%)
Knows interpreter’s preferences (20%); Designed for interpreters (20%)
Tool knows interpreter’s background and adjusts accordingly (20%)
Configure frequency threshold (20%)
Learns from human postprocessing; preconfigure database / fields;
configure domain; tool knows where to find information in document (10%)
58. CONCLUSIONS (1)
Interpreters regularly use manual, semi-automatic and
automatic terminology extraction tools.
The terminology extraction process differs for every
interpreter, although it tends to include document
collection, extraction, glossary building, and possible
annotation.
Interpreters prefer different approaches (manual vs.
[semi-]automatic) in different settings, and avoid
terminology extraction when documents are not available or
digitized or when they need an in-depth understanding of
content and have time to read the entire text.
59. CONCLUSIONS (2)
Terminology extraction saves time and can lead to
reliable results and terminological precision.
Terminology extraction alone may be insufficient.
Most respondents felt terminology extraction was more
effective than other types of preparation.
Most respondents felt that terminology extraction tools did
not meet their needs.
60. CONCLUSIONS (3)
Interpreters use a wide variety of terminology extraction
software, but few terminology extraction tools are
designed for interpreters, and the perfect tool
doesn’t exist yet.
Minimally, the ideal tool should extract unknown terms,
context, and translations and offer multilingual
extraction, filtering of results, access to
terminological resources, multilingual export,
manual annotation, parallel scrolling,
bilingual/multilingual term lists and significant
customization.
61. FUTURE WORK
Phase 2: Survey to rank the features of ideal
tools and make recommendations to
designers
Phase 3: Use weighted rankings to assess
existing tools and make recommendations to
practitioners.
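One way the weighted assessment planned for Phase 3 might work is sketched below. This is an assumption-laden illustration and not the study's instrument: the weights simply reuse percentages reported in the pilot interviews (link term to context 90%, multilingual extraction 80%, manual annotation 70%, multilingual export 60%), whereas actual weights would come from the Phase 2 survey, and "Tool A" and "Tool B" with their feature sets are hypothetical.

```python
def weighted_score(tool_features, weights):
    """Return the share of total feature weight that a tool covers (0 to 1)."""
    total = sum(weights.values())
    covered = sum(w for feature, w in weights.items() if feature in tool_features)
    return covered / total if total else 0.0

def rank_tools(tools, weights):
    """Order tool names from highest to lowest weighted score."""
    return sorted(tools, key=lambda name: (-weighted_score(tools[name], weights), name))

# Assumption: pilot-interview percentages reused as stand-in weights.
weights = {
    "link term to context": 0.9,
    "multilingual extraction": 0.8,
    "manual annotation": 0.7,
    "multilingual export": 0.6,
}

# Hypothetical tools and feature sets, invented for this example.
tools = {
    "Tool A": {"link term to context", "manual annotation"},
    "Tool B": {"link term to context", "multilingual extraction", "multilingual export"},
}

for name in rank_tools(tools, weights):
    print(name, round(weighted_score(tools[name], weights), 2))
```

Normalizing by the total weight keeps scores comparable if features are later added or removed from the list.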