TERMINOLOGY EXTRACTION TOOLS FOR INTERPRETERS
JOSH GOLDSMITH
2ND COLOGNE CONFERENCE ON TRANSLATION, INTERPRETING,
AND TECHNICAL DOCUMENTATION
NOVEMBER 30, 2018
JG@JOSHGOLDSMITH.COM
@GOLDSMITH_JOSH
http://xl8.link/TerminologyExtractionSlides
1.
STATE OF THE
ART
2
DEFINITIONS
TERM
“lexical items belonging to
specialized areas of
usage”
Sager (1990: 2)
TERMINOLOGY
EXTRACTION
“Automatically isolating
terminology from texts”
Cabré, Estopà & Vivaldi
(2001:53)
3
WHY TERMINOLOGY MATTERS
FOR INTERPRETERS
To be accepted as “insiders” and perceived as “competent,”
interpreters must:
● Have sufficient specialized knowledge of the domain
● Know and use domain-specific terminology
● Master phraseology of specialized language
Fantinuoli (2012:41)
4
WHY EXTRACT
TERMINOLOGY?
● Limited preparation time: materials made available last
minute (Pignataro 2012)
● Preparation is time-intensive; generally entails collecting
parallel texts and extracting relevant terminology
(Fantinuoli 2017)
● Collecting terminology is a regular part of preparing for an
assignment (Bilgen 2009:91)
● Preparation front-loads cognitively challenging tasks and
can decrease cognitive load while interpreting (Stoll 2009)
● Terminological preparation may improve performance and
processing, leading to target language renditions featuring
more specialized terminology (Diaz Galaz, 2015)
5
TERMINOLOGY
MANAGEMENT SYSTEMS
● Early studies survey professionals about terminology-
related needs and practices to develop terminology
management tools for interpreters
Rütten (2003), Bilgen (2009)
● Researchers analyze tools to see whether they meet interpreters’ needs
Costa, Corpas Pastor & Durán
Muñoz (2014, 2017); Will (2015)
● These studies tend to be based on researchers’ subjective
assessments of interpreters’ needs rather than on
objective criteria
Goldsmith (2017)
6
COULD TERMINOLOGY EXTRACTION
STREAMLINE PREPARATION?
● Tools could decrease preparation time and allow
interpreters to focus on the most relevant terms during
preparation
Rütten (2003)
● Corpus-based preparation gave rise to better terminology-
related performance in simultaneous interpretation
Xu (2015)
7
LACK OF
TECHNOLOGY AND RESEARCH
● “No tool has been specifically developed to satisfy the
needs of interpreters during the preparatory phase”
Fantinuoli (2017:24)
● Research has considered key features of terminology
extraction tools for translators, but not interpreters
Costa, Zaretskaya, Corpas Pastor
& Seghiri (2016)
8
TYPES OF AUTOMATIC
TERMINOLOGY EXTRACTION
SYSTEMS
9
LINGUISTIC
Use linguistic knowledge (morphology, etc.) to detect lexical units
▸ Noise tends to be high
STATISTICAL
Use relative frequencies to identify high-frequency lexical units
▸ Hard to find low-frequency terms
HYBRID
Combine statistical and linguistic measures
Cabré, Estopà & Vivaldi (2001:53); Fantinuoli (2012)
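To make the hybrid idea concrete, here is a minimal sketch rather than the method of any tool named in these slides. It counts word n-grams (the statistical part) and discards candidates that start or end with a stop word (a very crude stand-in for linguistic knowledge). The function name `candidate_terms`, the stop-word list and the parameters `max_n` and `min_freq` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from any tool)
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "for", "is", "are", "on", "with"}


def candidate_terms(text, max_n=3, min_freq=2):
    """Return (candidate, frequency) pairs for n-grams of up to max_n words."""
    # Lowercase the text and keep alphabetic tokens, allowing internal hyphens
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Crude linguistic filter: no stop word at either edge of a candidate
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    # Statistical filter: keep only candidates that reach the frequency threshold
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


sample = (
    "Terminology extraction saves time. Terminology extraction tools "
    "suggest candidate terms. Interpreters review the candidate terms."
)
for term, freq in candidate_terms(sample):
    print(freq, term)
```

Because the sketch relies on frequency alone, it shows the weakness noted above: a term that occurs only once never reaches `min_freq`. It also ignores sentence boundaries, which a real tool would likely respect.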
ASSESSING TERMINOLOGY
EXTRACTION SYSTEMS
RECALL
“Capacity of the detection
system to extract all terms
from a document”
SILENCE
“Terms contained in an
analysed text that are not
detected by the system”
PRECISION
“Capacity to discriminate
between those units
detected by the system
which are terms and those
which are not”
NOISE
“The rate between discarded
candidates and accepted
ones”
Cabré, Estopà & Vivaldi
(2001:53-56)
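The four measures above can be worked through on a toy example. This is a sketch under stated assumptions: recall and precision follow their usual set-based definitions, silence is expressed as the share of reference terms that were missed, and noise follows the quoted wording (discarded candidates divided by accepted ones). The function name `evaluate_extraction` and the sample terms are made up for illustration.

```python
def evaluate_extraction(candidates, reference_terms):
    """Compare extracted candidates with a hand-validated reference term list."""
    candidates = set(candidates)
    reference_terms = set(reference_terms)
    accepted = candidates & reference_terms    # candidates that are real terms
    discarded = candidates - reference_terms   # candidates a reviewer would reject
    missed = reference_terms - candidates      # real terms the system did not find
    return {
        "precision": len(accepted) / len(candidates) if candidates else 0.0,
        "recall": len(accepted) / len(reference_terms) if reference_terms else 0.0,
        # Silence as a rate: the complement of recall
        "silence": len(missed) / len(reference_terms) if reference_terms else 0.0,
        # Noise as the ratio of discarded to accepted candidates
        "noise": len(discarded) / len(accepted) if accepted else float("inf"),
    }


metrics = evaluate_extraction(
    candidates=["term extraction", "glossary", "the meeting",
                "simultaneous interpretation", "last week"],
    reference_terms=["term extraction", "glossary",
                     "simultaneous interpretation", "cognitive load"],
)
print(metrics)
```

Here three of five candidates are real terms and one of four reference terms is missed, so precision is 3/5, recall 3/4, silence 1/4 and noise 2/3.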
10
AIMS OF AUTOMATIC
TERMINOLOGY EXTRACTION
● Reduce noise (be accurate)
● Reduce silence (be complete)
● Allow for manual selection of terms and validation of
candidate terms
Heid (2001)
● “As usability is regarded as being fundamental for the
acceptability of an interpreter-oriented tool, a terminology
extraction system for interpreters must give priority to
precision over recall.”
Fantinuoli (2012: 49)
11
2.
STUDY DESIGN
AND
PARTICIPANTS
12
RESEARCH QUESTIONS
1. What tools are interpreters using for terminology
extraction?
2. What are the strengths and weaknesses of these tools?
3. In which settings are terminology extraction tools useful?
In which settings should they be avoided?
4. What does the terminology extraction process look like?
5. How does terminology extraction compare to other types of
preparation?
6. In addition to the term itself, what should these tools
extract?
7. What features would an ideal terminology extraction tool
offer?
RESEARCH DESIGN
EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO
▸ Map the field of terminology extraction tools for interpreters
▸ Develop an instrument to assess tools (Creswell & Clark 2006)
SEMI-STRUCTURED IN-DEPTH INTERVIEWS
▸ Develop detailed descriptions, present multiple perspectives, describe
process, understand a situation from the inside (Weiss, 1994).
▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756).
▸ Zoom™, Speechmatics™
▸ Informed consent
▸ Anonymous
INDUCTIVE THEMATIC ANALYSIS
▸ Transcribe interviews and inductively derive categories (Kvale, 1996)
▸ Coded with NVivo™ (CAQDAS program)
PARTICIPANTS
▸ 10 respondents, all professional interpreters (2 women)
▸ Age 29 – 57 (μ = 42.2)
▸ Domiciled in Europe and North America
▸ 6 members of professional associations (60%)
▸ 2 staff interpreters (20%)
▸ Conference (100%), Media (10%), Court (10%) and
Community (10%) interpreting
▸ Experience: 3 – 30 years (μ = 17.7)
▸ Experience using terminology extraction tools: 1 – 17
years (μ = 8.9)
▸ Translation, training, research, administration,
voiceovers
PARTICIPANTS’ EXPERIENCE
PERCENTAGE OF ASSIGNMENTS USED
▸ Manual: 0 - 100% (μ = 48.0%)
▸ Semi-automatic: 0 - 100% (μ = 18.9%)
▸ Automatic: 0 - 100% (μ = 40.0%)
NUMBER OF ASSIGNMENTS USED
▸ Manual: 0 - 840 (μ = 123.8)
▸ Semi-automatic: 0 - 150 (μ = 17.2)
▸ Automatic: 0 - 600 (μ = 135.6)
THIS IS A PILOT STUDY
RESULTS CANNOT BE
GENERALIZED, BUT DO AIM
TO GIVE A GENERAL
OVERVIEW OF TOOLS,
EXPERIENCES AND
EXPECTATIONS.
PERCENTAGES ARE NOT
STATISTICALLY SIGNIFICANT
OR GENERALIZABLE.
3.
TOOLS USED
18
19
HARDWARE USED
Desktop (50%)
Laptop (75%)
Tablet (20%)
Windows operating system (80%)
MacOS (40%)
iOS (20%)
▸ Some users utilize multiple devices
20
TERMINOLOGY EXTRACTION
SOFTWARE USED
InterpretBank (60%)
Interpreters’ Help (40%)
SketchEngine (20%; 30% used or tested)
Intragloss (10%; 40% used or tested)
Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T,
dtSearch, Thermostat, as well as an in-house tool at an
international organization (10% each)
▸ Users work with or have tested multiple types of
terminology extraction software
21
OTHER
SOFTWARE USED
Terminology management tools (InterpretBank, Interpreters’
Help, Interplex, MS Access): 100%
Annotation tools (Readdle Documents, GoodReader, PDF
Exchange Editor, Skim): 50%
Terminology database (e.g. IATE): 50%
Wikipedia: 40%
Linguee: 40%
Search Engines: 30%
4.
THE
EXTRACTION
PROCESS
DIFFERENT
APPROACHES TO
MANUAL, SEMI-
AUTOMATIC AND
AUTOMATIC
TERMINOLOGY
EXTRACTION
22
TYPES OF
TECHNOLOGY-ASSISTED
TERMINOLOGY EXTRACTION
23
MANUAL
User selects terms
manually.
Tool provides
support, e.g., to:
▸ add terms to
glossary
▸ look up
translation
▸ help manage
terms
SEMI-AUTOMATIC
User provides
document(s).
Tool suggests terms.
User reviews and
accepts them.
AUTOMATIC
User provides
document(s).
Tool suggests term
candidates.
Goldsmith (2018)
MONOLINGUAL
MANUAL
TERMINOLOGY EXTRACTION
24
WITH
ANNOTATION
BILINGUAL
MANUAL
TERMINOLOGY EXTRACTION
25
WITH
PARALLEL
DOCUMENTS
MONOLINGUAL/BILINGUAL
MANUAL
TERMINOLOGY EXTRACTION
26
WITH
PARALLEL
DOCUMENTS
MULTILINGUAL
MANUAL
TERMINOLOGY EXTRACTION
27
MONOLINGUAL
SEMI-AUTOMATIC
TERMINOLOGY EXTRACTION
28
MONOLINGUAL
SEMI-AUTOMATIC
TERMINOLOGY EXTRACTION
29
BILINGUAL
SEMI-AUTOMATIC
TERMINOLOGY EXTRACTION
30
BILINGUAL
SEMI-AUTOMATIC
TERMINOLOGY EXTRACTION
31
MULTILINGUAL
SEMI-AUTOMATIC
TERMINOLOGY EXTRACTION
32
WITH
ANNOTATION
MONOLINGUAL/MULTILINGUAL
AUTOMATIC
TERMINOLOGY EXTRACTION
33
BILINGUAL
AUTOMATIC
TERMINOLOGY EXTRACTION
34
WITH
ANNOTATION
5.
OTHER
PREPARATION
STRATEGIES
35
36
OTHER
PREPARATION STRATEGIES
Read documents (90%)
Background reading (50%)
Web research (50%)
Memorize/drill terms (50%)
Manual annotation (40%)
Wikipedia (40%)
Terminological research (30%)
Gisting/text summarization (20%)
Automatic translation; Concordancer; Build glossaries
collaboratively; Read news; Read technical documents; Practice
interpreting on similar topics (10%)
6.
PROS, CONS
AND
EFFECTIVENESS
37
38
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (1)
FACILITATES PREPARATION
Saves time (100%)
Provides terminology despite time pressure (90%)
Quick extraction from lengthy documents (60%)
Less hassle / menial copying and pasting (30%)
Automatic annotation of term (and translation) (20%)
Better preparation (10%)
CONSISTENCY/RELIABILITY
Accurate/reliable results from automatic extraction (50%)
Consistent preparation (20%)
39
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (2)
TERMINOLOGICAL PRECISION
Automatically extract most important / “right” terms (50%)
Automatically look up translations on other sites (40%)
Automatically extract named entities (10%)
Add stop words (10%)
Search function (10%)
ERGONOMICS
Lightweight, portable, small footprint (30%)
40
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY/INTERFACE
Parallel scrolling (50%)
Easy comparison of bilingual/multilingual texts (30%)
Manual highlighting/annotation/bookmarking (20%)
Easy to use; easy input of terms; visually appealing; filter/edit
results (10% each)
EXPORT/STORAGE
Export candidates to database (40%)
Back up/digitize glossaries (30%)
Export in shareable format (20%)
Reuse for later assignments (10%)
41
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (1)
PREPARATION
Incomplete preparation if only term extraction is used (40%)
Time-intensive (manual, copy/paste) (20%)
Slow with large glossaries (10%)
IMPORT/EXPORT/STORAGE
Poor export/formatting of exported text (20%)
Tool doesn’t recognize format (e.g. line breaks, images) (20%)
Compatibility (Mac/PC, etc.) (10%)
Poor import of documents/glossaries (10%)
Export not provided (10%)
42
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (2)
EXTRACTION
Multilingual extraction not supported (50%)
Too many terms extracted (50%)
Results need cleaning up (30%)
Too few/many words in term (20%)
Noise (20%)
Too few terms extracted (20%)
Incomplete extraction (e.g. context missing) (10%)
Tool reorders words (10%)
43
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY
Poor/incomplete presentation of results (30%)
Terminology entry lacks relevant fields (10%)
Small screen size (tablet) (10%)
CUSTOMIZATION
Tools not designed for interpreters (10%)
Software doesn’t know user’s individual needs (10%)
COST
Cost/subscription (20%)
44
SETTINGS WHERE
EXTRACTION TOOLS PREFERRED
80% used extraction when documents available
MANUAL
Parallel texts (40%)
New topic (30%)
Few documents (30%)
Time permitting (30%)
Focus on collocations (10%)
Only monolingual
documents available (10%)
AUTOMATIC
Numerous/long documents (40%)
For institutions (40%)
Time pressure (40%)
For hearings (20%)
For automatic annotation when
glossaries available (20%)
Familiar subject matter (10%)
All assignments (10%)
When onsite (10%)
45
SETTINGS WHERE
EXTRACTION TOOLS AVOIDED
Limited / no materials available (50%)
Documents not available in digital format (30%)
Need to understand content (30%)
Text too general (20%)
PowerPoint (20%)
Faster to read than extract (20%)
Recurring meeting/familiar with terminology (10%)
Confidentiality (10%)
Multilingual documents not available (10%)
Vague subject matter (10%)
Very large / small glossary available (10%)
70% of respondents felt terminology extraction was more effective
than other types of preparation
46
62.5% of respondents preferred terminology extraction over other
types of preparation
BUT ONLY 40% of respondents felt terminology extraction tools meet
their needs
90% of respondents felt clients were not aware they used
terminology extraction tools. Those who were aware reacted
positively (20%) and found it professional (10%)
47
80% of respondents felt colleagues were curious about
terminology extraction tools, although some mentioned
uninterested colleagues (40%) who were averse to new
approaches (20%) or unwilling to change their habits (20%)
7.
THE IDEAL
TOOL
48
49
THE IDEAL TOOL
SHOULD EXTRACT
Term (100%)
Single and multi-word terms (100%)
Context/examples (90%)
Equivalents in other languages (70%)
Source / source document (50%)
Definition (40%)
Frequencies (40%)
Subject matter overview (40%)
Collocations / phraseology (30%)
Named entities; figures; domain; link to source (20%)
Graphical information; images; hyponyms; semantic
groupings (10%)
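The fields respondents asked for can be pictured as one record per extracted term. The sketch below is a hypothetical data structure, not the schema of any existing tool. The class name `TermEntry`, its field names and the sample values are assumptions chosen to mirror the most frequently requested items in the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermEntry:
    """One extracted term plus the extra information respondents wanted."""
    term: str                                    # single- or multi-word term
    context: str = ""                            # example sentence from the document
    equivalents: Dict[str, str] = field(default_factory=dict)  # language code -> equivalent
    source_document: str = ""                    # where the term was found
    definition: str = ""
    frequency: int = 0                           # occurrences in the source material
    collocations: List[str] = field(default_factory=list)

    def is_multiword(self) -> bool:
        """True when the term consists of more than one word."""
        return len(self.term.split()) > 1


entry = TermEntry(
    term="cognitive load",
    context="Preparation can decrease cognitive load while interpreting.",
    equivalents={"de": "kognitive Belastung"},
    frequency=1,
)
print(entry.term, entry.is_multiword())
```

Keeping equivalents in a dictionary keyed by language code lets the same record serve bilingual and multilingual glossaries.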
50
THE IDEAL TOOL
ANNOTATION
Allow manual annotation (70%)
Highlight terms (60%)
Highlight phraseology (60%)
Print translations above extracted term (40%)
Automatically annotate term occurrences from glossary (30%)
Manually add sticky notes (30%)
Highlight relevant content (20%)
Annotations overview pane (20%)
Bookmarks; Highlight phraseology; Highlight named entities (10%
each)
51
THE IDEAL TOOL
EXTRACTION/TRANSLATION
Extract unknown terms (80%)
Multilingual extraction available (80%)
Statistical extraction/show frequencies (70%)
Filter results (manually, chronologically, thematically, by frequency, by
agenda item, etc.) (60%)
Extract from multiple files (60%)
Access external resources from within program (60%)
Ignore stop words / decrease noise (60%)
View parallel texts & manually extract equivalents (50%)
Automatically rank most relevant terms (40%)
No clean up necessary; access multiple termbases/dictionaries; search
glossaries for extracted terms; tablet and/or stylus interface (30%) ...
52
THE IDEAL TOOL
IMPORT
Limited preprocessing / automatic conversion regardless of
source file format (40%)
Batch upload (30%)
Import from parallel resources / in multiple languages (20%)
Built-in webcrawler (10%)
Import from your institutional calendar (10%)
Flawless import (no errors with line breaks, etc.) (10%)
Imports pre-existing glossaries (10%)
53
THE IDEAL TOOL
EXPORT
Multilingual export (60%)
One-click import into database (50%)
Export into widely used/compatible formats (30%)
Export annotated text (20%)
Print from tool (10%)
54
THE IDEAL TOOL
FORMAT AND STORAGE
FORMAT
Cross-platform (50%)
Software suite / integration with terminology management tool
(50%)
Compatible with mobile devices (30%)
“Available on my operating system” (30%)
Compatible with translation tools/databases (20%)
Checks pre-existing glossaries to avoid duplicates (20%)
STORAGE
Local storage (40%)
Offline to maintain confidentiality (30%)
Cloud storage (30%)
55
THE IDEAL TOOL
INTERFACE
Link term to context (90%)
View parallel texts side by side with synchronous scrolling (70%)
Bilingual/multilingual term list (50%)
Reliability marker/index (50%)
Simple, uncluttered display (40%)
Search within source documents (30%)
Customize display (30%)
Speech recognition interface (20%)
Can manually annotate with stylus (20%)
Clear color code (20%)/color code for fuzzy matches (20%)
Search within/filter exported terms (20%)
Extensive information available (20%) ...
56
THE IDEAL TOOL
CUSTOMIZATION
Configure number of terms extracted (50%)
Configure working languages (40%)
Customize external resources (40%)
Custom results based on audience/domain/client (40%)
Configure term length (n-gram) (30%)
Customize display/user interface (30%)
Knows interpreter’s preferences (20%); Designed for interpreters (20%)
Tool knows interpreter’s background and adjusts accordingly (20%)
Configure frequency threshold (20%)
Learns from human postprocessing; preconfigure database / fields;
configure domain; tool knows where to find information in document (10%)
8.
CONCLUSIONS
AND FUTURE
RESEARCH
57
58
CONCLUSIONS (1)
Interpreters regularly use manual, semi-automatic and
automatic terminology extraction tools.
The terminology extraction process differs for every
interpreter, although it tends to include document
collection, extraction, glossary building, and possible
annotation.
Interpreters prefer different approaches (manual vs.
[semi-]automatic) in different settings, and avoid
terminology extraction when documents are not available or
digitized or when they need an in-depth understanding of
content and have time to read the entire text.
59
CONCLUSIONS (2)
Terminology extraction saves time and can lead to
reliable results and terminological precision.
Terminology extraction alone may be insufficient.
Most respondents felt terminology extraction was more
effective than other types of preparation.
Most respondents felt that terminology extraction tools did
not meet their needs.
60
CONCLUSIONS (3)
Interpreters use a wide variety of terminology extraction
software, but few terminology extraction tools are
designed for interpreters, and the perfect tool
doesn’t exist yet.
Minimally, the ideal tool should extract unknown terms,
context, and translations and offer multilingual
extraction, filtering of results, access to
terminological resources, multilingual export,
manual annotation, parallel scrolling,
bilingual/multilingual term lists and significant
customization.
61
FUTURE WORK
Phase 2: Survey to rank the features of ideal
tools and make recommendations to
designers
Phase 3: Use weighted rankings to assess
existing tools and make recommendations to
practitioners.
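One way a weighted ranking for Phase 3 might work is sketched below. It is an illustration only. The Phase 2 survey weights are not available, so four interview percentages from the ideal-tool slides are borrowed as stand-in weights, and "Tool A" and "Tool B" with their feature sets are invented. The function name `weighted_score` is likewise an assumption.

```python
# Stand-in weights: share of respondents who requested each feature in the interviews
FEATURE_WEIGHTS = {
    "link term to context": 0.9,
    "multilingual extraction": 0.8,
    "manual annotation": 0.7,
    "multilingual export": 0.6,
}


def weighted_score(supported_features, weights=FEATURE_WEIGHTS):
    """Return the share of the total weight covered by a tool's supported features."""
    total = sum(weights.values())
    covered = sum(w for feature, w in weights.items() if feature in supported_features)
    return covered / total


# Hypothetical tools and feature sets, for illustration only
tools = {
    "Tool A": {"link term to context", "manual annotation"},
    "Tool B": {"multilingual extraction", "multilingual export", "manual annotation"},
}

ranking = sorted(tools, key=lambda name: weighted_score(tools[name]), reverse=True)
for name in ranking:
    print(name, round(weighted_score(tools[name]), 2))
```

Normalizing by the total weight keeps every score between 0 and 1, so tools stay comparable even if features are later added to or removed from the weight table.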
THANK YOU!
jg@joshgoldsmith.com
@Goldsmith_Josh
http://xl8.link/TerminologyExtractionSlides
62

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Terminology Extraction Tools for Interpreters

  • 1. TERMINOLOGY EXTRACTION TOOLS FOR INTERPRETERS. JOSH GOLDSMITH. 2ND COLOGNE CONFERENCE ON TRANSLATION, INTERPRETING, AND TECHNICAL DOCUMENTATION, NOVEMBER 30, 2018. JG@JOSHGOLDSMITH.COM, @GOLDSMITH_JOSH, http://xl8.link/TerminologyExtractionSlides
  • 3. DEFINITIONS. TERM: “lexical items belonging to specialized areas of usage” Sager (1990: 2). TERMINOLOGY EXTRACTION: “Automatically isolating terminology from texts” Cabré, Estopà & Vivaldi (2001:53)
  • 4. WHY TERMINOLOGY MATTERS FOR INTERPRETERS. To be accepted as “insiders” and perceived as “competent,” interpreters must: ● Have sufficient specialized knowledge of the domain ● Know and use domain-specific terminology ● Master phraseology of specialized language. Fantinuoli (2012:41)
  • 5. WHY EXTRACT TERMINOLOGY? ● Limited preparation time: materials made available last minute (Pignataro 2012) ● Preparation is time-intensive; generally entails collecting parallel texts and extracting relevant terminology (Fantinuoli 2017) ● Collecting terminology is a regular part of preparing for an assignment (Bilgen 2009:91) ● Preparation front-loads cognitively challenging tasks and can decrease cognitive load while interpreting (Stoll 2009) ● Terminological preparation may improve performance and processing, leading to target language renditions featuring more specialized terminology (Diaz Galaz 2015)
  • 6. TERMINOLOGY MANAGEMENT SYSTEMS. ● Early studies survey professionals about terminology-related needs and practices to develop terminology management tools for interpreters: Rütten (2003), Bilgen (2009) ● Researchers analyze tools to see if they meet interpreters’ needs: Costa, Corpas Pastor & Durán Muñoz (2014, 2017); Will (2015) ● These studies tend to be based on researchers’ subjective assessments of interpreters’ needs rather than on objective criteria: Goldsmith (2017)
  • 7. COULD TERMINOLOGY EXTRACTION STREAMLINE PREPARATION? ● Tools could decrease preparation time and allow interpreters to focus on the most relevant terms during preparation: Rütten (2003) ● Corpus-based preparation gave rise to better terminology-related performance in simultaneous interpretation: Xu (2015)
  • 8. LACK OF TECHNOLOGY AND RESEARCH. ● “No tool has been specifically developed to satisfy the needs of interpreters during the preparatory phase” Fantinuoli (2017:24) ● Research has considered key features of terminology extraction tools for translators, but not interpreters: Costa, Zaretskaya, Corpas Pastor & Seghiri (2016)
  • 9. TYPES OF AUTOMATIC TERMINOLOGY EXTRACTION SYSTEMS. LINGUISTIC: Use linguistic knowledge (morphology, etc.) to detect lexical units ▸ Noise tends to be high. STATISTICAL: Use relative frequencies to identify high-frequency lexical units ▸ Hard to find low-frequency terms. HYBRID: Combine statistical and linguistic measures. Cabré, Estopà & Vivaldi (2001:53); Fantinuoli (2012)
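A minimal sketch of the statistical approach described on this slide, added for illustration and not taken from the presentation. It assumes one common variant: a word becomes a candidate when its relative frequency in the domain text is at least `min_ratio` times its relative frequency in a general reference text. The function names, the tokenizer, the smoothing floor and the threshold are all hypothetical choices.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; hyphenated words stay together.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def relative_frequencies(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def candidate_terms(domain_text, reference_text, min_ratio=2.0):
    """Rank words that are relatively more frequent in the domain text."""
    domain = relative_frequencies(tokenize(domain_text))
    reference_tokens = tokenize(reference_text)
    reference = relative_frequencies(reference_tokens)
    # Assumed smoothing: words unseen in the reference text get a small floor.
    floor = 1.0 / (len(reference_tokens) + 1)
    scores = {w: f / reference.get(w, floor) for w, f in domain.items()}
    kept = [w for w, s in scores.items() if s >= min_ratio]
    return sorted(kept, key=lambda w: -scores[w])


domain_text = "the interpreter prepares the glossary and the glossary lists terminology"
reference_text = "the cat sat on the mat and the dog sat on the rug"
print(candidate_terms(domain_text, reference_text))
```

In this toy input, "interpreter" and "terminology" occur only once and fall below the threshold, which mirrors the slide's point that purely statistical systems struggle with low-frequency terms.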
  • 10. ASSESSING TERMINOLOGY EXTRACTION SYSTEMS. RECALL: “Capacity of the detection system to extract all terms from a document” SILENCE: “Terms contained in an analysed text that are not detected by the system” PRECISION: “Capacity to discriminate between those units detected by the system which are terms and those which are not” NOISE: “The rate between discarded candidates and accepted ones” Cabré, Estopà & Vivaldi (2001:53-56)
  • 11. AIMS OF AUTOMATIC TERMINOLOGY EXTRACTION. ● Reduce noise (be accurate) ● Reduce silence (be complete) ● Allow for manual selection of terms and validation of candidate terms: Heid (2001) ● “As usability is regarded as being fundamental for the acceptability of an interpreter-oriented tool, a terminology extraction system for interpreters must give priority to precision over recall.” Fantinuoli (2012: 49)
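The four measures quoted on this slide can be made concrete with a small worked example. This is an illustrative sketch only: it assumes a hand-validated reference list of terms (`gold`) is available, treats silence as the set of missed terms, and reads the quoted definition of noise literally as the ratio of discarded to accepted candidates. The function name and sample data are hypothetical.

```python
def evaluate(extracted, gold):
    """Compare extracted candidates with a validated term list."""
    extracted, gold = set(extracted), set(gold)
    accepted = extracted & gold    # candidates that are real terms
    discarded = extracted - gold   # candidates rejected on review
    silence = gold - extracted     # real terms the tool missed
    return {
        "precision": len(accepted) / len(extracted) if extracted else 0.0,
        "recall": len(accepted) / len(gold) if gold else 0.0,
        "silence": sorted(silence),
        # Literal reading of the quoted definition: discarded / accepted.
        "noise": len(discarded) / len(accepted) if accepted else float("inf"),
    }


report = evaluate(
    extracted=["term base", "glossary", "meeting", "the agenda"],
    gold=["term base", "glossary", "booth"],
)
print(report)
```

Here two of four candidates are real terms (precision 0.5), two of three reference terms are found (recall of about 0.67), "booth" is silence, and the noise ratio is 1.0.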
  • 13. RESEARCH QUESTIONS. 1. What tools are interpreters using for terminology extraction? 2. What are the strengths and weaknesses of these tools? 3. In which settings are terminology extraction tools useful? In which settings should they be avoided? 4. What does the terminology extraction process look like? 5. How does terminology extraction compare to other types of preparation? 6. In addition to the term itself, what should these tools extract? 7. What features would an ideal terminology extraction tool offer?
  • 14. RESEARCH DESIGN. EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO ▸ Map the field of terminology extraction tools for interpreters ▸ Develop an instrument to assess tools (Creswell & Clark 2006). SEMI-STRUCTURED IN-DEPTH INTERVIEWS ▸ Develop detailed descriptions, present multiple perspectives, describe process, understand a situation from the inside (Weiss, 1994) ▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756) ▸ Zoom™, Speechmatics™ ▸ Informed consent ▸ Anonymous. INDUCTIVE THEMATIC ANALYSIS ▸ Transcribe interviews and inductively derive categories (Kvale, 1996) ▸ Coded with NVivo™ (CAQDAS program)
  • 15. PARTICIPANTS. ▸ 10 respondents, all professional interpreters (2 women) ▸ Age 29 – 57 (μ = 42.2) ▸ Domiciled in Europe and North America ▸ 6 members of professional associations (60%) ▸ 2 staff interpreters (20%) ▸ Conference (100%), Media (10%), Court (10%) and Community (10%) interpreting ▸ Experience: 3 – 30 years (μ = 17.7) ▸ Experience using terminology extraction tools: 1 - 17 years (μ = 8.9) ▸ Translation, training, research, administration, voiceovers
  • 16. PARTICIPANTS’ EXPERIENCE. PERCENTAGE OF ASSIGNMENTS USED: MANUAL 0 - 100% (μ = 48.0%); SEMI-AUTOMATIC 0 - 100% (μ = 18.9%); AUTOMATIC 0 - 100% (μ = 40.0%). NUMBER OF ASSIGNMENTS USED: MANUAL 0 - 840 (μ = 123.8); SEMI-AUTOMATIC 0 - 150 (μ = 17.2); AUTOMATIC 0 - 600 (μ = 135.6)
  • 17. THIS IS A PILOT STUDY. RESULTS CANNOT BE GENERALIZED, BUT DO AIM TO GIVE A GENERAL OVERVIEW OF TOOLS, EXPERIENCES AND EXPECTATIONS. PERCENTAGES ARE NOT STATISTICALLY SIGNIFICANT OR GENERALIZABLE.
  • 19. HARDWARE USED. Desktop (50%) Laptop (75%) Tablet (20%) Windows operating system (80%) MacOS (40%) iOS (20%) ▸ Some users utilize multiple devices
  • 20. TERMINOLOGY EXTRACTION SOFTWARE USED. InterpretBank (60%) Interpreters’ Help (40%) SketchEngine (20%; 30% used or tested) Intragloss (10%; 40% used or tested) Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T, dtSearch, TermoStat, as well as an in-house tool at an international organization (10% each) ▸ Users work with or had tested multiple types of terminology extraction software
  • 21. OTHER SOFTWARE USED. Terminology management tools (InterpretBank, Interpreters’ Help, Interplex, MS Access): 100% Annotation tools (Readdle Documents, GoodReader, PDF-XChange Editor, Skim): 50% Terminology database (e.g. IATE): 50% Wikipedia: 40% Linguee: 40% Search Engines: 30%
  • 23. TYPES OF TECHNOLOGY-ASSISTED TERMINOLOGY EXTRACTION. MANUAL: User selects terms manually. Tool provides support, e.g., to: ▸ add terms to glossary ▸ look up translation ▸ help manage terms. SEMI-AUTOMATIC: User provides document(s). Tool suggests terms. User reviews and accepts them. AUTOMATIC: User provides document(s). Tool suggests term candidates. Goldsmith (2018)
  • 36. OTHER PREPARATION STRATEGIES. Read documents (90%) Background reading (50%) Web research (50%) Memorize/drill terms (50%) Manual annotation (40%) Wikipedia (40%) Terminological research (30%) Gisting/text summarization (20%) Automatic translation; Concordancer; Build glossaries collaboratively; Read news; Read technical documents; Practice interpreting on similar topics (10%)
  • 38. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (1). FACILITATES PREPARATION: Saves time (100%) Provides terminology despite time pressure (90%) Quick extraction from lengthy documents (60%) Less hassle / menial copying and pasting (30%) Automatic annotation of term (and translation) (20%) Better preparation (10%). CONSISTENCY/RELIABILITY: Accurate/reliable results from automatic extraction (50%) Consistent preparation (20%)
  • 39. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (2). TERMINOLOGICAL PRECISION: Automatically extract most important / “right” terms (50%) Automatically look up translations on other sites (40%) Automatically extract named entities (10%) Add stop words (10%) Search function (10%). ERGONOMICS: Lightweight, portable, small footprint (30%)
  • 40. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (3). DISPLAY/INTERFACE: Parallel scrolling (50%) Easy comparison of bilingual/multilingual texts (30%) Manual highlighting/annotation/bookmarking (20%) Easy to use; easy input of terms; visually appealing; filter/edit results (10% each). EXPORT/STORAGE: Export candidates to database (40%) Back up/digitize glossaries (30%) Export in shareable format (20%) Reuse for later assignments (10%)
  • 41. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (1). PREPARATION: Incomplete preparation if only use term extraction (40%) Time-intensive (manual, copy/paste) (20%) Slow with large glossaries (10%). IMPORT/EXPORT/STORAGE: Poor export/formatting of exported text (20%) Tool doesn’t recognize format (e.g. line breaks, images) (20%) Compatibility (Mac/PC, etc.) (10%) Poor import of documents/glossaries (10%) Export not provided (10%)
  • 42. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (2). EXTRACTION: Multilingual extraction not supported (50%) Too many terms extracted (50%) Results need cleaning up (30%) Too few/many words in term (20%) Noise (20%) Too few terms extracted (20%) Incomplete extraction (e.g. context missing) (10%) Tool reorders words (10%)
  • 43. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (3). DISPLAY: Poor/incomplete presentation of results (30%) Terminology entry lacks relevant fields (10%) Small screen size (tablet) (10%). CUSTOMIZATION: Tools not designed for interpreters (10%) Software doesn’t know user’s individual needs (10%). COST: Cost/subscription (20%)
  • 44. SETTINGS WHERE EXTRACTION TOOLS PREFERRED. 80% used extraction when documents available. MANUAL: Parallel texts (40%) New topic (30%) Few documents (30%) Time permitting (30%) Focus on collocations (10%) Only monolingual documents available (10%). AUTOMATIC: Numerous/long documents (40%) For institutions (40%) Time pressure (40%) For hearings (20%) For automatic annotation when glossaries available (20%) Familiar subject matter (10%) All assignments (10%) When onsite (10%)
  • 45. SETTINGS WHERE EXTRACTION TOOLS AVOIDED. Limited / no materials available (50%) Documents not available in digital format (30%) Need to understand content (30%) Text too general (20%) PowerPoint (20%) Faster to read than extract (20%) Recurring meeting/familiar with terminology (10%) Confidentiality (10%) Multilingual documents not available (10%) Vague subject matter (10%) Very large / small glossary available (10%)
  • 46. 70% of respondents felt terminology extraction was more effective than other types of preparation. 62.5% of respondents preferred terminology extraction over other types of preparation, BUT ONLY 40% of respondents felt terminology extraction tools meet their needs.
  • 47. 90% of respondents felt clients were not aware they used terminology extraction tools. Those who were aware reacted positively (20%) and found it professional (10%). 80% of respondents felt colleagues were curious about terminology extraction tools, although some mentioned uninterested colleagues (40%) who were averse to new approaches (20%) or unwilling to change their habits (20%).
  • 49. THE IDEAL TOOL SHOULD EXTRACT. Term (100%) Single and multi-word terms (100%) Context/examples (90%) Equivalents in other languages (70%) Source / source document (50%) Definition (40%) Frequencies (40%) Subject matter overview (40%) Collocations / phraseology (30%) Named entities; figures; domain; link to source (20%) Graphical information; images; hyponyms; semantic groupings (10%)
  • 50. THE IDEAL TOOL: ANNOTATION. Allow manual annotation (70%) Highlight terms (60%) Highlight phraseology (60%) Print translations above extracted term (40%) Automatically annotate term occurrences from glossary (30%) Manually add sticky notes (30%) Highlight relevant content (20%) Annotations overview pane (20%) Bookmarks; Highlight phraseology; Highlight named entities (10% each)
  • 51. THE IDEAL TOOL: EXTRACTION/TRANSLATION. Extract unknown terms (80%) Multilingual extraction available (80%) Statistical extraction/show frequencies (70%) Filter results (manually, chronologically, thematically, by frequency, by agenda item, etc.) (60%) Extract from multiple files (60%) Access external resources from within program (60%) Ignore stop words / decrease noise (60%) View parallel texts & manually extract equivalents (50%) Automatically rank most relevant terms (40%) No clean up necessary; access multiple termbases/dictionaries; search glossaries for extracted terms; tablet and/or stylus interface (30%) ...
  • 52. THE IDEAL TOOL: IMPORT. Limited preprocessing / automatic conversion regardless of source file format (40%) Batch upload (30%) Import from parallel resources / in multiple languages (20%) Built-in webcrawler (10%) Import from your institutional calendar (10%) Flawless import (no errors with line breaks, etc.) (10%) Imports pre-existing glossaries (10%)
  • 53. THE IDEAL TOOL: EXPORT. Multilingual export (60%) One-click import into database (50%) Export into widely used/compatible formats (30%) Export annotated text (20%) Print from tool (10%)
  • 54. THE IDEAL TOOL: FORMAT AND STORAGE. FORMAT: Cross-platform (50%) Software suite / integration with terminology management tool (50%) Compatible with mobile devices (30%) “Available on my operating system” (30%) Compatible with translation tools/databases (20%) Checks pre-existing glossaries to avoid duplicates (20%). STORAGE: Local storage (40%) Offline to maintain confidentiality (30%) Cloud storage (30%)
  • 55. THE IDEAL TOOL: INTERFACE. Link term to context (90%) View parallel texts side by side with synchronous scrolling (70%) Bilingual/multilingual term list (50%) Reliability marker/index (50%) Simple, uncluttered display (40%) Search within source documents (30%) Customize display (30%) Speech recognition interface (20%) Can manually annotate with stylus (20%) Clear color code (20%)/color code for fuzzy matches (20%) Search within/filter exported terms (20%) Extensive information available (20%) ...
  • 56. THE IDEAL TOOL: CUSTOMIZATION. Configure number of terms extracted (50%) Configure working languages (40%) Customize external resources (40%) Custom results based on audience/domain/client (40%) Configure term length (n-gram) (30%) Customize display/user interface (30%) Knows interpreter’s preferences (20%); Designed for interpreters (20%) Tool knows interpreter’s background and adjusts accordingly (20%) Configure frequency threshold (20%) Learns from human postprocessing; preconfigure database / fields; configure domain; tool knows where to find information in document (10%)
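Several of the customization options respondents asked for (number of terms extracted, term length as an n-gram, frequency threshold, stop words) map naturally onto a small configuration object. The sketch below is purely illustrative: `ExtractionConfig`, `extract`, the default values and the tiny stop-word list are hypothetical and do not describe any tool named in these slides.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ExtractionConfig:
    max_terms: int = 20      # configure number of terms extracted
    max_ngram: int = 3       # configure term length (n-gram)
    min_frequency: int = 2   # configure frequency threshold
    stop_words: frozenset = frozenset(
        {"the", "of", "and", "a", "in", "to", "for"}
    )


def extract(text, config):
    """Return (candidate, count) pairs filtered by the configuration."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, config.max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip candidates that begin or end with a stop word.
            if gram[0] in config.stop_words or gram[-1] in config.stop_words:
                continue
            counts[" ".join(gram)] += 1
    frequent = [(t, c) for t, c in counts.items() if c >= config.min_frequency]
    # Most frequent first; longer candidates win ties, then alphabetical.
    frequent.sort(key=lambda tc: (-tc[1], -len(tc[0].split()), tc[0]))
    return frequent[:config.max_terms]


text = "Cognitive load matters. Preparation can decrease cognitive load in the booth."
print(extract(text, ExtractionConfig(max_terms=2, max_ngram=2)))
```

Raising `min_frequency` or lowering `max_terms` trades completeness for a shorter, cleaner list, which is the precision-over-recall preference the earlier slides attribute to interpreter-oriented tools.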
  • 58. CONCLUSIONS (1). Interpreters regularly use manual, semi-automatic and automatic terminology extraction tools. The terminology extraction process differs for every interpreter, although it tends to include document collection, extraction, glossary building, and possible annotation. Interpreters prefer different approaches (manual vs. [semi-]automatic) in different settings, and avoid terminology extraction when documents are not available or digitized or when they need an in-depth understanding of content and have time to read the entire text.
  • 59. CONCLUSIONS (2). Terminology extraction saves time and can lead to reliable results and terminological precision. Terminology extraction alone may be insufficient. Most respondents felt terminology extraction was more effective than other types of preparation. Most respondents felt that terminology extraction tools did not meet their needs.
  • 60. CONCLUSIONS (3). Interpreters use a wide variety of terminology extraction software, but few terminology extraction tools are designed for interpreters, and the perfect tool doesn’t exist yet. Minimally, the ideal tool should extract unknown terms, context, and translations and offer multilingual extraction, filtering of results, access to terminological resources, multilingual export, manual annotation, parallel scrolling, bilingual/multilingual term lists and significant customization.
  • 61. FUTURE WORK. Phase 2: Survey to rank the features of ideal tools and make recommendations to designers. Phase 3: Use weighted rankings to assess existing tools and make recommendations to practitioners.
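One possible shape for the weighted assessment planned in Phase 3, offered only as a sketch under stated assumptions. The slides do not specify a scoring method. Here the share of pilot respondents who requested a feature stands in for the weights the Phase 2 survey would produce, the scoring formula is an assumed simple weighted coverage, and the two tools are hypothetical placeholders, not real products.

```python
# Stand-in weights: share of pilot respondents who requested each feature.
# A real instrument would use the Phase 2 survey rankings instead.
FEATURE_WEIGHTS = {
    "link term to context": 0.90,
    "extract unknown terms": 0.80,
    "multilingual extraction": 0.80,
    "multilingual export": 0.60,
}


def weighted_score(supported, weights=FEATURE_WEIGHTS):
    """Fraction of total feature weight covered by a tool (0.0 to 1.0)."""
    total = sum(weights.values())
    covered = sum(w for feature, w in weights.items() if feature in supported)
    return covered / total


# Hypothetical tools, for illustration only.
tool_a = {"link term to context", "multilingual export"}
tool_b = {"link term to context", "extract unknown terms", "multilingual extraction"}

for name, features in (("Tool A", tool_a), ("Tool B", tool_b)):
    print(name, round(weighted_score(features), 2))
```

A score of this kind ranks tools by how much of the requested functionality they cover; it says nothing about how well each feature works, which would need separate testing.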