- As the number of social media users grows steadily, the worldwide quantity of digital data doubles every year.
- Skillful mining of Big Data is now an essential dimension of counterterrorism, cybersecurity, drug policy enforcement, border security, and intelligence gathering.
- Actionable analysis requires crossing traditional boundaries of geography, language, and culture.
- Going beyond sentiment analysis, law enforcement agencies increasingly seek predictive capabilities to protect civilians from violence during demonstrations and periods of civil unrest.
- Open source intelligence (OSINT) is expanding to incorporate images, audio, video, and emoticons.
Does your agency have a reliable way to confirm identity at border crossings? Are you accurately screening persons of interest and financial institutions against ALL worldwide watch lists? Are you able to identify a potential new threat even when that entity doesn’t appear in an existing database?
Conquering these challenges requires the same fundamental solution: high-quality, multilingual text analytics. In years past, "global" web mining meant translations to English cobbled together with English-only analytics. Today's cutting-edge solutions apply linguistic analysis directly to structured and unstructured text (news, reviews, blogs, tweets, and posts) so that each word is understood in its native context. It's the only way to be certain results are not skewed by subtle errors in slang, syntax, or spelling.
The tools you need to mine any data feed successfully already exist. To stay on top of security threats, your OSINT system needs to:
- Return high-quality results across critical or "conflict zone" languages.
- Offer robust features, functionality, and scaling.
- Integrate easily into your existing infrastructure.
Your solution should also meet both current requirements and future needs.
Ten Questions to Ask About Text Analytics in OSINT Solutions
- How well do you handle short texts such as tweets, SMS messages, and social network status updates?
Tweets have traditionally been more difficult to analyze because there is less context to work from, and they often include slang, abbreviations, and emoticons. But many solutions are now capable of excellent Twitter analysis—identifying the language and finding mentions of people, places, and companies.
- Can my system effectively analyze text that contains jargon and specialized vocabulary?
If your organization uses domain-specific vocabulary, your solution should have algorithms that can be trained for greater accuracy over time. Look for text analytics that work well right out of the box but can also adapt to the specific and evolving requirements of any domain.
- How do I guarantee high-quality results across all languages?
Any analytics tool you use is only as good as the linguistic analysis it is built on. Machine translation can garble meaning, so you need a solution that understands each language natively. This linguistic approach relates words not by how they look but by what each word means within its written context. It's the best way currently available to ensure all data is interpreted correctly. It is also one reason to prefer a company whose core competency is text analytics over one for which analyzing Big Data is just one part of a wider product suite.
- How many languages should my OSINT solution accommodate?
Your mission scope will determine the languages you need at any given time. Rather than adding languages piecemeal, you may be best positioned to meet new demands quickly and minimize integration headaches by choosing a vendor known for high-quality analytics across many languages.
- How well does your system respond to the idiosyncrasies of search in different languages?
Comprehensive and reliable search results depend on native understanding of each language. Minor variations in spelling and characters exist in all languages (for example, color vs. colour in English), and the complexities of Chinese, Arabic, and Japanese pose even greater challenges. Ask whether your search engine accommodates all of these variations.
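The indexing side of this can be sketched in Python. The sketch assumes a search index built from whitespace-separated tokens. Every token is normalized the same way at indexing time and at query time, so variant spellings land on the same index key. The `SPELLING_VARIANTS` table and the helper names are hypothetical. Unicode NFKC folding, from the standard library's `unicodedata`, handles full-width Latin letters and half-width katakana. Chinese word segmentation and Arabic morphology, however, need real per-language analysis that this sketch does not attempt.

```python
import unicodedata
from collections import defaultdict

# Hypothetical variant table; a real system would draw on per-language
# linguistic resources instead of a hand-written dictionary.
SPELLING_VARIANTS = {
    "colour": "color",
    "favour": "favor",
    "centre": "center",
}


def normalize_token(token: str) -> str:
    """Map a token to the key used for both indexing and querying."""
    # NFKC folds compatibility forms (e.g. full-width Latin letters and
    # half-width katakana); casefold() removes case differences.
    folded = unicodedata.normalize("NFKC", token).casefold()
    # Collapse known spelling variants onto one canonical form.
    return SPELLING_VARIANTS.get(folded, folded)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index from normalized token to document IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize_token(token)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Look up a single-word query using the same normalization."""
    return index.get(normalize_token(query), set())


if __name__ == "__main__":
    docs = {1: "The colour of the flag", 2: "Flag color is red", 3: "ＣＯＬＯＵＲ"}
    index = build_index(docs)
    print(sorted(search(index, "Colour")))
```

The key design point is that indexing and querying share one normalization function. If the two sides normalized differently, variants would silently stop matching.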
- How well does your system track names in multiple languages?
National security requires that names be recognized and tracked internationally, which is why multilingual capability is so critical. Names need to be correctly identified across languages and scripts, regardless of nicknames, usage, spacing, or misspellings. Your solution must reliably identify "Abdul Rashid", "Abdal Ar-Rasheed", "عَبْد اَلرَّشِيد", and "アブダル・ラシード" as the same person.
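The following Python sketch illustrates only the spelling-variant part of this problem. It normalizes romanized names and scores them with the standard library's `difflib.SequenceMatcher`. The article-prefix list, the 0.75 threshold, and the function names are illustrative assumptions. Matching across scripts (Arabic or Katakana against Latin) would also require transliteration, which this sketch does not do. Production name matching relies on far richer linguistic models.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Illustrative romanizations of the Arabic definite article ("al-", "ar-", ...).
ARTICLE_PREFIXES = {"al", "ar", "ad", "an", "as", "ash", "el"}

# Illustrative threshold; a real system would tune this on labeled name pairs.
MATCH_THRESHOLD = 0.75


def normalize_name(name: str) -> str:
    """Lowercase, strip diacritics and punctuation, and drop article prefixes."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Split on anything that is not a basic Latin letter (spaces, hyphens, ...).
    tokens = re.split(r"[^a-z]+", no_marks.lower())
    return "".join(t for t in tokens if t and t not in ARTICLE_PREFIXES)


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two romanized names."""
    na, nb = normalize_name(a), normalize_name(b)
    if not na or not nb:
        # Nothing left to compare (e.g. a non-Latin script); treat as no match.
        return 0.0
    return SequenceMatcher(None, na, nb).ratio()


def same_person(a: str, b: str) -> bool:
    """Treat two names as the same person when similarity clears the threshold."""
    return name_similarity(a, b) >= MATCH_THRESHOLD
```

With these settings, `same_person("Abdul Rashid", "Abdal Ar-Rasheed")` returns `True`. The score is only about 0.78, however, close to the threshold. That narrow margin shows how fragile purely character-based heuristics are compared with native linguistic analysis.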
- If multiple products are required, how well do they work together?
A suite of text analytics products designed to integrate seamlessly provides greater synergy, greater consistency, and higher-quality analytics than a combination of standalone products.
- Will your solution help me identify new potential threats?
Advanced text analytics CAN discover relationships between persons of interest and others, which is one way to uncover new and emerging threats. Any time a new entity is identified, your solution should be able to flag it and keep monitoring it.
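The flag-and-monitor idea can be sketched in Python as follows. The sketch assumes an upstream entity extractor (not shown) has already turned each document into a set of entity names. Any entity missing from the known-entity database is flagged. Document-level co-occurrence with persons of interest serves as a crude stand-in for real relationship extraction. The function name, data shapes, and sample entities are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_new_entities(
    docs: list[set[str]],
    known: set[str],
    persons_of_interest: set[str],
) -> dict[str, Counter]:
    """Flag entities absent from `known`; count co-occurrences with persons of interest.

    Each element of `docs` is the set of entity names extracted from one document
    (hypothetical input format).
    """
    # Persons of interest are known by definition, so never flag them as new.
    known = known | persons_of_interest
    flagged: dict[str, Counter] = defaultdict(Counter)
    for entities in docs:
        pois_here = entities & persons_of_interest
        for entity in entities - known:
            # Accessing the entry records the entity even with no links yet;
            # update() adds one co-occurrence per person of interest present.
            flagged[entity].update(pois_here)
    return dict(flagged)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    docs = [
        {"Abdul Rashid", "Acme Shipping", "Port Said"},
        {"Abdul Rashid", "Acme Shipping"},
        {"Jane Doe", "Blue Dhow"},
    ]
    result = flag_new_entities(
        docs, known={"Port Said", "Jane Doe"}, persons_of_interest={"Abdul Rashid"}
    )
    for entity, links in sorted(result.items()):
        print(entity, dict(links))
```

In this example, "Acme Shipping" is flagged with two links to "Abdul Rashid". "Blue Dhow" is flagged with no links yet, so it stays on the list and can be monitored as new documents arrive.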
- Which solution is best at sentiment analysis?
A relatively new area of text analytics and the current darling of social media analytics, sentiment analysis keeps improving as software gets better at understanding the subtleties of context, sarcasm, and human error. The best solution will be one with algorithms that can be improved over time to suit your needs.
- What’s best—asking my developers to build a solution, finding an open source option, or buying a commercial product?
Your best solution depends upon what you want and how soon you need it. Thoroughly research all options because pros and cons vary depending on your situation. Consider the engineering costs associated with integration. Will you need resources and support to troubleshoot? Is there sufficient natural language processing expertise in-house? How critical is text analytics to your mission?
If you require multilingual capability, expect significant added complexity: each language requires either expert knowledge or dedicated analytics.
Ultimately, your choice will rest on the quantity and quality of available in-house resources and your specific time requirements.