Elastic for Security Analysts. Part 1: Searching Strings.

Author

Adam Swan

Threat Hunting Engineering Lead

March 02, 2020 · 18 min read

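Purpose:

With Elastic increasing its foothold in the cybersecurity space thanks to the speed and scalability of its solution, we expect more new Elastic users. These users will approach Elastic armed with intuition built from experience with other platforms and SIEMs. That intuition is often directly challenged after a few searches in Elastic. The purpose of this series is to get security analysts up to speed on what makes Elastic different. This post gives readers a guide to building proper searches against string data in Elastic. Misunderstanding how the analyzed text and non-analyzed keyword datatypes affect searches of string-based data leads to misleading results. After reading this post you will be better equipped to run string searches that match your analytical intent.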

Outline:

Before We Begin
Which Datatype Are You Using?
Differences Overview
Difference 1: Tokenization & Terms
Difference 2: Case Sensitivity
Difference 3: Symbol Matching

Before We Begin:

Lucene

This blog post uses Lucene query syntax. We need regular expressions, and KQL does not yet support them.
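Outside Kibana, a Lucene query such as the regex searches used in this post can be sent to Elasticsearch through the `query_string` query, which parses Lucene syntax. The following is a minimal sketch of building that request body. The function name and the example query are illustrative assumptions, not something taken from the post's own tooling.

```python
import json


def lucene_search_body(lucene_query: str, size: int = 10) -> dict:
    """Wrap a Lucene query string in an Elasticsearch search request body.

    The query_string query parses Lucene syntax, including /regex/ terms.
    """
    return {
        "size": size,
        "query": {"query_string": {"query": lucene_query}},
    }


if __name__ == "__main__":
    # Hypothetical example: a case-insensitive regex against a keyword field.
    body = lucene_search_body("process.name:/[Cc][Mm][Dd]\\.[Ee][Xx][Ee]/")
    # POST this JSON to <your-elasticsearch>/<index>/_search
    print(json.dumps(body, indent=2))
```

The body only carries the query text, so the same helper works for plain terms, wildcards, and regex queries alike.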

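Terms: Datatypes, Mappings, and Analyzers:

When discussing how data is stored in an Elastic index, you need to be familiar with the terms mapping, datatype, and analyzer.

Datatype – The "type" a value is stored/indexed as. Examples of datatypes: string, boolean, integer, IP. Strings are stored/indexed as either the "text" or the "keyword" datatype.
Mapping – The configuration that assigns (maps) each field to a datatype. It is accessible via the get mapping API. When you "get the mapping", the fields are returned along with the datatypes they are mapped to.
Analyzer – Before string data is stored/indexed, the value is preprocessed to optimize storage and searching. Analyzers help make searches against strings faster.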

How Strings Are Stored:

There are two primary datatypes for strings: keyword and text.

Keyword – Strings of type keyword are stored as their raw value. No analyzer is applied.
Text – Strings of type text are analyzed. The default and most common analyzer is the standard (text) analyzer. In this post, when we discuss the "text" datatype we mean the text datatype with the standard analyzer. There are other analyzers, and custom analyzers are possible.

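Which Datatype Are You Using?

It is very likely that your Elastic instance uses both the text and keyword datatypes for strings. However, the Elastic Common Schema (ECS) and Winlogbeat use primarily the keyword datatype.

Even if you use ECS, administrators can customize the mappings! To know how a particular field is mapped, you have to query your Elastic instance. You can use either the get field mapping API or the get mapping API for this. It is good practice to stay current on the mappings of the fields you regularly search or build content against. Mappings can change while field names stay the same. In other words, today's keyword field could become tomorrow's text field.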
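As a rough illustration of checking a field's datatype, the sketch below builds a get field mapping API request and reads the datatype out of the response. It assumes the typeless response shape used by Elasticsearch 7.x; the helper names, host, and index pattern are hypothetical, and authentication is left out.

```python
import json
import urllib.request


def field_mapping_url(base_url: str, index: str, field: str) -> str:
    """Build the get field mapping API URL: GET /<index>/_mapping/field/<field>."""
    return f"{base_url.rstrip('/')}/{index}/_mapping/field/{field}"


def field_types(response: dict, field: str) -> dict:
    """Return {index_name: datatype} for `field` from a get field mapping response.

    Assumed (7.x) response shape:
    {index: {"mappings": {field: {"full_name": ..., "mapping": {leaf: {"type": ...}}}}}}
    """
    types = {}
    for index_name, index_body in response.items():
        entry = index_body.get("mappings", {}).get(field)
        if not entry:
            continue  # the field is not mapped in this index
        for leaf in entry.get("mapping", {}).values():
            types[index_name] = leaf.get("type")
    return types


def fetch_field_types(base_url: str, index: str, field: str) -> dict:
    """Query a live cluster (no authentication handling in this sketch)."""
    with urllib.request.urlopen(field_mapping_url(base_url, index, field)) as resp:
        return field_types(json.load(resp), field)
```

A call such as `fetch_field_types("http://localhost:9200", "winlogbeat-*", "process.args")` would then report, per index, whether the field is currently mapped as keyword or text. Running it periodically is one way to notice a mapping that changed under an unchanged field name.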

The notable differences between keyword and text are summarized in the following section. Each difference that affects search results is then explored in detail in its own section.

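Differences Overview

We do not expect you to read this section once and immediately understand when these types will match. Each difference is detailed in its own section. Each example in the table also appears in the table within the section that explains its behavior.

Differences

The table below gives a brief overview of the major differences between the datatypes.

Difference	Standard (Text)	Keyword
Tokenization	Broken into terms (tokenized); the original value is lost, but searching is faster	Not tokenized; the original value is retained
Case Sensitivity	Case insensitive; case-sensitive queries are not possible	Case sensitive; case-insensitive queries are possible using regex
Symbols	Generally, non-alphanumeric characters are not stored. However, non-alphanumeric characters are retained in certain contexts	Non-alphanumeric characters / symbols are retained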

Behavior Differences

The table below shows real-world examples of how the type affects search behavior.

Example Value	Query	Text Match	Keyword Match
Powershell.exe –encoded TvqQAAMA	process.args:encoded	Yes	No
Powershell.exe –encoded TvqQAAMA	process.args:/.*[Ee][Nn][Cc][Oo][Dd][Ee][Dd].*/	Yes	Yes
Powershell.exe –encoded TvqQAAMA	process.args:*Powershell.exe*Tvq*	No	Yes
TVqQAAMA	process.args:TVqQAAMA	Yes	Yes
TVqQAAMA	process.args:tvqqaama	Yes	No
cmd.exe	process.name:cmd.exe	Yes	Yes
CmD.ExE	process.name:cmd.exe	Yes	No
CmD.ExE	process.name:/[Cc][Mm][Dd].[Ee][Xx][Ee]/	Yes	Yes
\$	process.args:\\$*	No	Yes
\C$WindowsSystem32	process.args:C$\	Yes	Yes

_

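Difference 1: Analyzers, Tokenization & Terms

Differences

Difference	Text (standard analyzer)	Keyword
Tokenization	Broken into terms (tokenized)	Not analyzed, not tokenized. The original value is retained.

Why…?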

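The text datatype with the standard analyzer uses tokenization, which breaks a string into chunks (tokens). These tokens are based on word boundaries (i.e. spaces), punctuation, and so on.

Tokenization allows matching on a single term without contains logic or wildcards. For example, a search for "Elastic" will match a text datatype string containing "searching with Elastic is easy". This is unlike other SIEMs, which rely heavily on wildcards and "contains" logic.

However, tokenization breaks down when wildcards are used between words. For example, "*searching*Elastic*" will not match the standard-analyzed string "searching with Elastic is easy". Note: this can be worked around with proximity, but order is not preserved. For example, "searching Elastic"~1 will match both "searching with Elastic" and "Elastic with searching".

In security, exact matches and wildcards between words are often necessary. This is one of the reasons the keyword datatype has become the de facto datatype in ECS. Giving up some performance buys more precise searching.

Examples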

Example Value	Query	Text Match	Keyword Match
Powershell.exe –encoded	process.args:"Powershell.exe –encoded"	Yes	Yes
Powershell.exe –encoded TvqQAAMA	process.args:/.*[Ee][Nn][Cc][Oo][Dd][Ee][Dd].*/	Yes	Yes
Powershell.exe –encoded TvqQAAMA	process.args:encoded	Yes	No
Powershell.exe –encoded TvqQAAMA	process.args:*Powershell.exe*Tvq*	No	Yes
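The tokenization behavior described in this section can be imitated with a small toy model. The sketch below is only a rough approximation: the real standard analyzer follows Unicode word-boundary rules, whereas this regex simply keeps runs of letters and digits (allowing a period between them) and lowercases them. All function names are hypothetical. It shows why a single term matches a text field without wildcards, and why a wildcard spanning several words only matches when the whole value is one term, as in a keyword field.

```python
import re
from fnmatch import fnmatchcase

# Rough stand-in for the standard analyzer: runs of letters/digits, allowing
# a period between alphanumerics (so "cmd.exe" stays one term), lowercased.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def text_terms(value: str) -> list:
    """Approximate the terms a text (standard analyzer) field would index."""
    return [t.lower() for t in _TOKEN.findall(value)]


def text_matches(value: str, pattern: str) -> bool:
    """A term or wildcard pattern is compared against each term separately."""
    pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in text_terms(value))


def keyword_matches(value: str, pattern: str) -> bool:
    """A keyword field holds one term: the whole, unmodified value."""
    return fnmatchcase(value, pattern)


if __name__ == "__main__":
    value = "Powershell.exe –encoded TvqQAAMA"
    print(text_terms(value))
    for query in ("encoded", "*Powershell.exe*Tvq*"):
        print(query, text_matches(value, query), keyword_matches(value, query))
```

In this model the bare term `encoded` matches only the text field, while the multi-word wildcard matches only the keyword field, because no single token contains both `powershell.exe` and `tvq`.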

_

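Difference 2: Case Sensitivity

Differences

Difference	Text (standard analyzer)	Keyword
Case Sensitivity	Case insensitive, because values are stored entirely in lowercase. Case-sensitive queries are not possible.	Case sensitive. Case-insensitive queries are possible using regex.

Why…?

Case sensitivity is one of the biggest sources of confusion for security analysts trying to understand how Elastic behaves. This is especially true of the keyword datatype (hello, ECS crowd). A single off-case character in a log can let an event slip past an improperly constructed query against a keyword field. When attacker-controlled portions of data land in a keyword field (think Windows 4688 & 4104 events), you must use regex to make the query case insensitive! In addition, Elastic gives no warning when a document is narrowly missed because of one character of differing case, so ending up with fewer (or more) results than intended is a major source of confusion for security analysts.

Below is a basic matching example for "PoWeRsHeLl". You can see that a single off-case character is enough to prevent a query from matching.

Example Value	Query	Text Match	Keyword Match
PoWeRsHeLl	process.args:powershell	Yes	No
PoWeRsHeLl	process.args:PoWeRsHeLl	Yes	Yes
PoWeRsHeLl	process.args:/[Pp][Oo][Ww][Ee][Rr][Ss][Hh][Ee][Ll][Ll]/	Yes	Yes (matches all cases)

Here is a real example in Kibana. In the image below, the keyword field "process.args" was queried for the string "windows". To the unknowing analyst this may seem sufficient… 42 results were returned. However, if you wanted every document containing "windows", that is wrong. This search is case sensitive, so "Windows" does not match.

Limited results from a case-sensitive search

In the next query, searching for "windows" with a regex returns the 567 results that were previously "missing"!

Maximum results

Hopefully you now understand that if we are using the keyword type and are not using regex, we will miss any variation beyond an exact match of "powershell". When the data is attacker controlled, do not put yourself in this situation (for example, by using KQL, which does not support regex).

Note: There are cases where you do want case sensitivity in a keyword field, such as Base64.

Using regex character sets, you can make any query case insensitive. For example, the query C:\windows\system32\* becomes /[Cc]:\\[Ww][Ii][Nn][Dd][Oo][Ww][Ss]\\[Ss][Yy][Ss][Tt][Ee][Mm]32\\.*/.

Examples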

Example Value	Query	Text Match	Keyword Match
TVqQAAMA	process.args:TVqQAAMA	Yes	Yes
TVqQAAMA	process.args:tvqqaama	Yes	No
cmd.exe	process.name:cmd.exe	Yes	Yes
CmD.ExE	process.name:cmd.exe	Yes	No
CmD.ExE	process.name:/[Cc][Mm][Dd].[Ee][Xx][Ee]/	Yes	Yes
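Writing character sets by hand is tedious and error prone, so the technique in this section can be automated. The sketch below turns a literal string into a case-insensitive regex of the same form as the examples in the table above. The function names are hypothetical, and the escaping list is an assumption based on the characters Lucene regular expressions treat as reserved, plus the forward slash that delimits a regex in a Lucene query.

```python
import re

# Characters escaped in the generated pattern: Lucene regex reserved
# characters, plus "/" because it delimits the regex in the query string.
_RESERVED = set('.?+*|{}[]()"\\#@&<>~/')


def case_insensitive_regex(literal: str) -> str:
    """Turn a literal string into a regex body that matches it in any letter case."""
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        elif ch in _RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def lucene_regex_query(field: str, literal: str, contains: bool = False) -> str:
    """Build field:/regex/. With contains=True the body is padded with .*
    because the regex has to match the whole term, not a substring."""
    body = case_insensitive_regex(literal)
    if contains:
        body = ".*" + body + ".*"
    return f"{field}:/{body}/"


if __name__ == "__main__":
    print(lucene_regex_query("process.name", "cmd.exe"))
    # Python's re is only a stand-in here to show the whole-value match.
    print(bool(re.fullmatch(case_insensitive_regex("cmd.exe"), "CmD.ExE")))
```

Only ASCII letters are expanded into character sets; anything else is passed through or escaped, so values like Base64 strings should not be run through this helper when their case matters.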

_

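Difference 3: Symbol Matching

Differences

Difference	Text (standard analyzer)	Keyword
Symbols	Generally, non-alphanumeric characters are not stored. However, non-alphanumeric characters are retained in certain contexts	Non-alphanumeric characters / symbols are retained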
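Because the standard analyzer keeps symbols only in certain contexts, it helps to check a specific string rather than guess. The sketch below prepares a request for the analyze API and extracts the resulting tokens from the response. The helper names and the host are hypothetical, authentication is not handled, and the response is assumed to carry a top-level "tokens" list whose entries have a "token" key.

```python
import json
import urllib.request


def analyze_request(text: str, analyzer: str = "standard") -> bytes:
    """JSON body for the analyze API (POST /_analyze)."""
    return json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")


def tokens_from_response(response: dict) -> list:
    """Pull the token strings out of an analyze API response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(base_url: str, text: str, analyzer: str = "standard") -> list:
    """Run `text` through an analyzer on a live cluster (no auth handling)."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/_analyze",
        data=analyze_request(text, analyzer),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_from_response(json.load(resp))
```

Calling `analyze("http://localhost:9200", value)` with a value copied from a real log shows which symbols survive as part of a token and which are dropped, before you rely on a query against a text field.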

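Why…?

All symbols are retained in the keyword type, because the entire field is kept exactly as the data was entered (see the note below). With the standard analyzer, however, symbols are as a general rule not retained. This is because the analyzer is built for whole-word matching, and symbols are not words. In other words, with the standard analyzer symbols are (mostly) not stored. So if you intend to match on symbols, the keyword datatype is the best choice. If you only have a text field and have to match on a sequence of symbols, you are out of luck.

Note: There are contexts in which the standard analyzer does retain symbols. For example, the period is retained in terms such as "cmd.exe". The author has often found that the easiest way to understand how the standard analyzer retains symbols is to run test data through the analyze API.

Examples

Example Value	Query	Text Match	Keyword Match
\$	process.args:\\$*	No	Yes
\C$WindowsSystem32	process.args:C$\	Yes	Yes
cmd.exe	process.name:cmd.exe	Yes	Yes

_

Conclusion

Elastic is a powerful tool. However, it can also be misleading. We hope we have armed you with some knowledge and the ability to search string-based data with confidence.

If you feel you need help creating content for Elastic, SOC Prime offers plenty of detection content that uses our recommended Elastic configuration.

Future Posts

Stay tuned for further blog posts exploring the basics, and the not-so-basics, of using Elastic as a security analyst.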

Additional Resources on Searching:

This series focuses on analyst pain points and does not go deep into syntax. Elastic provides detailed documentation on Lucene syntax. There are also several high-quality community-made cheat sheets, notably one by McAndre and one by Florian Roth and Thomas Patzke.

Meta:

Published – March 2020

Last Updated – 12 March

Authors – Adam Swan (@acalarch) with help from Nate Guagenti (@neu5ron)

Elastic Version Used: 7.5.2

Logs Used In Examples: https://github.com/sbousseaden/EVTX-ATTACK-SAMPLES
