[Article updated] My Bookmarks: "Information Processing for Urban Space: Global Trends in Datasets" | The Japanese Society for Artificial Intelligence

My Bookmarks

Information Processing for Urban Space: Global Trends in Datasets

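Yoshihide Sekimoto (Center for Spatial Information Science, The University of Tokyo; Digital Spatial Society Collaborative Research Organization, The University of Tokyo)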

1. Introduction

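In recent years, as part of putting research into practice, a growing number of people in fields such as information processing and artificial intelligence seem to be doing research or business that targets entire urban spaces. Terms such as "smart city" and "super city" are in circulation, and "urban digital twin" has become an everyday keyword. In particular, advances in machine learning and deep learning have spurred practical efforts to recognize real-world environments such as cities automatically and to advance DX by gradually automating the many urban management tasks that currently take enormous labor.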
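However, because the components of a city are complex, none of this is easy. Administrative and commercial use often demands accuracy above a certain level, so some processing steps resist full automation, and approaches differ considerably from field to field. In any case, machine learning requires a wide variety of data. Much of this data is the source of competitiveness for individual companies and universities, yet we ourselves are often able to carry out our research and business thanks to open datasets. For the sake of progress in research and business as a whole, and of returning the benefits to society, I believe datasets should be shared as actively as each business model allows. Moreover, as COVID-19 and global divisions add to the uncertainty, we also need to consider how Japan can contribute to the world. For researchers, one approach is to release original data, actively organize data challenge events through international conferences, and widen the circle of collaborators.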
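This article therefore focuses on the major features that make up a city, surveys what kinds of "urban training datasets" are available worldwide in each field, and looks ahead to the research and business that should be pursued. Because the focus here is on datasets, the fundamentals of urban spatial information processing itself are omitted. Datasets marked with （＊） in this article require some form of application or permission; the others can be downloaded openly. Note, however, that in either case many licenses permit research use but ask you to contact the provider before commercial use.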

2. Land Use

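We begin with land use. Because land use bears on the foundations of a city, from urban development itself to forecasting agricultural yields, this field has long been studied using aerial photographs and remotely sensed satellite imagery. As early as the 1980s and 1990s, before machine learning became widespread, unsupervised classification grouped land use by clustering image pixel values. Supervised classification, in which known land-use data is loaded into analysis software to guide the classification, gradually spread as well, and these methods led to today's machine learning. Satellite imagery, which carries sensors for wavelengths outside the visible range that aerial photographs cannot capture, reveals many things through the reflectance characteristics of materials in each wavelength band. Even so, higher resolution in the visible imagery itself often seems to have the greater effect on classification accuracy, and as Table 1 shows, datasets have moved toward higher resolution and more classes. In recent years the variety of satellite imagery has also grown, and some studies apply techniques such as super-resolution, training on a small number of high-resolution images to estimate land use with reasonably high accuracy even from wide-area low-resolution images.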
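
The unsupervised classification described above, which groups pixels by clustering their values, can be sketched roughly as follows. This is a minimal illustration only: plain k-means is used as a stand-in because the article does not name a specific clustering algorithm, and the function name `kmeans_pixels` and the toy pixel values are assumptions made for this example.

```python
import random


def kmeans_pixels(pixels, k, iterations=20, seed=0):
    """Cluster pixel vectors into k groups with plain k-means.

    pixels: list of equal-length tuples, e.g. (R, G, B) values.
    Returns (labels, centers). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    # Start from k pixels picked at random as the initial centers.
    centers = [tuple(float(v) for v in p) for p in rng.sample(pixels, k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest center.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its member pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers


# Toy "image": three dark-blue (water-like) and three green (vegetation-like) pixels.
toy_pixels = [(10, 20, 90), (12, 25, 100), (8, 18, 95),
              (60, 180, 50), (55, 170, 60), (65, 175, 55)]
labels, centers = kmeans_pixels(toy_pixels, k=2)
print(labels)
```

The cluster indices carry no meaning by themselves; an analyst still has to decide which cluster corresponds to which land-use class, which is one reason supervised approaches took over.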

Table 1: Land-use datasets

Dataset	Provider	Year	Classes / annotations	Source data summary
UC Merced Land Use Dataset	UC Merced	2010	21 classes	100 aerial images for each class, measures 256×256 pixels, the pixel resolution is 1 foot
RSSCN7	Wuhan Univ.	2015	7 classes	2.8 K remote sensing images
SAT-6 airborne datasets	Louisiana State Univ. & NASA	2015	6 classes	405 K image patches each of size 28×28
SIRI-WHU: Google	Wuhan Univ.	2016	12 classes	200 images for each class, each image measures 200×200 pixels, with a 2-m spatial resolution
SIRI-WHU: USGS	Wuhan Univ.	2016	4 classes	The large image measures 10000×9000 pixels, with a 2 ft spatial resolution
RSI-CB	Central South Univ.	2017	35 classes	24 K images, 256×256 pixels, with 0.3 to 3 m spatial resolutions
Dstl Satellite Imagery Feature Detection (Kaggle)	Dstl	2017	10 classes	57 images, 1×1 km, 3/16-band Worldview 3 imagery (0.3 to 7.5 m spatial resolutions)
DLRSD	Wuhan Univ.	2018	21 classes	100 images per class with 256×256 pixels size
LandCoverNet	Radiant Earth Foundation	2018	5 classes	1.9 K images, 256×256 pixels in V1.0 spanning 66 tiles of Sentinel-2
DroneDeploy (https://competitions.codalab.org/competitions/18468)	DroneDeploy	2019	7 classes	A number of aerial scenes captured from drones. Each scene has a ground resolution of 10 cm per pixel
Slovenia Land Cover Classification	Sinergise	2019	10 classes	940 EOPatches of the size 500×500 pixels at 10 m resolution
SEN12MS	TUM	2019	33 classes	180 K patch triplets of corresponding Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral images, and MODIS-derived land cover maps
LandCover.ai	linuxpo	2020	3 classes	33 orthophotos with 25 cm per pixel resolution (about 9000×9500 px), 8 orthophotos with 50 cm per pixel resolution (about 4200×4700 px)
BDCI 2020（＊）	BDCI	2020	7 classes	140 K JPG images at a resolution of 2 m/pixel and a size of 256×256
Gaofen Image Dataset（GID）	Wuhan Univ.	2020	5 and 15 classes (2 versions)	The large-scale classification set contains 150 pixel-level annotated GF-2 images, and the fine classification set is composed of 30000 multi-scale image patches coupled with 10 pixel-level annotated GF-2 images
CLRS	Central South Univ.	2020	25 classes	15 K remote sensing images, image size is 256×256, resolution ranges from 0.26 m to 8.85 m
SenseEarth Classify（＊）	Sense Earth	2020	8 classes with 28 sub classes （51 different categories in total）	70 K remote sensing images
Multi-View Datasets: AiRound	Federal Univ. of Minas Gerais	2020	11 classes	11 K images
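
To compare entries in Table 1 that mix feet, meters, and pixel counts, it can help to convert each tile to the ground area it covers. The following is a small worked sketch using two rows of the table; the helper name `tile_footprint_m` is an assumption made for this example, and the calculation ignores map projection and terrain effects.

```python
FOOT_IN_METERS = 0.3048  # exact definition of the international foot


def tile_footprint_m(width_px, height_px, resolution_m_per_px):
    """Return (width_m, height_m, area_km2) covered by one image tile.

    Hypothetical helper: simply multiplies pixel counts by ground resolution.
    """
    width_m = width_px * resolution_m_per_px
    height_m = height_px * resolution_m_per_px
    area_km2 = (width_m * height_m) / 1_000_000
    return width_m, height_m, area_km2


# UC Merced Land Use Dataset: 256x256 pixels at 1 foot per pixel (Table 1).
uc_merced = tile_footprint_m(256, 256, FOOT_IN_METERS)
# SIRI-WHU (Google): 200x200 pixels at 2 m per pixel (Table 1).
siri_whu = tile_footprint_m(200, 200, 2.0)

print("UC Merced tile:", uc_merced)
print("SIRI-WHU tile:", siri_whu)
```

Under these assumptions a UC Merced tile spans roughly 78 m on a side, while a SIRI-WHU (Google) tile spans 400 m, so a single "image" in the two datasets shows very different amounts of ground.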

3. Buildings

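Next come buildings. Because buildings are the base for people's lives and businesses, aerial photography has long been used, for example, to detect building changes for property tax assessment. More recently, demand for high-accuracy building data has grown for purposes such as precise mapping, 3D modeling, the vacant-house problem, and real-estate information. Building data can be represented at various levels, but automatically extracting 2D building polygons from aerial photographs or satellite imagery is a baseline milestone, and a variety of datasets such as those in Table 2 have been released for it. Even so, the accuracy is still hard to call practical: applying a model to imagery whose spatial resolution or target area differs from that of its training images often causes accuracy to drop sharply. In practice, the data becomes more valuable if specific building attributes (for example, ordinary building versus business establishment, building age, or structure such as wood or reinforced concrete) can be estimated as well, but that requires closer-range imagery (for example, images taken from vehicle-mounted cameras at ground level).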
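
One common way to quantify the extraction accuracy discussed above is intersection over union (IoU) between a predicted building mask and a reference mask. The article itself does not name a metric, so the choice of IoU, the function name `mask_iou`, and the toy masks below are all assumptions made for illustration.

```python
def mask_iou(predicted, reference):
    """IoU between two binary masks given as equal-sized lists of 0/1 rows.

    Hypothetical helper: counts pixels set in both masks (intersection)
    and pixels set in either mask (union).
    """
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Reference: a 2x2 building footprint in a 4x4 tile.
reference_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Prediction: the same footprint shifted one pixel to the right.
predicted_mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
iou = mask_iou(predicted_mask, reference_mask)
print(round(iou, 3))
```

In this toy case a one-pixel shift already cuts the IoU to one third, which hints at why small buildings and coarse resolutions make the task hard.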
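Beyond 2D information such as that in Table 2, there are datasets for SfM (Structure from Motion), which tries to estimate 3D structure from multiple 2D images: the Photo Tourism Dataset (2006), in which the Univ. of Washington and Microsoft released images of Notre-Dame Cathedral, and MegaDepth (2018), in which Cornell Univ. released about 100,000 depth images of 200 landmark scenes. In Japan, the Ministry of Land, Infrastructure, Transport and Tourism has recently begun releasing 3D building data for the city planning areas of 56 cities through the G-Spatial Information Center, starting in March 2021, as part of its urban 3D modeling project PLATEAU; such data may also become usable as training data.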
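Building information detailed enough to serve as real-estate information depends heavily on the efforts of real-estate companies and so varies considerably by country, but Japan is quite rich in it. Examples include the LIFULL dataset（＊）(2015), in which about 83 million exterior and interior images of rental properties and about 5.15 million floor-plan images are annotated with property information; the LIFULL dataset（＊）(2017), which contains monthly rents for about 5.33 million rental and for-sale properties (with latitude and longitude, structure, and construction period); the Recruit dataset（＊）(2014), which includes about 10,000 Hot Pepper Beauty records (shop data such as shop name and address, with reviews attached); the At Home dataset（＊）(2019), provided by At Home, which covers rents or prices of real estate nationwide, property summaries (floor area, layout, structure, and year built), location (address, nearest line and station, walking time in minutes, and latitude and longitude for some categories), and facilities; and the Rakuten dataset（＊）(2021), which contains facility data (about 29,000 facilities) and their review data (about 6.56 million reviews). For further details, see the introduction by Yoji Kiyota in this journal, Vol. 33, No. 5, pp. 662-668.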

Table 2: Building datasets

Dataset	Provider	Year	Classes / annotations	Source data summary
SZTAKI-INRIA Building Detection Benchmark（＊）	MTA SZTAKI	2012	665 buildings	9 satellite or aerial images
Inria Aerial Image Labeling Dataset（＊）	INRIA	2017	2 classes (building and non-building)	Satellite images (810 km²)
SpaceNet	Maxar	2018	2 classes (building and road), 11 M buildings	Satellite images
WHU building dataset	Wuhan Univ.	2019	22 K buildings	Aerial images with 0.075 m spatial resolution
Open Cities AI Challenge Dataset	UN Global Facility for Disaster Reduction and Recovery（GFDRR）	2020	790 K buildings	400 km², Drone image
LandCover.ai	Univ. of Warsaw	2020	4 classes, 12 K buildings	Aerial images in Poland （216 km²）

4. Roads

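Next is roads. Driven by the trend toward autonomous driving, this field has advanced rapidly in recent years. Notably, major automakers also make part of their core data publicly available, which can be seen as part of the move toward open innovation. Table 3 summarizes the main datasets. Overall, work began with lane recognition and has gradually shifted to semantic segmentation of the entire road space and to depth (distance) measurement from images. Also interesting are approaches that train on synthetic images generated in game-based environments, such as driving simulators, where the ground truth for driving is largely known. The complexity naturally differs from real images, but a reasonable model can be built from synthetic images and then refined, for example by fine-tuning on real-world data.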
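Beyond autonomous driving, there are also efforts, aimed at more efficient road management, to automatically detect road-related objects and conditions such as traffic signs and pavement damage.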

Table 3: Road datasets

【Lane marker】
Dataset	Provider	Year	Classes / annotations	Source data summary
Caltech Lanes Dataset	Caltech	2008	Streets and labeled lane	1.4 K images
VPGNet（＊）	KAIST	2017	Lane and road marker	20 K images
TuSimple	TuSimple	2017	Lane marker	6 K images for highway
CULane	The Chinese Univ. of HongKong	2017	Lane marker	133 K images
【Road scene】
RobotCar（＊）	Oxford Univ.	2016	Road space without annotation	600 K images taken by left, right and rear camera with longer term changes such as construction and roadworks, LiDAR dataset for depth prediction
BDD100K（＊）	UC Berkeley, Cornell Univ., UC San Diego, Element	2018	Lane marker, road surface, car and person	100 K frames
Apollo Scape Dataset	Baidu Research	2018	Semantic labelling（35 classes）, Lane marker labelling（35 classes）, 2D instances segmentation（8 classes）, 3D car instance labelling（70 K cars）	160 K dense semantics 3D point cloud images, 100 hours stereo driving videos
The Lane Marker Dataset（＊）	BOSCH	2019	Lane marker and baseline segmentation	100 K annotated images
OSV Dataset	Wuhan Univ.	2019	5 classes（lights, cars, traffic signs, crosswalks, crosswalk warning lines）, 5.6 K objects in total	1.2 K annotated images（OSV: Omnidirectional Street-View）
nuScenes（＊）	Motional	2019	3D box with semantic	40 K images with camera, LiDAR, and radar
DDAD	Toyota Research Institute	2020	Depth image	16 K images with LiDAR dataset for Japanese Roads DDAD: Dense Depth for Autonomous Driving
【Synthetic data】
Playing for data	TU Darmstadt & Intel Labs	2016	19 road object segmentation	25 K densely labelled frames split into 10 parts from the game GTA（Grand Theft Auto）
Apollo Synthetic Dataset	Baidu Apollo	2019	24 road object segmentation	273 K distinct scenes from Unity engine
3D Lane Synthetic Dataset	Baidu Apollo	2020	Lane marker	6 K images
【Road attachment】
Tsinghua-Tencent dataset	Tsinghua Univ.	2021	Traffic signboard（80 K）	16 K high-resolution images
Road Damage Dataset 2018	The Univ. of Tokyo	2018	8 road damage classes such as linear crack, alligator crack, pothole, white line blur	9 K in-vehicle smartphone images

5. Vehicles

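Next are vehicles, which could be regarded as part of roads. In autonomous driving, laser and radar sensors are probably used most often to measure the distance to vehicles ahead, but for lower-cost equipment, distance estimation from images is an option, and the KITTI dataset in Table 4 is widely used for it. Beyond autonomous driving, however, there are also needs such as understanding traffic volume across an entire region, as the other datasets in Table 4 reflect. Collecting GPS information is difficult for anyone other than automakers and mobile carriers, so research that counts vehicles from a moving platform and builds up a picture of overall traffic volume may also be important.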

Table 4: Vehicle datasets

Dataset	Provider	Year	Classes / annotations	Source data summary
KITTI Benchmark	Karlsruhe Institute of Technology（KIT）	2013	80 K vehicles, 8 classes（car, van, truck, pedestrian, sitting person, cyclist, tram and misc）	15 K images, In-vehicle camera
UCAS-AOD	The Univ. of Science and Technology of China（USTC）	2014	2.8 K vehicles & 3.2 K planes	310 aerial images（vehicles） & 600 images（planes）
COWC	Lawrence Livermore National Laboratory	2016	32 K vehicle bounding boxes	32 aerial images（0.15 m resolution）（COWC: Cars Overhead With Context）
DLR-MVDA	German Aerospace Center（DLR）	2018	3.5 K vehicles	20 optical images taken at a height of 1000 m above ground（MVDA: Multi-class Vehicle Detection and Orientation in Aerial Imagery）

6. People

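The distribution of people also matters. Aggregated personal location data from location-based services such as GPS is of course the most accurate, but it is difficult for anyone other than mobile carriers and app service providers to collect, and in practice the operators themselves face hurdles in providing it because of personal-information concerns. Nevertheless, around 2010, when momentum began to build for making effective use of such mobile location data, initiatives such as Nokia's Mobile Data Challenge and Orange's D4D (Data for Development) provided data, for example aggregated base-station data, with the consent of the individuals concerned, and had a worldwide impact. The datasets in Table 5 are currently available. They are not necessarily intended for machine learning, but each must have cleared considerable hurdles before release, which makes them valuable efforts.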

Table 5: Datasets measured from individual trip information

Dataset	Provider	Year	Classes / annotations	Source data summary
T-Drive	Microsoft Research Asia	2008	7 days with 10 K taxi users, No label	GPS data
Geolife	Microsoft Research Asia	2008 （2016 updated）	178 users, 17 K trajectories for 3 years, Transportation mode（Walk, bike, bus, car&taxi, train, airplane, other）	GPS data
BerlinMOD	Univ. of Hagen	2011	292 K trips from 2 K vehicles, Car type, route choice（benchmark for simulation models）	GPS data
Travel time	Uber Movement	2018	Travel time of each zone in 51 cities	Aggregated from GPS data to zone level（GPS data is not included in the dataset）
RideAustin	RideAustin (Nonprofit corporation)	2018	3 M trips of ride sharing including origin and destination spatio-temporal data	GPS data（but not included in the dataset）
Pickups in NYC	Uber	2018	4.5 M in 2014, 14.3 M in 2015 taxi trip destination, route choice	GPS data（GPS data of one taxi company is included but other three companies are not）
PFLOW dataset（＊）	The Univ. of Tokyo	2008～	7 M estimated trajectory data in 36 cities	Person trip survey data based on paper questionnaire in several countries
OpenPFLOW dataset	The Univ. of Tokyo	2017	500 K estimated trajectory data in Tokyo metropolitan area	Several statistic data including open person trip survey data
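
Several entries in Table 5 release data aggregated to zones rather than raw GPS points. A minimal sketch of that kind of aggregation follows, assuming a regular latitude/longitude grid as the zones and a minimum-count threshold for suppressing sparse cells. The grid size, the threshold, the function names, and the sample coordinates are all hypothetical and are not taken from any dataset listed here.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer grid-cell index (row, col)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate_points(points, cell_deg=0.01, min_count=2):
    """Count points per grid cell and drop cells with fewer than min_count.

    Dropping sparse cells is one simple way to avoid publishing zones
    that could single out an individual.
    """
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical GPS fixes: two close together, one far from the others.
sample_points = [
    (35.6812, 139.7671),
    (35.6815, 139.7678),
    (35.6895, 139.6917),
]
zone_counts = aggregate_points(sample_points)
print(zone_counts)
```

Only the cell containing two points survives; the isolated point is suppressed, so the published output carries counts per zone and no individual trajectory.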

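To avoid such personal-information issues, there are also attempts to measure people's situations using images. Table 6 lists these. Broadly, they use fixed cameras such as CCTV, vehicle-mounted cameras in motion, overhead platforms such as helicopters and drones, or crowdsourced images posted to social media. The variety is wide and will likely keep growing.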

Table 6: Datasets measuring people from images

Dataset	Provider	Year	Classes / annotations	Source data summary
INRIA Person Dataset	INRIA	2005	Pedestrians	2 K images from a varied set of personal photos
UCSD Anomaly Detection Dataset	Univ. of California	2008	Bikers, skaters, small carts, and people walking	98 videos from fixed CCTV
Robust Multi-Person Tracking from Mobile Platforms	ETH	2008	Pedestrians	In-vehicle camera, 8 videos with 13～ 14 FPS
Daimler Pedestrian Detection Benchmark Dataset / Segmentation Benchmark Dataset	Daimler	2009/ 2013	72 K Pedestrians / 500 images（ground, building, vehicle, pedestrian, sky）	In-vehicle camera
Tsinghua-Daimler Cyclist Detection Benchmark Dataset	Daimler	2016	32 K Cyclists	In-vehicle camera
UCF50 – Action Recognition Data / UCF-QNRF – A Large Crowd Counting Data Set	Univ. of Central Florida	2013/2018	63 K / 1,251 K Pedestrians	Images collected mainly from Flickr
WorldExpo’10 Crowd Counting Dataset（＊）	Shanghai Jiao Tong Univ.	2015	225 K Pedestrians	108 surveillance cameras
ShanghaiTech Dataset	ShanghaiTech Univ.	2016	330 K Pedestrians	1198 crowd images collected from the Internet and personal camera
Stanford Drone Dataset	Stanford Univ.	2016	Pedestrians, bicyclists, skateboarders, cars, buses, and golf carts （10 K trajectories）	Drone camera
TokyoHawkeye	The Univ. of Tokyo	2020	120 K Pedestrians	Static images from helicopter in 10 different locations
Motion Dataset, Perception Dataset	Waymo	2021	Vehicles, Pedestrians, Cyclists（10.8 M trajectories）/ Vehicles, Pedestrians, Cyclists, Signs（12.6 M 3D and 11.8 M 2D bounding box trajectories）	High-resolution sensor data collected by autonomous vehicles

7. Entire Cities

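Finally, we turn to datasets covering entire cities. These overlap somewhat with the land-use datasets discussed at the start, but they work at a level where the individual structures that make up a city can be distinguished. As Table 7 shows, they rely mainly on very high-resolution aerial photographs or ground-level images rather than ordinary satellite imagery. As noted for buildings, this naturally leads toward 3D urban digital twins, which looks set to become a hot area where ever higher accuracy is contested.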

Table 7: Datasets measuring entire cities

Dataset	Provider	Year	Classes / annotations	Source data summary
Place Pulse（＊）	MIT Media Lab	2013	Street score, Street change	100 K images（56 cities）
SYNTHIA	Computer Vision Center	2016	13 classes（sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking, misc.）	50 K images, photo-realistic frames rendered from a virtual city（SYNTHIA: SYNTHetic collection of Imagery and Annotations）
ADE20k（＊）	MIT & Toronto Univ.	2017	3688 classes for indoor and outdoor scene（For Semantic Segmentation, 150 classes are used）	27 K images from SUN and Places Database
AID	Wuhan Univ.	2017	30 classes	10 K images, AID（Aerial Image Dataset）
Cityscapes dataset（5000 annotated images & 20000 coarse annotations）	TU Darmstadt	2020	20 K coarse annotated objects, 30 classes	5 K images, In-vehicle camera for 50 cities
Holicity	UC Berkeley	2020	3D cad dataset for surface segments（6 classes such as sky or nothing, buildings, roads, terrains, trees, others）, depth and normal estimation	6.3 K images, High-resolution aerial image
Mapillary Vistas Dataset（＊）	Mapillary	2021	124 semantic object categories, 100 instance-specifically annotated categories	25 K high-resolution images, Pedestrian’s camera

8. Conclusion

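This article has attempted a bird's-eye view of urban-space datasets that have tended to be developed within individual fields. Some parts have turned out long-winded, but compiling the material proved instructive for me in many ways, and I hope it helps readers in developing their own research.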

Acknowledgments

Many members of my laboratory helped with this article. I would like to thank once again Ashutoshu Kumar, Shenglong Chen, Go Sato, Hiroya Maeda, Takehiro Kashiyama, Yoshiki Ogawa, Yanbo Pang, Santiago Garcia, Toshikazu Seto, and Zhehui Yang (in no particular order).
