# Søren Vind

## About

I am a PhD student in the Algorithms, Logic and Graphs Section in the Department of Applied Mathematics and Computer Science at the Technical University of Denmark, from where I also received my BSc and MSc.

I am associated with the
Managed Video as a Service
Advanced Technology Foundation project.

The goal is to research and develop compressed data structures for indexing massive amounts of meta data that supports efficient queries.

My supervisors are Philip Bille and Inge Li Gørtz.

I work closely together with fellow PhD students Patrick Hagge Cording, Frederik Rye Skjoldjensen and Hjalte Wedel Vildhøj.

### Contact

DTU Compute

Building 322, Office 008

DK-2800 Kongens Lyngby

Denmark

Email: sovi@dtu.dk

## Research

My research interests are within fundamental algorithms and data structure problems, stringology, compression, computational geometry and distributed computing.

### Publications

with Roberto Grossi, Giulia Menconi, Nadia Pisanti and Roberto Trani.*(To appear in) FSTTCS 2014: Output-Sensitive Pattern Extraction in Sequences*

AbstractGenomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges.

Several notions of patterns exist, and many share the common idea of strictly specifying some parts of the pattern and to*don't care*about the remaining parts. Since the number of patterns can be exponential in the length of the sequences,*pattern extraction*focuses on statistically relevant patterns, where any attempt to further refine or extend them causes a loss of significant information (number of occurrences).

We address the problem of extracting maximal patterns with at most $k$ don't care symbols and at least $q$ occurrences. We give a simple algorithm with the first known output-sensitive bounds for pattern extraction.with Philip Bille and Inge Li Gørtz.*(To appear in) ISM 2014: Indexing Motion Detection Data for Surveillance Video*

AbstractWe show how to compactly index video data to support fast*motion detection*queries. A query specifies an area $A$ in the video, a time interval $T$ and two thresholds: a minimum pixel value change $v$ and minimal percentage $p$ of $A$ affected by said change. The answer to a query is the list of timestamps in $T$ where at least $p\%$ of the pixels in $A$ has changed by at least $v$ values.

Our results show that by building a small index, we can support queries with a speedup of two or three orders of magnitude compared to motion detection without an index. For high resolution video, the index size is about $20\%$ of the compressed video size.with Roberto Grossi.*SWAT 2014: Colored Range Searching in Linear Space*

AbstractIn*colored range searching*, we are given a set of $n$ colored points in $d \geq 2$ dimensions to store, and want to support orthogonal range queries taking colors into account. In the*colored range counting*problem, a query must report the number of distinct colors found in the query range, while an answer to the*colored range reporting*problem must report the distinct colors in the query range.

We give the first linear space data structure for both problems in two dimensions ($d=2$) with $o(n)$ worst case query time. We also give the first data structure obtaining almost-linear space usage and $o(n)$ worst case query time for points in $d > 2$ dimensions. Finally, we present the first dynamic solution to both counting and reporting with $o(n)$ query time for $d \geq 2$ and $d \geq 3$ dimensions, respectively.with Philip Bille, Inge Li Gørtz, and Hjalte Wedel Vildhøj.*Theory of Computing Systems: String Indexing for Patterns with Wildcards*

Arxiv version, Abstract, Conference version:We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.- A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

with Philip Bille, Inge Li Gørtz, and Hjalte Wedel Vildhøj.*SWAT 2012: String Indexing for Patterns with Wildcards*

with Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj.*WADS 2013: Fingerprints in Compressed Strings*

Full version (draft), AbstractThe Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first $O(n)$ space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(\log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(\log \log N)$ query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(\log N\log \ell)$ and $O(\log \ell \log\log \ell + \log\log N)$ for SLPs and Linear SLPs, respectively. Here, $\ell$ denotes the length of the LCE.with Hjalte Wedel Vildhøj.*Master's Thesis: String Indexing for Patterns with Wildcards*

Supervised by Philip Bille and Inge Li Gørtz.

Technical University of Denmark, August 8, 2011.

### Experiments

*Motion Detection Indexing*

There is source code available on Github for testing the performance of an index for motion detection data from surveillance video cameras.

## Conferences

- FSTTCS 2014

New Delhi, India

December 15-17 2014 - ISM 2014

Taichung, Taiwan

December 10-12 2014 - ICALP 2014

IT University of Copenhagen, Denmark

July 7-11 2014 - SWAT 2014

Presented*Colored Range Searching in Linear Space*Technical University of Denmark, Denmark

July 2-4 2014 - ARCO Workshop

Malmö University, Sweden

April 25 2014 - WADS 2013

Presented*Fingerprints in Compressed Strings*University of Western Ontario, Canada

August 12-14 2013 - CPM 2013

Bad Herrenalb, Germany

June 17-19 2013 - ARCO Workshop

Gave a talk on*String Matching with Fingerprints*

SDU Odense, Denmark

April 5 2013 - Storage & Indexing of Massive Data

The Royal Society at Chicheley Hall, Buckinghamshire, UK

February 7-8 2013 - ARCO Workshop

IT University, Denmark

November 15 2012 - CPM 2012 / SWAT 2012

University of Helsinki, Finland

July 3-6 2012

## Stays Abroad

- External research visit with Benjamin Sach

Gave a talk on*Algorithms in Practice*

University of Bristol, UK

August 31 - October 3 2014 - External research visit with Roberto Grossi

Gave a talk on*Fingerprints in Compressed Strings*

University of Pisa, Italy

January 7 - April 10 2014 - Invited visit hosted by Raphael Clifford

University of Bristol, UK

November 15-22 2013 - Invited visit hosted by Benjamin Sach

University of Warwick, UK

February 4-6 2013 - Erasmus Intensive Programme: DOSSEE 2011

Universidad de Alcalá, Spain

March 2011 - Exchange student: BSc/MSc

California Institute of Technology, Pasadena, USA

September - December 2009

## Summer Schools

- EADS Summer School on Hashing: Theory and Applications

University of Copenhagen, Denmark

July 14-17 2013 - MADALGO Summer School on Data Structures

Aarhus University, Denmark

August 18-22 2013 - ADFOCS 2013

Saarbrücken, Germany

August 5-9 2013 - MADALGO Summer School on Algorithms for Modern Parallel and Distributed Models

Aarhus University, Denmark

August 20-23 2012 - MADALGO Summer School on Geometric Data Structures

Aarhus University, Denmark

August 16-19 2010

## Teaching

- 02122 Software Technology Project

Supervisor for two groups, Spring 2013

Search Engine Project - 02282 Algorithms for Massive Data Sets

Teaching Assistant, Spring 2013

Course webpage - 02105 Algorithms and Data Structures 1

Teaching Assistant, Spring 2013 - 02102 Introductory Programming

Teaching Assistant, Spring 2010 - 02326 Algorithms and Data Structures

Teaching Assistant, Spring 2009

## Miscellaneous

- I helped organise the SWAT 2014 conference
- I maintain the section website
- My public LinkedIn profile