Blockchain Content Research @ COMSYS

Blockchains primarily enable credible accounting of digital events, e.g., money transfers in cryptocurrencies. However, beyond this original purpose, blockchains also irrevocably record arbitrary data, ranging from short messages to pictures. This does not come without risk for users as each participant has to locally replicate the complete blockchain, particularly including potentially harmful content. We provide the first systematic analysis of the benefits and threats of arbitrary blockchain content. Our analysis shows that certain content, e.g., private information, politically banned statements or illegal pornography, can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin’s blockchain. Although most data originates from benign extensions to Bitcoin’s protocol, our analysis reveals more than 1600 files on the blockchain, over 99 % of which are texts or images. Among these files there is clearly objectionable content such as links to illegal pornography, which is distributed to all Bitcoin participants. With our analysis, we thus highlight the importance for future blockchain designs to address the possibility of unintended data insertion and protect blockchain users accordingly.

On this page, we provide background information on our work and answer frequently asked questions (work in progress). For a full understanding of our work, please consult our paper presented at Financial Cryptography 2018 as well as our discussion of potential countermeasures to be presented at the IEEE Workshop on Blockchain Technologies and Applications 2018.

Brief summary available! If you do not have much time but want to get an unbiased summary of our work, please read the excellent, unsensational, and quite accurate summary of our Financial Cryptography 2018 paper in Adrian Colyer's The Morning Paper.

Publications

Information on our methodology, our results, and possible countermeasures can be found in the following scientific publications:

Initial idea: I Don't Want That Content! On the Risks of Exploiting Bitcoin's Blockchain as a Content Store (ACM CCS 2016 Poster) [citation]

@inproceedings{MHH+16,
author = {Matzutt, Roman and Hohlfeld, Oliver and Henze, Martin and Rawiel, Robin and Ziegeldorf, Jan Henrik and Wehrle, Klaus},
title = {{POSTER: I Don't Want That Content! On the Risks of Exploiting Bitcoin's Blockchain as a Content Store}},
booktitle = {ACM CCS 2016 Poster},
publisher = {ACM},
year = {2016},
doi = {10.1145/2976749.2989059},
}

Quantitative analysis: A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin (Financial Crypto 2018) [citation]

@inproceedings{MHH+18,
author = {Matzutt, Roman and Hiller, Jens and Henze, Martin and Ziegeldorf, Jan Henrik and M{\"u}llmann, Dirk and Hohlfeld, Oliver and Wehrle, Klaus},
title = {{A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin}},
booktitle = {Financial Crypto 2018},
publisher = {Springer},
year = {2018},
}

Discussion of countermeasures: Thwarting Unwanted Blockchain Content Insertion (IEEE IC2E 2018 Workshop BTA) [citation]

@inproceedings{MHZ+18,
author = {Matzutt, Roman and Henze, Martin and Ziegeldorf, Jan Henrik and Hiller, Jens and Wehrle, Klaus},
title = {{Thwarting Unwanted Blockchain Content Insertion}},
booktitle = {IEEE IC2E 2018 Workshop BTA},
publisher = {IEEE},
year = {2018},
}

Frequently Asked Questions

Are you the first to point out the possibility to store illegal content in blockchains?
The possibility of storing illegal content in the blockchain and the potential legal consequences have been discussed earlier, e.g., by Steve Hargreaves and Stacy Cowley at CNNTech and by INTERPOL.
What are the original contributions of your research?
We conducted a systematic analysis of content insertion mechanisms, analyzed which content is currently stored in the blockchain, and revisited its impact with respect to legal rules. As a result, we present the content insertion methods together with their costs for insertion. Furthermore, we categorized content encoded in the blockchain (see below).
Did you find actual child abuse content in the blockchain?
No. However, the blockchain encodes links that apparently point to child abuse content. Some of these links target sites in the "Dark Web" that are reachable via the anonymization network Tor. Furthermore, the blockchain contains an image depicting mild nudity of a young woman (in an online forum this image is claimed to show child pornography, although this claim cannot be verified).
Which categories of content did you find?
We found the following categories:
- Copyright violations: (i) a text book, (ii) the original blockchain paper, (iii) white papers, (iv) leaked cryptographic keys, (v) an illegal prime used for DVD cracking
- Malware: A proof of concept for malware insertion
- Privacy violations: (i) wedding pictures, (ii) group image with pseudonyms, (iii) public chat logs, (iv) emails, (v) forum posts discussing Bitcoin, (vi) doxing (disclosure of personal information such as phone numbers, addresses, bank accounts, passwords, and online identities)
- Politically sensitive content: (i) Wikileaks cablegate data, (ii) news article on pro-democratic demonstrations in Hong Kong
- Condemned content: (i) 5 files with mildly pornographic content, (ii) 2 backups of links to child abuse content, (iii) an image depicting mild nudity of a young woman (in an online forum this image is claimed to show child pornography, albeit this claim cannot be verified)
Is content insertion a general problem for blockchains? Does it affect Bitcoin alternatives such as Ethereum or Monero?
The problem persists in any blockchain where users can insert arbitrary data by design or where they can freely choose their identifiers (e.g., Bitcoin addresses). In Ethereum, images have apparently already been inserted as well, as a comment on our paper suggests. Bitcoin-like cryptocurrencies (especially direct Bitcoin forks) such as Litecoin are susceptible by design as they reuse Bitcoin's mechanics.
We have not yet investigated more privacy-aware blockchain systems such as Monero or the upcoming Mimblewimble. Such blockchains need further investigation with respect to how easily identifiers that appear on the blockchain can be manipulated.
Can insertion of content to blockchains be prevented?
In our follow-up paper that will be presented at the IEEE Workshop on Blockchain Technologies and Applications in April, we investigate whether there are countermeasures against content insertion. Our findings are that content inserters can always insert some bytes per transaction by brute-forcing identifiers. Hence, the problem can only be mitigated but not entirely eliminated.
While there are technical countermeasures against (easy) content insertion, we believe the only viable countermeasure that can potentially find its way into Bitcoin would be to introduce mandatory minimum fees that penalize transactions with many outputs. This disincentivizes insertion of large transactions, which are especially well-suited for content insertion. Once the community reaches consensus on the exact fee model, it is easily deployable via one fork.
We are currently considering working on a BIP (Bitcoin Improvement Proposal) to help introduce such a countermeasure.
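The brute-forcing of identifiers mentioned in this answer can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions: `toy_identifier` is a hypothetical stand-in that truncates a SHA-256 digest to 20 bytes and is not Bitcoin's actual address derivation, and the function names and the `max_tries` limit are chosen for illustration. It shows why a few chosen bytes per identifier remain insertable even if identifiers must be derived from a hash: the inserter simply retries until the hash starts with the desired bytes.

```python
import hashlib


def toy_identifier(secret: bytes) -> bytes:
    """Hypothetical stand-in for address derivation (NOT Bitcoin's real scheme):
    the first 20 bytes of the SHA-256 digest of a secret."""
    return hashlib.sha256(secret).digest()[:20]


def brute_force_prefix(payload: bytes, max_tries: int = 5_000_000):
    """Search for a secret whose toy identifier starts with `payload`.

    Returns (secret, identifier, tries) on success, or None if no match
    was found within `max_tries` attempts.
    """
    for tries in range(1, max_tries + 1):
        # Deterministic candidate secrets; a real inserter would use fresh keys.
        secret = tries.to_bytes(8, "big")
        identifier = toy_identifier(secret)
        if identifier.startswith(payload):
            return secret, identifier, tries
    return None


if __name__ == "__main__":
    result = brute_force_prefix(b"hi")
    if result is not None:
        secret, identifier, tries = result
        print(f"found identifier {identifier.hex()} after {tries} tries")
```

In this toy setting, each additional payload byte multiplies the expected number of tries by 256, which is why only a few bytes per identifier are practical and why the problem can be mitigated but not entirely eliminated.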
How should the community deal with this risk?
We hope that the community starts to consider the problem of content insertion. We noticed individual posts and comments that discussed the phenomenon before our study, but the possible impact has been underestimated.
As a short-term solution until more evolved countermeasures such as an advanced fee model are deployed, miners could reject "suspicious" transactions. We consider transactions "suspicious" if they have many outputs (at least 50, corresponding to ~1 KB of insertable data) that only spend very small amounts. Such transactions are unlikely to be economically meaningful. Still, this approach risks rejecting legitimate transactions, so more evolved solutions are needed.
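The heuristic described above can be sketched as a simple filter. This is an illustrative sketch only, not a policy implemented by any miner or by our papers: the `Transaction` class is a hypothetical simplification, and the `SMALL_VALUE_SATOSHI` cutoff for "very small amounts" is an assumed placeholder, since the text does not fix a concrete value. The threshold of 50 outputs is taken from the text; with 20 bytes of freely choosable data per output (the size of a hashed address), 50 outputs correspond to roughly 1 KB.

```python
from dataclasses import dataclass
from typing import List

MIN_OUTPUTS = 50              # threshold named in the text
SMALL_VALUE_SATOSHI = 1_000   # assumed cutoff for "very small amounts" (hypothetical)
BYTES_PER_OUTPUT = 20         # assumption: one 20-byte hashed address per output


@dataclass
class Transaction:
    """Hypothetical, simplified transaction: only the output values in satoshi."""
    output_values: List[int]


def is_suspicious(tx: Transaction,
                  min_outputs: int = MIN_OUTPUTS,
                  small_value: int = SMALL_VALUE_SATOSHI) -> bool:
    """Flag transactions with many outputs that all carry very small amounts."""
    if len(tx.output_values) < min_outputs:
        return False
    return all(value <= small_value for value in tx.output_values)


def insertable_bytes(tx: Transaction) -> int:
    """Rough upper bound on data that could be hidden in the output identifiers."""
    return len(tx.output_values) * BYTES_PER_OUTPUT


if __name__ == "__main__":
    tx = Transaction(output_values=[546] * 50)
    print(is_suspicious(tx), insertable_bytes(tx))
```

A filter like this also flags legitimate transactions that happen to pay many small amounts, which illustrates the false-positive risk noted above.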
What is the cost for content insertion? Could a malicious individual poison a blockchain to make it politically impossible to distribute it?
A malicious individual can insert arbitrary content into Bitcoin's blockchain, e.g., an image of Nelson Mandela that is already contained in the blockchain (roughly 21 KB in size) can be inserted at a cost of 380 USD today considering a market price of 8400 USD per Bitcoin. While the costs increase for larger documents, poisoning the blockchain is not prohibitively expensive today.
We cannot judge whether authorities would ban Bitcoin based on inserted material, but the theoretical possibility should be considered, especially in very oppressive and non-transparent jurisdictions.
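The cost figure quoted in this answer can be broken down with some simple arithmetic. The sketch below uses only the numbers from the text (21 KB, 380 USD, 8400 USD per Bitcoin); the linear extrapolation in `estimate_cost_usd` is an assumption made for illustration, as actual costs depend on the insertion method and on current fees.

```python
IMAGE_SIZE_KB = 21      # size of the example image (from the text)
COST_USD = 380          # quoted insertion cost (from the text)
USD_PER_BTC = 8400      # quoted market price (from the text)

# Cost of the example expressed in Bitcoin and per kilobyte.
cost_btc = COST_USD / USD_PER_BTC
usd_per_kb = COST_USD / IMAGE_SIZE_KB


def estimate_cost_usd(size_kb: float) -> float:
    """Assumed linear extrapolation of the quoted cost to other file sizes."""
    return size_kb * usd_per_kb


if __name__ == "__main__":
    print(f"cost in BTC: {cost_btc:.4f}")
    print(f"cost per KB: {usd_per_kb:.2f} USD")
    print(f"estimated cost for 100 KB: {estimate_cost_usd(100):.0f} USD")
```

Under these assumptions, the example corresponds to roughly 0.045 BTC, or about 18 USD per kilobyte.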
Do all entities in the Bitcoin ecosystem need to download the full blockchain (and possibly illegal content within it)?
Users often transfer Bitcoins via services that create transactions for them and hand these transactions over to miners for inclusion in Bitcoin's blockchain. In this case, users do not need to download the full blockchain to their disk.
However, to prevent modification or deletion of transactional data encoded in the blockchain, a sufficiently large and mutually independent set of entities must verify the correctness of the blockchain. Furthermore, newly joining users must be enabled to verify Bitcoin's complete history. To this end, full nodes store the complete history of Bitcoin's blockchain. While individual users can refrain from operating a full node, and thus would not be affected by the threat outlined in our paper, a critical mass of nodes that verify the blockchain is important for its security. Our findings suggest that operating such a full node can become problematic in certain jurisdictions once illegal content is irrevocably stored on the blockchain. We consider it a future threat that full node operators may become punishable.
Do you know who put the data onto the blockchain or do you have insights on their motivations?
We did not attempt to de-anonymize the persons and can only speculate on their incentives. The most plausible explanation is probably that they want to leverage or exploit the property that data cannot be removed from the blockchain.
Should future blockchain designs prevent insertion of non-financial data to protect users from potential liability?
With privacy-motivated cryptocurrencies, we see tendencies to limit the impact of non-financial content. Monero and Mimblewimble started a discussion on the topic, but have not yet followed up on it.
However, if we aim for very general use cases, e.g., Ethereum as a distributed programmable database or completely new blockchain applications, it becomes hard to distinguish benign from possibly problematic content. We hope that our findings motivate a new line of research for such use cases to protect the users actively contributing to the network.

Media Coverage

Here we provide an uncommented list of media outlets featuring our work and (in parentheses) discussing incidents related to our work (if you are aware of additional articles on our work, please let us know). Reference of an article here does not imply any endorsement or recommendation of its content. In fact, we strongly disagree with the dramatic sensationalism in a majority of these reports, which mainly focus on information that has been known for five years (see above). We highlight major media outlets in bold.

2019-02-08: CryptoNewsZ
2019-02-07: futurezone (German), Tokenpost
2019-02-06: BBC
2019-02-05: Coingeek
2019-02-04: (The Next Web), (Cripto Noticias (Spanish))
2018-08-08: RiskNET (German)
2018-05-26: Heise c't (also print, German)
2018-04-09: Global Finance Magazine
2018-04-01: The Currency Analytics
2018-03-29: WIRED
2018-03-28: CryptoNews Review, Investopedia
2018-03-27: ACS, Coindesk
2018-03-26: Business Insider (Dutch), Computer BILD (German)
2018-03-25: Pittsburgh Post-Gazette
2018-03-24: TokensTree
2018-03-23: Complex, GodmodeTrader (German), Süddeutsche Zeitung (also print, German), Wallstreet Online (German)
2018-03-22: BILD (German), BitcoinBlog.de (German), Frankfurter Allgemeine Zeitung (Print, German), Cointelegraph, GameStar (German), Golem (German), Herald Sun, Krone (German), news.com.au, Newsweek, n-tv (German), Salzburger Nachrichten (German), ScienceAlert, The Irish Times, The Washington Post, T-Online (German)
2018-03-21: Australian Broadcasting Corporation, Basler Zeitung (German), BBC News, Beebom, Bitcoin News, CBS Local (Video), Der Standard (German), Forbes, Fortune, Futurezone (German), heise online (German), Indiatimes, Infosecurity Magazine, Mashable, MDR Radio (German), Motherboard/VICE (German), Naked Security by Sophos, Newser, New York Post, Nine.com.au, n-tv (German), Rappler, RTTNews, Sputnik International, The Daily Dot, The Independent, The Next Web, WinFuture (German), WIRED Germany (German), ZDNet (German)
2018-03-20: CBC Radio: As It Happens (Podcast, interview starts at 36:55), Cointelegraph, Engadget Deutschland (German), Fortune, Gizmodo, NewsBTC, PCMag UK, RT, Sky News, The Guardian, The Telegraph
2018-03-19: Boing Boing, O'Reilly Four Short Links, The Morning Paper, The Outline, The Register

Contact

In case of any questions or comments, please send an email to: blockchain [ät] comsys.rwth-aachen.de

Postal and visiting address

Chair of Communication and Distributed Systems - COMSYS - Informatik 4
RWTH Aachen University
Ahornstraße 55 - building E3
52074 Aachen
Germany

Funding

This work has been funded by the German Federal Ministry of Education and Research (BMBF) under funding reference number 16KIS0443. The responsibility for the content of this publication lies with the authors.