The Times, They Are A-Changin'

Characterizing Post-Publication Changes to Online News


This work studies how often, and how much, news articles change after publication. Over a period of 9 months, we collected articles from news publishers of varying popularity and political bias, and found that 165k of the 600k articles (roughly 27.5%) exhibit at least one post-publication change. We also use Natural Language Processing to measure the semantics of these changes, for instance whether a change alters the meaning of the paragraph it occurs in, which it does in 22% of cases.
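Detecting a post-publication change requires comparing snapshots of the same article taken at different times. This page does not describe how the comparison was done, so the following is only a minimal sketch, assuming each snapshot has already been reduced to an ordered list of paragraph strings. It uses the standard-library `difflib.SequenceMatcher` to pair up paragraphs that differ. The function name `changed_paragraphs` is hypothetical and this is not the authors' pipeline. It also covers only *detecting* changes, not judging whether a change alters meaning.

```python
import difflib


def changed_paragraphs(old: list[str], new: list[str]) -> list[tuple[str, str]]:
    """Pair up paragraphs that differ between two snapshots of one article.

    Returns (old_text, new_text) tuples; an empty string stands for a
    paragraph that was added or removed outright.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical run of paragraphs, nothing changed here
        old_block = old[i1:i2]
        new_block = new[j1:j2]
        # Pad the shorter side so runs of unequal length still pair up.
        width = max(len(old_block), len(new_block))
        old_block += [""] * (width - len(old_block))
        new_block += [""] * (width - len(new_block))
        changes.extend(zip(old_block, new_block))
    return changes


if __name__ == "__main__":
    first_crawl = ["Officials said 10 people were hurt.", "The road remains closed."]
    later_crawl = ["Officials said 12 people were hurt.", "The road remains closed."]
    # An article counts as changed if this list is non-empty.
    print(changed_paragraphs(first_crawl, later_crawl))
```

Working at paragraph granularity matches the page's framing of changes "in the paragraph it occurs in"; a finer word-level diff could then be run on each returned pair.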

Full Paper PDF

BibTeX for citation

Source Code

Parsers: The Python code we used to parse the HTML of the crawled articles can be found here.
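The page does not say how the parsers extract article text. As a minimal sketch only, assuming an article's body text sits in `<p>` elements, the standard-library `html.parser` can collect paragraph text in document order. The names `ParagraphExtractor` and `extract_paragraphs` are hypothetical. Real publisher pages typically need site-specific rules (for example, to skip captions, ads, or related-article links), which this sketch omits.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element, in order."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._buffer: list[str] = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> is implicitly ended by the next one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)  # includes text inside <b>, <a>, etc.

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    page = "<h1>Title</h1><p>First <b>bold</b> line.</p><p>Second\n  para</p>"
    print(extract_paragraphs(page))
```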

Mapping of changes to categories: The CSV files we created to map changes to categories can be downloaded from this folder.

R notebooks used for analysis and graphs: The notebooks we used to compute the metrics and create the graphs in the paper can be found here.


This work was done in the PragSec Lab at Stony Brook University.
For questions, contact Chris at: