The Times, They Are A-Changin'
Characterizing Post-Publication Changes to Online News
Website
An alpha version of a website based on this project is now live! Feel free to
check it out.
Introduction
This work studies the extent to which news articles change after publication. We collect articles over a period of 9 months from news publishers of varying popularity and political bias, and show that 165k out of 600k articles exhibit some post-publication change. We also use Natural Language Processing to measure the semantics of these changes, such as whether a change alters the meaning of the paragraph it occurs in, which it does in 22% of cases.
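As a rough illustration of what detecting a post-publication change can look like, the sketch below aligns the paragraphs of two snapshots of the same article and reports the blocks that differ. It is not the pipeline used in the paper: the function name `changed_paragraphs`, the blank-line paragraph split, and the use of Python's standard `difflib` module are all assumptions made for this example, and it does not attempt the semantic analysis described above.

```python
from difflib import SequenceMatcher


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for each differing block.

    Hypothetical helper for illustration only. Paragraphs are assumed to be
    separated by blank lines; tag is one of 'replace', 'delete', 'insert'.
    """
    old = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in new_text.split("\n\n") if p.strip()]

    # Align the two paragraph lists; autojunk is disabled so that frequently
    # repeated paragraphs are not silently ignored in long articles.
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


if __name__ == "__main__":
    before = "The mayor resigned.\n\nNo reason was given."
    after = "The mayor resigned on Monday.\n\nNo reason was given."
    for tag, old_block, new_block in changed_paragraphs(before, after):
        print(tag, old_block, new_block)
```

A textual diff like this only says that a paragraph changed; deciding whether the change alters the paragraph's meaning would need a separate step on top of it.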
Full Paper PDF
BibTeX for citation
Source Code
Parsers: You can find the Python code used to parse the HTML of the
crawled articles here.
Mapping of changes to categories: You can download the
CSVs we created, which map changes to categories, from this
folder.
R Notebooks used for analysis and graphs: The notebooks we used
to compute the metrics and create the graphs presented in the paper
can be found here.
About
This work was done at the PragSec Lab at Stony Brook
University.
For any questions, contact Chris at:
ctsoukaladel@cs.stonybrook.edu