14. June 2017, by Štefan Emrich
The consequences of what happens in parliament are far-reaching. The Austrian Parliament therefore publishes the stenographic minutes of all its meetings. Unfortunately, what is discussed in Parliament only seldom reaches a broad audience. This is partly because a lot of the talks are political skirmishes, and partly because there is simply so much being discussed that the media do not have the resources to cover and analyze it all.
The analysis of large sets of information has increasingly been outsourced to computers and algorithms. But the analysis of spoken (or written) content is much more complex than that of numbers. We accepted the challenge of analyzing the sentiment of Austrian representatives – and have come up with first results.
Development as a team
The minutes of the Austrian Parliament are published on its homepage. The initiative “Open Parliament” is working on making this information more readily available to the public. Our goal was to analyze these minutes in depth, which requires a broad set of skills and know-how. The research project VALiD, a cooperation between partners including St. Pölten University of Applied Sciences, the University of Vienna and the drahtwarenhandlung, combines exactly these skills. The aim of the project is to analyze information over the course of time. To this end we developed natural language processing (NLP) algorithms, which we then applied to the parliamentary speeches of the past 20 years. This allows us not only to count “words” (e.g. which party has the most interjections, which representative talks the most, which topics are discussed when) but also to analyze content.
Together with political scientists (also from the University of Vienna) we developed algorithms that are trained to identify negativity, i.e. to classify sentences according to their negativity on a scale from 0 (not negative) to 1 (very negative). This process, called sentiment analysis, is very complex, as language is much harder to interpret than numbers (sarcasm is one example). In addition, German is even more complex than English, for which the majority of NLP theory exists. We also found that even humans cannot reliably agree on what “negative” is: different individuals frequently rated the same sentence as not negative and as very negative.
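How strongly human raters disagree can be put into a number. One common measure is Cohen's kappa, which compares the observed agreement of two raters with the agreement expected by chance. The following is only a minimal sketch of that idea, not the procedure used in the project; the function name and the labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six sentences: 1 = negative, 0 = not negative.
rater_a = [1, 0, 1, 1, 0, 0]
rater_b = [1, 0, 0, 1, 0, 1]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement, while values near 0 mean the raters agree no more often than chance would predict.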
But we could soon show that our classifier was not only stable but also produced meaningful results. In the meantime these have been presented at the ECPR conference (European Consortium for Political Research) in Nottingham, and a second paper has already been accepted for the ICA (International Communication Association) conference in San Diego.
The first hypotheses we tested were very simple. The idea was to use them to verify the algorithm. For example, that the speeches of Ewald Stadler, a notorious politician, would rank among the most negative ones, or that opposition parties would be more negative than the government. Our classification algorithm confirmed both. Ewald Stadler did not just make it into the top 10 of negative politicians; he made it into the top 10 twice (note: he changed party affiliation during his career).
The difference between opposition and government is clearly visible
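Checks like these boil down to averaging sentence scores per group. The sketch below assumes each scored sentence is stored as a tuple of speaker, party, role and negativity score; this layout, the helper `mean_by` and all values are invented for illustration and are not the project's data or code. Grouping by speaker and party together is what lets one person appear twice in a ranking after changing party.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (speaker, party, role, negativity score in [0, 1]).
records = [
    ("Speaker A", "Party X", "opposition", 0.9),
    ("Speaker A", "Party X", "opposition", 0.7),
    ("Speaker A", "Party Y", "opposition", 0.6),
    ("Speaker B", "Party Z", "government", 0.2),
    ("Speaker B", "Party Z", "government", 0.4),
    ("Speaker C", "Party X", "opposition", 0.5),
]


def mean_by(records, key):
    """Average the negativity score of all records that share the same key."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[3])
    return {group: mean(scores) for group, scores in groups.items()}


# Rank (speaker, party) pairs from most to least negative.
by_speaker = mean_by(records, key=lambda r: (r[0], r[1]))
ranking = sorted(by_speaker, key=by_speaker.get, reverse=True)

# Compare government and opposition as a whole.
by_role = mean_by(records, key=lambda r: r[2])

for speaker, party in ranking:
    print(f"{speaker} ({party}): {by_speaker[(speaker, party)]:.2f}")
print(by_role)
```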
Next we tested more complex hypotheses. One concerned the change of negativity over time: that negativity increases prior to elections and decreases during the period of governance. The algorithm supported this as well. It was therefore time to use the algorithm to really analyze the data.
Negativity during periods of governance
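One simple way to look at such a hypothesis is to compare the scores of speeches held shortly before an election with all others. The sketch below assumes a 90-day window, a made-up election date and invented scores; the function `split_by_election_window` is hypothetical and does not reproduce the project's analysis.

```python
from datetime import date, timedelta
from statistics import mean


def split_by_election_window(speeches, election_day, window_days=90):
    """Split scores into those from the window before the election and all others."""
    window_start = election_day - timedelta(days=window_days)
    pre_election, other = [], []
    for day, score in speeches:
        if window_start <= day < election_day:
            pre_election.append(score)
        else:
            other.append(score)
    return pre_election, other


# Hypothetical election date and invented (date, negativity score) pairs.
election_day = date(2020, 10, 1)
speeches = [
    (date(2020, 3, 10), 0.30),
    (date(2020, 5, 5), 0.35),
    (date(2020, 8, 20), 0.60),
    (date(2020, 9, 15), 0.70),
    (date(2020, 11, 12), 0.25),
]

pre_election, other = split_by_election_window(speeches, election_day)
print(f"pre-election mean: {mean(pre_election):.2f}")
print(f"other mean:        {mean(other):.2f}")
```

A fixed window is a crude choice; averaging per month or per legislative period would show the trend over time in more detail.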
One of the findings was that the speeches of the SPÖ (social democratic party, left wing) significantly increased in negativity during its time as an opposition party (2000 – 2008), only to drop to pre-opposition levels once it was in government again. The FPÖ (freedom party, right wing), in contrast, did not diminish in negativity while in government during that time, but showed a sharp increase once it was in opposition again.
Negativity of speeches by the SPÖ (red) and the FPÖ (blue)
But computational analysis can go even further. We compared the debates of the last decades and could show that it is predominantly the “big topics” that stir emotions (among politicians). Of the 25 most negative debates, 15 can be assigned to only three topics: the financial crisis, the Eurofighter purchase and the early elections surrounding the FPÖ-ÖVP-BZÖ coalition.
Even though these are only the first results we could produce with our classification algorithms, they appear to be very well founded, and they hold considerable potential. In the coming months we will further evaluate and expand the algorithms. Our plan is to add other sentiments to the classifier and to conduct more analyses. And, as we set out to develop a tool that serves public interests, we also intend to publish the findings in cooperation with the media.
If you like this project, we would be happy if you shared it with friends. And if you have any ideas for further analysis of the data, we would love to hear from you!
This article was first published by die drahtwarenhandlung and re-printed here with permission.
16. March 2017, by Štefan Emrich
Last week the 5th consortial meeting of the VALiD research project took place, hosted by the FH Joanneum in Graz on Thursday. The key topics of the meeting revolved around machine-learning algorithms used for natural language processing (NLP).
One core question was how the project’s comprehensive database of political speeches could be better explored with sentiment analysis. Sentiment analysis, the classification of text based on the sentiment of the content (polite, uncivil, ironic, …), is a science of its own. To make matters worse, the majority of the research is focused on the English language, which is one of the simpler languages. Hence the methods and approaches for German, which is more complex, are technically far less mature.
Another hot topic was whether and to what extent the algorithms, and the data used to train them, are subject to gender & diversity bias. While this topic has reached the science community, its implications are not yet fully grasped. Here is a highly recommended read on this increasingly important topic!
Due to the nature of the meeting (i.e. the whole research team getting a thorough update and discussing the next steps) no detailed solutions were elaborated. Nevertheless, the discussions produced useful input for further work. We are looking forward to implementing and testing these ideas, and first results seem very promising. Time will tell how reliable these sentiment analysis results are, and whether and how the methods can then be used in practice.
We will keep you updated 😉
30. January 2017, by Alexander Rind
Elena Rudkowsky gave her PhD proposal talk at the University of Vienna in December 2016. Her PhD proposal and her doctoral thesis agreement were accepted by the university. She’ll work on story discovery in text documents to support investigative data journalists during the next two years (until the end of VALiD’s project duration in Dec. 2018). Her research questions are:
- How can journalists be enabled to search for concepts instead of terms?
- How can journalists save time when tagging documents?
- How can journalists explore corpus-related named entity networks that are linked to additional relevant information from public named entity databases?
2. December 2016, by Christina Niederer
From 23rd to 24th November, the 9th Forum Media Technology conference was held at St. Pölten University of Applied Sciences. Alexander Rind, David Pfahler, Christina Niederer and Wolfgang Aigner received the Best Paper Award for their paper “Exploring Media Transparency with Multiple Views”.
The paper was written in cooperation with David Pfahler from TU Wien and the VALiD project team from St. Pölten University of Applied Sciences. The basis for this research work was the Media Transparency Dashboard, an interactive visualization tool for open government data on media transparency in Austria.
© St. Pölten University of Applied Sciences
28. October 2016, by Alexander Rind
On October 13, at the gala night of science in Grafenegg, Wolfgang Aigner received a scientific recognition award from the governor of Lower Austria, Erwin Pröll. The award recognizes Wolfgang’s scientific contributions to the field of Visual Analytics as compiled into his habilitation thesis “Interactive Visualization and Data Analysis: Visual Analytics With a Focus on Time”.
We congratulate our project lead on this achievement!
Recognized scientists Hubert Hettegger, Irina Sulaeva, Wolfgang Aigner, Edith Kapeller, Angela Sessitsch, Stéphane Compant, and Birgit Mitter with Barbara Schwarz, Erwin Pröll, and Petra Bohuslav. © Zsolt Marton
Wolfgang Aigner receiving the award. © Zsolt Marton
23. September 2016, by Štefan Emrich, Sonja Fischbauer
The “Sommerloch” (the summer slump) is over. It’s the end of September, universities are preparing for the start of the year, politicians are back from their vacation and the sports leagues are starting their seasons…
So is the VALiD team.
Photo by Klaus Graf: Sommerloch Ortsschild (own work, edited by VALiD Project) [License: CC BY-SA 2.0 DE], via Wikimedia Commons
After a bit of relaxation and creative timeout we are eager to start working on the new ideas that were born during the summer time.
One of our next focal points will be to dig into the rich pool of speeches held in the Austrian parliament and to generate new insights out of these.
Also the academic conference season is kicking off now and will be covered by our team – with publications as well as in organizational roles.
We are looking forward to keeping you updated on these things here on our blog, so come back!
19. May 2016, by Alexander Rind
Today another interactive tool launched on the VALiD website: The Media Transparency Dashboard offers a visual analytics solution to explore open government data on media transparency in Austria. This dataset covers money flows from government organizations to media companies for advertisement (§2) and sponsoring (§4) and currently has more than 34,000 (non-null) entries over 3.5 years. The dashboard consists of multiple coordinated views that can be used together to discover insights.
David Pfahler from TU Wien has developed the dashboard together with the VALiD project team. The source code is available from GitHub as free and open source software.
More visual analytics experiments with the Media Transparency dataset are planned for the future. You can find the Media Transparency Dashboard at http://medientransparenz.validproject.at/dashboard/.
Media Transparency Dashboard filtered to government organizations containing “Energie”
4. May 2016, by Alexander Rind
Poster about VALiD (PDF in German)
VALiD and related projects in the area of visualization for data-driven journalism were presented at the Long Night of Research. This biennial event is Austria’s largest science communication event and addresses the general public at a total of 2184 stations. More than 500 visitors came to FH St. Pölten, where the VALiD station was located.
Our team explained the benefits and research challenges of visualization in data-driven journalism and demonstrated visual analytics approaches for media transparency data, networks of members of the Austrian parliament, and a municipal budget as well as the evolution of popular baby names.
Team at Long Night of Research | photo: FH St. Pölten/Jakob Gramm
21. April 2016, by Robert Gutounig
The interactive tool to explore the research literature on data journalism was presented today by Julian Ausserhofer at the Nordic Data Journalism Conference in Helsinki. It helps to navigate through a carefully selected set of research papers and to find out more about the development of the research concerning this rather young field of study.
Different categories like development over time, type of publications etc. can be selected and visualized. In addition results from an accompanying qualitative content analysis have been included in the database and enable for example filtering the geographical scope of the analyzed studies. By extracting and visualizing the references mentioned in the papers the most influential works can be identified.
Building the tool is part of the VALiD project’s aim to investigate current workflows in data journalism. Michael Oppermann from the University of Vienna has developed the tool together with the VALiD project team. We would be grateful for feedback on our research and on the application to guide further development.
You can find the DDJ Literature Explorer here: http://literature.validproject.at
23. March 2016, by Sonja Fischbauer
Two weeks ago, we got together for our semi-annual consortial meeting at the wonderful drahtwarenhandlung in Vienna. Even though we were quite a crowd, not everyone could make it. That is only natural: many people contribute to a research project of this scale. With busy schedules in different cities, countries and sometimes even time zones, it’s hard to get every team member into one room together.
As you can see in the image below, we barely manage to squeeze into one picture, and there’s always someone missing… But we’ve managed to find a solution to get us all in one place: here!
We’ve set up a page in the About Us section, where you can find out about every team member, their role in the project and their contact details.
Check it out and meet our team!
A good part of the VALiD team at the last consortial meeting at the drahtwarenhandlung, Vienna.
In the back row, left to right: Julian Ausserhofer, Wolfgang Aigner, Sarah Matiasek, Michael Sedlmair, Martin Bicher. Middle row, left to right: Alexander Rind, Robert Gutounig, Sonja Fischbauer, Štefan Emrich. Front row: Christina Niederer. Here: Everyone who’s missing. (photo: VALiD project, cc-by)