Prof. Dr. Udo Kruschwitz has held the Chair of Information Science at the University of Regensburg since July 2019. Before that, he was a professor at the School of Computer Science and Electronic Engineering at the University of Essex.
His research sits at the intersection of Information Retrieval (IR) and Natural Language Processing (NLP). He has led research projects developing algorithms that transform unstructured and partially structured textual data into structured knowledge and user/cohort models, which have been applied in a variety of fields including search, navigation and summarization.
I was appointed Chair of Information Science in July 2019. We are part of the Institute of Information and Media, Language and Culture (I:IMSK).
I am very fortunate to be teaching subjects that nicely match my research interests, primarily in the areas of Natural Language Processing and Information Retrieval.
Natural language processing and information retrieval are key areas of AI. As soon as you dive into one of these areas you are dealing with AI. Therefore it comes as no surprise that lectures and classes introduce the concept of machine learning early on. The great thing is that our students do not simply learn about these concepts but are in a position to apply them in their own projects and even get their results published at top conferences in the field.
This is not difficult to convey, as we read about the latest developments in AI in the media on a daily basis, ranging from digital assistants to voice-controlled communication in cars to books that have been written entirely by an AI (and that is just a sample of what is out there). In terms of prior knowledge, there have also been major changes in recent years. The threshold to enter the brave new world of deep learning has been lowered substantially, so you no longer need a background in computer science to join in. Last term, for example, I supervised a Master's student in criminology who, without any computer science background, managed to develop her own pipeline for text classification based on the state of the art in language representation and processing.
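To give a sense of how low that threshold now is, here is a minimal sketch of such a text classification pipeline, assuming the Hugging Face transformers library; the model checkpoint and example texts are my own illustrative choices, not the student's actual setup:

```python
# Minimal text-classification sketch using a pretrained transformer.
# Model checkpoint and example inputs are illustrative assumptions.
from transformers import pipeline

# Load a ready-made sentiment classifier built on a BERT-style model.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

texts = [
    "The lecture on neural language models was excellent.",
    "I could not follow the proof at all.",
]

# Each result contains a predicted label and a confidence score.
for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']:>8}  {result['score']:.3f}  {text}")
```

A few lines like these replace what used to require hand-built feature engineering and weeks of setup, which is exactly why students from outside computer science can now join in.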
Information Science is a bit of an unusual subject. On the one hand there is a clear link to computer science, but there is also a strong arts and humanities tradition (just look at the faculty we belong to). Therefore you will find that each university puts its own spin on the subject. Here in Regensburg we make sure that our graduates have a solid background in natural language processing (and hence in core concepts of AI), which helps with employability. They will be able to find jobs that have in the past been open mainly to graduates with a more technical background, such as in the broader field of Data Science. As a student you might, however, decide that you would like to focus more on the humanities side of the subject, and that is also fine. What all our graduates have in common is that they are able to independently tackle information science problems, develop suitable solutions and then evaluate them using rigorous evaluation methodologies.
You can’t get away from AI nowadays; it’s everywhere. Some students want to find out how to engineer AI solutions, others want to get a deep understanding of core AI concepts, and yet others are keen on getting a rewarding job within the broader scope of AI after graduating from university. All of these are good reasons to focus on AI topics as part of the curriculum.
There is another reason, though, to take AI seriously, and that is the paradigm shift we have witnessed in the last few years. Neural approaches have largely replaced traditional statistical as well as rule-based ideas when you look at the state-of-the-art performance of machine learning algorithms across a broad range of applications. Three key developments have made this possible: massively increased computing power (e.g. via GPUs), the availability of training data orders of magnitude larger than before, and the emergence of scalable software tools accessible to anyone interested in implementing modern machine learning algorithms. This development can nicely be illustrated by looking at recent advances in natural language processing (NLP). The architecture of choice for a multitude of NLP problems is called BERT. Note that the paper this architecture is based on was only published in 2019 at one of the top NLP conferences. As of today it has already been cited 17,623 times according to Google Scholar. In other words, the state of the art in NLP has shifted dramatically within less than three years, and there is no end in sight to this trend.
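To make concrete how accessible these scalable tools have become, here is a minimal sketch of loading a pretrained BERT model and obtaining contextual representations for a sentence. It assumes the Hugging Face transformers library and PyTorch; the checkpoint and example sentence are my own illustrative choices:

```python
# Minimal sketch: contextual token representations from a pretrained
# BERT checkpoint. Checkpoint and input sentence are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Neural approaches have largely replaced rule-based ones."
inputs = tokenizer(sentence, return_tensors="pt")

# No gradients needed for plain inference.
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per (sub)word token, 768 dimensions for BERT base.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 12, 768])
```

The same few lines would have amounted to a substantial research engineering effort only a few years earlier, which is precisely the shift described above.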
The nice thing about research-active universities such as the University of Regensburg is the concept of research-led teaching, i.e. research interests and outputs directly feed into my teaching. Two research areas I am currently particularly interested in are hate speech and fake news detection. The research-led teaching approach I was talking about means that I can offer project ideas in this area to my students, and I am amazed by what they manage to produce: already this year, several of our students have published the results of individual projects at high-impact conferences. Yet another great reason for me to look forward to the new term.
Broadly speaking, all of my research falls within the scope of AI. Take natural language processing: it touches on many different aspects of AI, and the projects I have been involved in reflect this. After graduating, for example, I was part of the BMBF-funded multi-site Verbmobil project, which aimed at creating a dialogue assistant able to automatically translate spoken language in a conversation between businesspeople; this all goes back a quarter of a century. Another, very different type of dialogue system was the Yellow Pages Assistant that we developed in a project with British Telecom. I also have a long-standing collaboration with the London-based tech company Signal AI; here the focus is on identifying relevant news stories and insights within a stream of millions of articles, press releases and other stories.
My first project in the area of natural language processing, which must have been around 1986, was to build a parser that would identify the syntactic structure of simple English sentences. This was a collaboration between my school and an academic institution. I have been fascinated by computational linguistics ever since. In the early 90s I spent a year at the University of Edinburgh and took a fair number of AI modules (you get a sense of how long ago that was when you consider that the AI building on South Bridge burnt down years ago; the department has moved twice since then and eventually found its new home in the Informatics Forum, which itself now goes back more than ten years). Nowadays I am getting paid to work with AI; what could be better?
Question 10: Do you personally find it easy or difficult to deal with topics that are related to AI?
As a computer scientist it is perhaps a bit easier in general. But what we see today is a rapidly changing field. You need to stay on top of it, and that is not easy, as what is state of the art today might be out of date in a year’s time. Just as an illustration, you see researchers working at tech companies such as Google, Facebook and Microsoft publishing their latest research findings in the preprint repository arXiv.org, because by the time the next top conference comes along, the results they report might already have been beaten by newer approaches.
What is particularly interesting is the rapid development. Things that were once judged too hard for automated systems are reality now. An example is speaker-independent speech recognition: systems that do not need to be trained by each individual speaker reading out a long list of boring sentences. Another example is the progress in search engine technology. What used to be ‘10 blue links’ returned for a user query is now often a short, concise answer, or the search engine might even tell me the status of my flight if the query is interpreted as a flight number. By the way, all of that also relies on BERT, which was developed at Google but subsequently made available to the community. It’s a fascinating area!
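The shift from ‘10 blue links’ to direct answers can be illustrated with extractive question answering, the kind of task BERT-style models excel at. Here is a minimal sketch, assuming the Hugging Face transformers library; the model checkpoint, question and context are my own illustrative choices, not a description of any production search engine:

```python
# Minimal sketch of extractive question answering, the kind of
# technology behind direct answers in search results.
# Model checkpoint and texts are illustrative assumptions.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

context = (
    "BERT was developed at Google and released to the research "
    "community, with the paper published at a top NLP conference."
)

# The model extracts the answer span from the context passage.
result = qa(question="Where was BERT developed?", context=context)
print(result["answer"], f"(score: {result['score']:.2f})")
```

Instead of ranking whole documents, the model pinpoints the span of text that answers the question, which is what makes concise direct answers possible.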
The most interesting project in this respect is the proposal for a graduate teaching centre we are working on, highly interdisciplinary but still work in progress. Watch this space!
If you look at AI as a research area you will find that many of the concepts have been around for decades and some of the key ideas were first proposed in the 1940s, i.e. AI has had its place in the curriculum in higher education for a very long time. What has changed though is that it now goes beyond computer science. One reason is that the implications of AI are now wide-ranging. Automatically identifying hate speech, for example, is not simply a technical problem, as there is a fine line between removing documents and restricting free speech. Therefore, I think that certain aspects of AI will be of interest to any student no matter what subject they are studying (although when it comes to theology I am not so sure, as it goes beyond my area of expertise). You can however interpret the question differently and ask how AI can be utilised in higher education. This is already happening. Georgia Tech, for example, deploys a ‘teaching assistant’ called Jill Watson, an AI based on IBM’s Watson technology that assists in running their online courses and has been doing so for years. In short, AI will in future become even more centre-stage in higher education in many ways.
We play a key role in this project. For example, we started the Data Science @ Regensburg Meetup in the autumn of 2019 with the aim of bringing together different communities and stakeholders for monthly meetings. That involves computer scientists as well as psychologists, academics as well as people from industry, students as well as professors. What a great mix, you might think, and I would agree. The talks are given by high-profile speakers, some of them from industry, others with a university background, sometimes a bit of both. We are also active in ‘Artificial Intelligence in Regensburg (AIR)’, an initiative started by local government. The Women in Data Science (WiDS) 2021 Regensburg conference was an event that emerged from this involvement.
Information Science is more important than ever today. When you discuss the topic of ‘artificial intelligence’, you quickly have someone raising the question of whether you can (or should) trust an algorithm, and this is indeed a very important question. However, step back a bit, and you will notice that there are plenty of examples where people do not even trust ‘natural intelligence’ as represented by subject experts or institutions considered to be the key authorities in a particular field. As information scientists we therefore have the duty to help people understand how to process and interpret information systematically and objectively. Let me use the current pandemic as an example. We can call ourselves lucky to live in a country where anyone can freely express their ideas and beliefs, no matter how crazy these might be. This freedom of expression is a key pillar of our democracy. It does however become a bit of a problem if the informed opinion of an expert, such as a virologist or an epidemiologist, is considered just as valid as the opinion of Joe Bloggs claiming, for example, that the virus does not exist at all, is no different from the usual flu, or is part of a conspiracy involving Bill Gates and some blood-thirsty politicians. That is exactly where we as information scientists need to intervene and assist, guide and clarify. There is still a long road ahead.
Mr. Kruschwitz, thank you very much for taking the time to answer these questions. We wish you a nice day!