Elsevier

Medical Hypotheses

Volume 82, Issue 4, April 2014, Pages 405-411
Psycho-Informatics: Big Data shaping modern psychometrics

https://doi.org/10.1016/j.mehy.2013.11.030

Abstract

For the first time in history, it is possible to study human behavior on a large scale and in fine detail simultaneously. Online services and ubiquitous computational devices, such as smartphones and modern cars, record our everyday activity. The resulting Big Data offers unprecedented opportunities for tracking and analyzing behavior. This paper hypothesizes the applicability and impact of Big Data technologies in the context of psychometrics both for research and clinical applications. It first outlines the state of the art, including the severe shortcomings with respect to quality and quantity of the resulting data. It then presents a technological vision, comprising (i) numerous data sources such as mobile devices and sensors, (ii) a central data store, and (iii) an analytical platform, employing techniques from data mining and machine learning. To further illustrate the dramatic benefits of the proposed methodologies, the paper then outlines two current projects, logging and analyzing smartphone usage. One such study thereby attempts to quantify the severity of major depression dynamically; the other investigates (mobile) Internet Addiction. Finally, the paper addresses some of the ethical issues inherent in Big Data technologies. In summary, the proposed approach is about to induce the single biggest methodological shift since the beginning of psychology or psychiatry. The resulting range of applications will dramatically shape the daily routines of researchers and medical practitioners alike. Indeed, transferring techniques from computer science to psychiatry and psychology is about to establish Psycho-Informatics, an entire research direction of its own.

Introduction

Reliable measurements of emotion, cognition and behavior are of equal and central importance to psychiatry and psychology. Despite the crucial role of these parameters, the basic methodology has remained essentially unchanged for the better part of a century. To this day, researchers rely on clinicians’ observations and self-rated psychometric tests. Requiring specially trained experts for conducting interviews or observations, these methods found widespread use in research, but failed to penetrate clinical practice. And, despite the highly skilled evaluators (and associated cost), the gathered data remains rather unsatisfactory, both in terms of quantity and quality. As outlined below, these shortcomings are inherent to the methodologies themselves, and cannot be overcome by enlarging financial budgets. At the same time, there appears to be no immediate successor. On the one hand, novel and scientifically interesting methods like neuroimaging and genetics represent promising approaches to evaluate the course of psychiatric treatment [1]. On the other hand, these approaches are still in an early stage of development, costly and often do not provide reliable diagnostics for individual patients.

The temporal granularity at which traditional methods collect data is commonly too coarse to reveal fine-grained patterns. The most important drawbacks are of an entirely practical nature: around-the-clock shadowing is neither affordable nor acceptable to a participant. Instead, researchers employ questionnaires at fixed intervals, essentially relying on the participants’ self-report. This method naturally imposes a bound on the temporal granularity at which participants can be interviewed: weekly, monthly, or at an even coarser level. Furthermore, it is not possible to execute the identical psychometric test multiple times over the course of a single day. Memory and training effects would limit the reliability of the ratings. Next, holding interviews at high frequency would be prohibitively expensive, because such interviews have to be conducted by a trained professional. Depending on the level of training and employment status, a single interview quickly costs several hundred Euros. In addition, the necessary appointments impose too great a burden on the participant, especially when the content of the interviews relates only to negative aspects of life such as psychopathological disorders. Including travel, a single interview can consume the better part of a day, a burden that can only be imposed infrequently (in particular on participants pursuing a professional career). Self-reports in the form of diaries do not provide a viable solution either. This method, too, quickly meets a limit on how much time commitment can be expected from a participant. In sum, the constraints (i) reduce the temporal granularity at which data can be gathered, and (ii) pose tremendous problems for longitudinal studies with respect to the amount and completeness of data gathered over an extended time range.

Unfortunately, data collected by traditional means is also strongly biased. Most notably, it is commonly faulty and distorted, due to poor recollection of the variable of interest. This holds especially for coarse intervals of reporting, and especially for questions regarding interaction with digital devices. Very few people could accurately report how often they have checked their email over the past 10 days (which would be an interesting variable for studying Internet use/addiction). Additionally, reports about variables from other research areas, such as subjective well-being, tend to simply reflect altered psychological states. In particular, it has been shown that people use their momentary affective state for judging how happy and satisfied they are with their lives in general. A depressed patient, for example, will usually see his/her well-being, social functioning, and living conditions as worse than they would appear to an independent observer, or to himself/herself after recovery [2]. Thus, self-reports are affected by the state of mind at the time of reporting, and the social desirability of the reported behavior. Together, these factors introduce significant noise to infrequently recorded data. Clinician-rated psychometric tests entail the risk of being similarly biased, since assessments of experts are not entirely objective. In this context, the term “objective” assessment is misleading and should be replaced by “external”, as this evaluation might reflect the subjective view of the assessor himself [2].

In short, data gathered by traditional means captures the situation of a study’s participant or patient rather poorly. It is too coarse to show temporal patterns, and generally lacks dynamics. Additionally, it is commonly recorded on shallow scales and thus quickly encounters floor effects. Most questionnaires regarding depression, for example, only permit answers on each item on a scale of 0–3 [3], [4], [5]. The effects of these coarse measurements are dramatic: novel psychotropic substances frequently become stuck during the development phase because (visible) positive effects cannot be quantified reliably [6], [7]. Clearly, innovation in methodology is long overdue.

As early as the seventies, researchers circulated the idea of actigraphy as a simple and non-invasive method for monitoring human rest and activity cycles. Inter alia, they measured sleep patterns [8], [9] and circadian rhythms [10] via specific actimetry sensors, worn on the body of the patient. While this approach overcame some of the obstacles faced by questionnaires, it did not quite hit the mark. Early technology was rather rudimentary, rendering sensors complex, expensive and socially awkward, thus requiring substantial compliance and discipline from the patient. In some areas of research such as neuropharmacology, actigraphy was administered in only very few cases [11]. In recent years, miniaturization of digital devices has given new impetus to the methodology. Sensors have become smaller, less power-hungry, and can independently transmit their data. Although the technology is finally practical, the central obstacle to actigraphy remains: the patient/participant must be coaxed into carrying a sensor for a substantial period of time. In sum, actigraphy has only been used sporadically in most areas of psychiatry and psychology. Due to miniaturization, it is about to enjoy a new lease of life, but will ultimately be made redundant by sensor-less methods of tracking.

In this paper, we propose observing behavior directly on digital devices and services, such as laptops, social networks, or even cars. Specifically, we focus on user interaction with smartphones. Carried on the person, around the clock, and used for a wide range of (informal) communication, these devices constitute a particularly rich and intimate source of information. The data is of the highest quality, gathered entirely in the background, and automatically forwarded to a central server. The method thus burdens neither patient/participant nor researcher. Most importantly, it avoids the dominant sources of bias commonly encountered in self-reports and questionnaires.

For several areas of research, the proposed methodology constitutes the only viable solution. Most notably, it constitutes the only valid measure for usage and abuse of digital media. Kimberly Young [12] was the first to identify excessive use of the Internet as a problem for the human condition, an issue also raised for the usage of mobile phones [13]. Whether the observed phenomena constitute a ‘new disorder’ is a matter of heated debate [14]. Although excessive use of the Internet is not a distinct disorder in the DSM-V, evidence from psychology, psychiatry and the neurosciences suggests that “Internet addiction” constitutes a substantial challenge [15], [16]. While a high daily “dosage” does not in itself qualify as an addiction, a rising number of hours spent with the phone over a certain time could indicate developing tolerance. In any case, such behavior must be recorded directly on the device. Ordinary patients/participants cannot be expected to accurately answer how often they unlock their phone each day (up to 200 times, according to our preliminary experimental findings). The particularly poor recollection in this context arises from the “virtual” character of phone behavior. Alcohol consumption, for example, is significantly easier to quantify, if only by the number of empty bottles.
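
As a minimal sketch of how such device-side records might be summarized, the following Python snippet counts unlock events per day and fits a least-squares slope to daily usage hours. The log formats (ISO 8601 unlock timestamps, one usage value per consecutive day) and the function names are assumptions made for illustration; they are not taken from the studies described in this paper. A sustained positive slope would be only one crude indicator of rising use, not a diagnosis.

```python
from collections import Counter
from datetime import datetime


def unlocks_per_day(timestamps):
    """Count unlock events per calendar day.

    timestamps: iterable of ISO 8601 strings (hypothetical log format).
    """
    days = (datetime.fromisoformat(ts).date() for ts in timestamps)
    return dict(Counter(days))


def usage_trend(daily_hours):
    """Least-squares slope of daily usage, in hours per day elapsed.

    daily_hours: list with one value per consecutive day (assumed format).
    """
    n = len(daily_hours)
    if n < 2:
        raise ValueError("need at least two days of data")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_hours) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_hours))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

Called on four days of usage such as `usage_trend([1.0, 2.0, 3.0, 4.0])`, the slope is one additional hour per day; a flat series yields a slope of zero.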

The proposed methodology is about to equally revolutionize the work of researchers with more classic research agendas, such as personality or behavior. Recently, Kosinski et al. impressively inferred personality traits from behavior on the Internet platform Facebook [17]. Yet, such research endeavors only mark the beginning of tight collaboration between psychology/psychiatry and informatics. After all, Facebook usage ‘only’ represents a rather narrow glimpse of people’s lives. By comparison, how much can we learn about the human condition when monitoring mobile phones 24 hours a day, 7 days a week? The socially outgoing (extraverted) person could easily be detected by the number of incoming and outgoing calls, indicating a large active social network. The introverted person, in contrast, might display longer reading sessions, perhaps using an e-book application. The person who is open to new experiences (another of the Big Five Factors of Personality describing human characteristics by McCrae and John [18]) might often install and test new apps. Numerous such dependent variables can be detected by observing humans through their mobile phone interactions. These measures will capture the human condition more precisely than ever. For the first time, psychiatrists and psychologists can observe human behavior on a large scale, at the finest temporal granularity. They can thus assess the course of treatment and disease in a temporal continuum, instead of relying on selective snapshots.

Equally, the proposed methodology is about to revolutionize clinical therapy, a role in which it will affect our everyday lives to an even higher degree. In this scenario, patients track a wide range of personal data, from phones, cars, and fridges. From this raw (and rather cryptic) data, large-scale analysis extracts meaningful indices, such as an “activity index”, or a “social interaction index”. The patient can then self-track his condition. He is reassured that it is not worsening. Or, if a worsening of his health condition occurs, he could confidently ask for an ad-hoc appointment with his doctor. In addition, he can explore interdependencies between his health condition and his lifestyle, such as staying up late, or working out. Most importantly, he can provide (selected) data access to his coach, therapist or doctor.
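
To make the idea of extracting an index from raw data concrete, here is a minimal Python sketch of one possible "social interaction index". The definition used (the number of distinct contacts reached by call or message per day), the event tuple format, and the function name are all illustrative assumptions; the paper does not specify how such indices are computed.

```python
from collections import defaultdict


def social_interaction_index(events):
    """Hypothetical index: distinct contacts reached per day.

    events: iterable of (day, kind, contact) tuples, where kind is
    "call" or "message"; other event kinds are ignored. This
    definition is an assumption for illustration only.
    """
    contacts = defaultdict(set)
    for day, kind, contact in events:
        if kind in ("call", "message"):
            contacts[day].add(contact)
    return {day: len(people) for day, people in contacts.items()}
```

A real index would likely weight call duration, reciprocity, or time of day; counting distinct contacts is merely the simplest aggregate that turns a cryptic event log into a number a patient could follow over time.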

For the clinician, this methodology enables an entire range of new options. For the first time, he does not have to rely on the (poor) self-report of his patient. Instead, he receives clear indicators of the patient’s mental state, and changes therein, at a fine-grained temporal resolution. He can thus observe the continuous changes of health parameters over time (to follow the course of a disease, or the progress of therapy). The clinician will also be able to investigate changes of his patient throughout the day, and fine-tune timing and dosage of medication, providing a highly individualized therapy. For example, he could thus match medication doses in a patient suffering from schizophrenia. The clinician can even prescribe a range of dosage, from which the patient can independently choose, according to his or her latest data. The therapist can be automatically alerted when symptom data indicates a critical situation. In this case, he can intervene via phone, video conference, or an ad-hoc appointment. At the same time, regular appointments can be spaced further apart.
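
One simple way such an automatic alert could work is to compare recent values of an indicator against the patient's own baseline. The Python sketch below flags a deviation when the recent mean lies more than a chosen number of baseline standard deviations from the baseline mean. The indicator (for instance, a daily count of outgoing messages), the function name, and the default threshold of 2.0 are assumptions for illustration, not a clinically validated rule.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, recent, threshold=2.0):
    """Return True if the recent mean lies more than `threshold` baseline
    standard deviations away from the baseline mean.

    baseline, recent: lists of daily indicator values (assumed inputs).
    threshold: arbitrary illustrative cutoff, not a validated value.
    """
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least two baseline values and one recent value")
    spread = stdev(baseline)
    if spread == 0:
        # A constant baseline: any change at all counts as a deviation.
        return mean(recent) != mean(baseline)
    z = (mean(recent) - mean(baseline)) / spread
    return abs(z) > threshold
```

Using the patient's own history as the reference, rather than a population norm, reflects the individualized monitoring described above; in practice the threshold would need to be tuned to balance missed events against false alarms.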

Most importantly, the proposed methodology is significantly cheaper than personal interaction with a therapist. This mundane observation has vast implications, opening the application area towards wellness and prevention for large numbers of people. Currently, society focuses its limited therapeutic resources on sick patients. In the future, data-driven early warning systems will enable us to help people a long time before their condition becomes serious or chronic. With red flags raised early, some people might just need to attend a seminar on sustainable usage of digital media, or an extended vacation, or for the HR department to talk to their chaotic manager. Eventually, most corporations will deploy data-driven preventive mental health programs. The ethical perspective (as discussed below) only constitutes a fraction of the challenges these services face. The integration into the processes and structures of large corporations might turn out to be far more difficult. Yet, occupational doctors can serve as a blueprint for a data-driven occupational mental health service, leading to their widespread deployment much sooner than anticipated.

The remainder of this paper is structured as follows. Next, we outline the underlying technological vision, comprising various data sources, and means to store and analyze the data. Subsequently, we introduce two current studies and respective hypotheses. One study tracks depression, the other investigates the misuse of mobile phones. We then touch upon the ethical aspects of the proposed methodology, a topic we feel very strongly about. The article ends with an outlook on the anticipated changes in research and therapy. As we outline, the proposed methodology will shape, if not revolutionize, psychiatry and psychology. The envisioned shift will be massive, touch every aspect of both sciences, and eventually create its own field of research: Psycho-Informatics.

Underlying technological vision

This paper is based on a single central thesis. The user’s mental state, we claim, affects the way he interacts with a machine. A stressed user may thus generate more typographic errors than usual; a depressed user may communicate less over his phone than previously. Conversely, so the claim continues, changes in his interaction with a machine reflect changes in his mental state. Modern computer science enables us to automatically gather the appropriate data, transfer, and analyze it,

Current research hypotheses in psychiatry and psychology

In two current studies, we monitor smartphones to track (i) the severity and course of depression as well as (ii) conspicuous usage of the Internet and phone. While these studies are decidedly small-scale, at least compared to the above technological vision, they are primarily intended to evaluate the validity as well as practicability of the proposed methodology.

Ethical aspects and data privacy issues of ‘Big Data’ research

The use of Big Data in research and therapy necessarily raises ethical concerns. Bordering on mass surveillance, it realizes the vision of a “Gläserner Mensch”, a transparent human. Data privacy thus takes on a central role, and the potential for abuse cannot be overestimated. While monitoring depression in a medical scenario fulfills the highest ethical standards, it could equally well be misused by an employer to secretly monitor his staff, or by an insurance company to reject at-risk applicants.

Conclusions and vision for the future

This paper introduces Psycho-Informatics, the application of Big Data to psychology and psychiatry. Highly sensitive, the suggested method collects, stores, and analyzes massive amounts of indicative data at little cost and without risks or stress for patients or participants. The paper outlines the technical vision, sketches the signals that can be detected, and illustrates the tremendous benefits over traditional methods of psychometrics. In particular, it suggests tracking user behavior with

Conflicts of interest statement

None of the authors reports a conflict related to the work described. The software mentioned is currently developed for research purposes only; no commercial exploitation of it is planned at this stage.

Acknowledgement

This work was funded in part by a grant awarded to C.M. by the DFG (MO-2363/2-1) and an independent investigator grant for the assessment of effects of deep brain stimulation for treatment resistant depression by Medtronic Inc. to T.S.
