LETTERS
Identifying natural images from human brain activity
Kendrick N. Kay, Thomas Naselaris, Ryan J. Prenger & Jack L. Gallant
A challenging goal in neuroscience is to be able to read out, or decode, mental content from brain activity. Recent functional magnetic resonance imaging (fMRI) studies have decoded orientation1,2, position3 and object category4,5 from activity in visual cortex. However, these studies typically used relatively simple stimuli (for example, gratings) or images drawn from fixed categories (for example, faces, houses), and decoding was based on previous measurements of brain activity evoked by those same stimuli or categories. To overcome these limitations, here we develop a decoding method based on quantitative receptive-field models that characterize the relationship between visual stimuli and fMRI activity in early visual areas. These models describe the tuning of individual voxels for space, orientation and spatial frequency, and are estimated directly from responses evoked by natural images. We show that these receptive-field models make it possible to identify, from a large set of completely novel natural images, which specific image was seen by an observer. Identification is not a mere consequence of the retinotopic organization of visual areas; simpler receptive-field models that describe only spatial tuning yield much poorer identification performance. Our results suggest that it may soon be possible to reconstruct a picture of a person’s visual experience from measurements of brain activity alone.
Imagine a general brain-reading device that could reconstruct a picture of a person’s visual experience at any moment in time. This general visual decoder would have great scientific and practical use. For example, we could use the decoder to investigate differences in perception across people, to study covert mental processes such as attention, and perhaps even to access the visual content of purely mental phenomena such as dreams and imagery. The decoder would also serve as a useful benchmark of our understanding of how the brain represents sensory information.
How do we build a general visual decoder? We consider as a first step the problem of image identification3,7,8. This problem is analogous to the classic ‘pick a card, any card’ magic trick. We begin with a large, arbitrary set of images. The observer picks an image from the set and views it while brain activity is measured. Is it possible to use the measured brain activity to identify which specific image was seen?
To ensure that a solution to the image identification problem will be applicable to general visual decoding, we introduce two challenging requirements. First, it must be possible to identify novel images. Conventional classification-based decoding methods can be used to identify images if brain activity evoked by those images has been measured previously, but they cannot be used to identify novel images (see Supplementary Discussion). Second, it must be possible to identify natural images. Natural images have complex statistical structure and are much more difficult to parameterize than simple artificial stimuli such as gratings or pre-segmented objects. Because neural processing of visual stimuli is nonlinear, a decoder that can identify simple stimuli may fail when confronted with complex natural images.
Our experiment consisted of two stages (Fig. 1). In the first stage, model estimation, fMRI data were recorded from visual areas V1, V2 and V3 while each subject viewed 1,750 natural images. We used these data to estimate a quantitative receptive-field model10 for each voxel (Fig. 2). The model was based on a Gabor wavelet pyramid11–13 and described tuning along the dimensions of space3,14–19, orientation1,2,20 and spatial frequency21,22. (See Supplementary Discussion for a comparison of our receptive-field analysis with those of previous studies.)
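
To make the estimation stage concrete, the following is a minimal sketch in Python (NumPy). The pyramid parameters (4 spatial frequencies, 4 orientations, denser spatial tiling at higher frequencies), the quadrature-pair contrast-energy features and the plain least-squares fit are illustrative assumptions on our part; the pyramid and fitting procedure actually used in the paper differ in detail.

    import numpy as np

    SIZE = 128  # assumed image resolution in pixels (illustrative)

    def make_gabor(cycles, theta, phase, cx, cy, sigma, size=SIZE):
        """One Gabor wavelet on a size x size grid (coordinates in [0, 1))."""
        y, x = np.mgrid[0:size, 0:size] / size
        xr = (x - cx) * np.cos(theta) + (y - cy) * np.sin(theta)
        env = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        return env * np.cos(2 * np.pi * cycles * xr + phase)

    def make_pyramid(freqs=(1, 2, 4, 8), n_orient=4):
        """Build the filter bank as quadrature (even, odd) wavelet pairs."""
        pairs = []
        for cycles in freqs:
            sigma = 0.5 / cycles
            centres = (np.arange(cycles) + 0.5) / cycles  # denser tiling at high SF
            for theta in np.linspace(0, np.pi, n_orient, endpoint=False):
                for cx in centres:
                    for cy in centres:
                        pairs.append((make_gabor(cycles, theta, 0.0, cx, cy, sigma),
                                      make_gabor(cycles, theta, np.pi / 2, cx, cy, sigma)))
        return pairs

    def features(image, pyramid):
        """Phase-invariant contrast energy of one image for every wavelet pair."""
        return np.array([np.hypot(np.sum(image * even), np.sum(image * odd))
                         for even, odd in pyramid])

    def fit_voxel_models(images, responses, pyramid):
        """Fit one linear receptive-field model per voxel by least squares.
        images: (n_images, SIZE, SIZE); responses: (n_images, n_voxels)."""
        X = np.stack([features(img, pyramid) for img in images])
        X = np.column_stack([X, np.ones(len(X))])          # bias term
        W, *_ = np.linalg.lstsq(X, responses, rcond=None)   # (n_features + 1, n_voxels)
        return W

In this simplified form, each voxel's receptive field is simply a row of weights over the Gabor features, so the fitted weights directly expose that voxel's tuning for position, orientation and spatial frequency.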
In the second stage, image identification, fMRI data were recorded while each subject viewed 120 novel natural images. This yielded 120 distinct voxel activity patterns for each subject. For each voxel activity pattern we attempted to identify which image had been seen. To do this, the receptive-field models estimated in the first stage of the experiment were used to predict the voxel activity pattern that would be evoked by each of the 120 images. The image whose predicted voxel activity pattern was most correlated (Pearson’s r) with the measured voxel activity pattern was selected.
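
The identification rule itself reduces to predicting a pattern for every candidate image and taking the best Pearson match. Below is a sketch that reuses features, make_pyramid and the fitted weights W from the sketch above; the function and variable names are ours, not the paper's code.

    import numpy as np

    def identify(measured_pattern, candidate_images, W, pyramid):
        """Return the index of the candidate image whose predicted voxel activity
        pattern is most correlated (Pearson's r) with the measured pattern.
        measured_pattern: (n_voxels,); candidate_images: (n_candidates, SIZE, SIZE)."""
        X = np.stack([features(img, pyramid) for img in candidate_images])
        X = np.column_stack([X, np.ones(len(X))])
        predicted = X @ W                                   # (n_candidates, n_voxels)
        p = predicted - predicted.mean(axis=1, keepdims=True)
        m = measured_pattern - measured_pattern.mean()
        r = (p @ m) / (np.linalg.norm(p, axis=1) * np.linalg.norm(m))
        return int(np.argmax(r))

    # Usage sketch: identification accuracy over a hypothetical held-out set.
    # correct = sum(identify(pattern, test_images, W, pyramid) == i
    #               for i, pattern in enumerate(test_patterns))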
Identification performance for one subject is illustrated in Fig. 3. For this subject, 92% (110/120) of the images were identified correctly (subject S1), whereas chance performance is just 0.8% (1/120). For a second subject, 72% (86/120) of the images were identified correctly (subject S2). These high performance levels demonstrate the validity of our decoding approach, and indicate that our receptive-field models accurately characterize the selectivity of individual voxels to natural images.
A general visual decoder would be especially useful if it could operate on brain activity evoked by a single perceptual event. However, because fMRI data are noisy, the results reported above were obtained using voxel activity patterns averaged across 13 repeated trials. We therefore attempted identification using voxel activity patterns from single trials. Single-trial performance was 51% (834/1,620) and 32% (516/1,620) for subjects S1 and S2, respectively (Fig. 4a); once again, chance performance is just 0.8% (13.5/1,620). These results suggest that it may be feasible to decode the content of perceptual experiences in real time7,23.