CD Collectors Hub

Practical information and observations on audio CD collecting.

Why the Best CDs Are Often the Cheapest

How Demand and Availability Drive Used CD Prices

Statistical Analysis of Real-World Discogs & Rate Your Music Data - December 2025

Abstract

What determines the price of a used CD? Most collectors hold a common-sense set of assumptions involving album popularity, artist reputation, and rarity. But these beliefs are rarely, if ever, tested using real data. The availability of several online data resources now makes it possible to determine which variables actually drive the price of used CDs.

In this study, publicly available data from two websites, Discogs and Rate Your Music (RYM), were analyzed to identify which variables correlate with used CD prices. Two artists with relatively long discographies, but with contrasting fan bases and music genres, were selected as test cases: Prince and R.E.M. Data including sold prices, various measurements of demand, availability, album reputation, number of releases, and age were obtained from these sources.

The analysis shows that CD prices are largely explained by two factors: demand relative to ownership (the want/have ratio from Discogs), and long-term availability (the number of releases of an album as shown in RYM). Together these two variables explain about 75% of the variation in price. Surprisingly, album reputation, as measured by user ratings, does not independently impact price. In fact, many highly regarded albums are less expensive because they were manufactured in larger quantities.

An additional nuance is the difference between immediate availability (copies currently for sale) and long-term availability (represented by the documented releases of an album). The latter appears to be a better indicator of the total number of copies that exist and therefore has a stronger relationship to price.

Introduction

In the used record store, or at any of the many online marketplaces, CDs usually do not sell for a fixed price. The price is set by the seller, often referring to Discogs or eBay completed sales, but also shaped by beliefs about desirability, scarcity, or reputation. Typically the price is only a few dollars, but some highly prized discs can sell for much more. CD collectors often apply reasonable logic when estimating value, but what does the data indicate is actually related to sold prices?

CD collectors are always looking for ways to better understand their collections and the broader market. They often rely on informal rules of thumb, such as the idea that better albums or more popular artists are worth more, which are widely believed but rarely tested. Going into this work I shared many of those assumptions: I felt that supply and demand were critical, but also that an album’s quality rating, or the artist who made it, might play an important role.

There is no shortage of online commentary on CD desirability and pricing, but little of that is based on hard numbers. In many fields, relationships like these are examined by analyzing data rather than relying on intuition. Applying that approach to CDs will help clarify what does and does not affect CD prices.

The main question of this analysis is whether CD prices are driven primarily by supply and demand or whether factors such as album reputation, artist identity, or age of the release play additional independent roles. With the data collected here it is possible to test these ideas directly rather than relying on intuition.

The results reveal several interesting patterns. First, CD prices are strongly associated with demand and the long-term availability of an album. Second, the analysis suggests an important distinction between short-term availability, meaning how many copies are currently for sale, versus long-term availability, meaning how many copies were likely produced over the lifetime of the release. Finally, and perhaps most surprisingly, album quality, as measured by user ratings, does not independently increase price once supply is taken into account.

Data and selection criteria

Deciding what data to use in this type of analysis is an important step. I did not download a giant dataset; the information was collected manually, so it was necessary to limit the scope to something manageable. Real-world data can be “messy” and may include points that are not representative (for example, deluxe releases shouldn’t be compared to standard releases), so such points were removed when identified.

To evaluate whether the artist, treated as an input variable, is itself a significant factor, two popular and influential artists from the same era were selected: Prince and R.E.M. They have different musical styles and different patterns of ongoing archival releases (the Prince estate continues to release new material). Another difference is that R.E.M. is a band, while Prince, despite working extensively with his own groups and collaborators throughout his career, is primarily perceived as a solo artist.

An important consideration is that both artists have relatively large discographies, which provide more data points for analysis (a total of 50 albums across the two artists) and lead to a more robust result. Selecting contrasting examples also increases the likelihood that the analysis will detect meaningful differences should they exist. Obviously there are other artist-related factors that this selection will not test, for example a lesser-known versus a well-known artist, or more extreme genre differences such as classical music versus rock, but those are beyond the scope of this study.

RYM was used to generate a standard discography from its list of studio albums. The site can display an artist’s complete discography by category (albums, live albums, singles, etc.); for this project only the album discography was used. In addition to artist and title, RYM provides a rating based on the average of user ratings on a 1-5 scale (5 is best). This rating serves as a measure of an album’s reputation or, put another way, a consensus of how good the album is. RYM also provides the year of original release, the number of people who rated the album, the number who wrote reviews for the site, and “issues”, which is the number of distinct releases of an album, including releases from different countries and in different formats. While this does not directly represent the number of CDs produced, it appears to be a proxy for overall supply, or as I describe it, long-term availability, in contrast to total copies for sale, which is short-term availability.

Once a list of albums was generated, Discogs was used to get price data. Prior to doing that, standardized guidelines had to be created to get the highest quality and most usable data possible.

For each title the data was filtered to show only CDs and only US pressings. For most CDs Discogs lists many more versions or pressings than the RYM issues count. This is due to minor differences, such as the factory that manufactured the CD or whether the CD is a record club release. In this analysis all these differences are ignored and the data for each album is consolidated. Because some albums had dozens of minor variations, sales information was recorded only for the top five versions of each album, ranked by number of copies for sale. Short-term availability was calculated by summing the copies for sale across those five versions. Other variables recorded were median sold price, users reporting “have”, and users reporting “want”. Importantly, the ratio of want to have was also analyzed as a separate variable. Deluxe editions, box sets, rare early pressings (targets, longboxes), and recent remasters (for example from the Prince estate) were excluded. Remasters up to 2005 were included unless they also fell into one of the other excluded categories.

To summarize, for Discogs the following data was used: total copies for sale, total number of have and want, as well as the ratio want/have, and the median of the sold prices.
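The consolidation step described above can be sketched in a few lines of Python. This is only an illustration: the version counts are made-up numbers rather than Discogs values, the names `Version` and `consolidate` are hypothetical, and summing have and want across the same five versions is an assumption about how the totals were formed.

```python
from dataclasses import dataclass


@dataclass
class Version:
    """One Discogs CD version of an album (all numbers here are illustrative)."""
    for_sale: int
    have: int
    want: int


def consolidate(versions, top_n=5):
    """Collapse the top_n versions (ranked by copies for sale) into album totals."""
    top = sorted(versions, key=lambda v: v.for_sale, reverse=True)[:top_n]
    have = sum(v.have for v in top)
    want = sum(v.want for v in top)
    return {
        "copies_for_sale": sum(v.for_sale for v in top),  # short-term availability
        "have": have,
        "want": want,
        "want_have_ratio": want / have,  # demand imbalance
    }


# Hypothetical album with six versions; the one with fewest copies for sale is dropped.
versions = [
    Version(40, 900, 120), Version(25, 600, 80), Version(12, 300, 45),
    Version(8, 150, 30), Version(5, 100, 20), Version(1, 20, 5),
]
album = consolidate(versions)
print(album)
```

With these invented numbers the album ends up with 90 copies for sale and a want/have ratio of about 0.14, which is the same order of magnitude as many rows in the data table.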

The analysis does not account for condition, or other factors not specifically mentioned below. Median sold price is intended to average over these effects.

From RYM the following data was used: album rating, count of users who rated, count of users who reviewed, and count of issues (i.e. releases of the album).

The table below shows the variables, their source, as well as a brief description of what the variable is intended to be measuring which explains why it is included.

Variable | Source | Description | Intended meaning
Median sold price | Discogs | Median of recent sold prices for the top five CD versions (as determined by number for sale) | Market price of the album
Copies for sale | Discogs | Count of copies currently listed for sale across selected CD versions | Short term availability
Have count | Discogs | Count of users reporting ownership of the album | Ownership prevalence
Want count | Discogs | Count of users reporting a desire to own the album | Unmet demand
Want/Have ratio | Discogs | Ratio of want count to have count | Demand imbalance
Album rating | RYM | Average user rating on a 1-5 scale | Album reputation or perceived quality
Count of ratings | RYM | Count of user ratings submitted | Breadth of audience engagement
Count of reviews | RYM | Count of written user reviews | Depth of engagement
Count of issues | RYM | Count of distinct releases and formats listed | Proxy for long-term availability
Album age | RYM | Years since original album release |

Price is summarized by the median (the middle value) rather than the average, because a few very high or very low individual sales could otherwise have undue influence on the results.
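A small illustration of why the median is the safer summary, using Python’s standard `statistics` module. The sold prices below are hypothetical and chosen so that a single unusually expensive sale sits among otherwise ordinary ones.

```python
from statistics import mean, median

# Hypothetical sold prices in dollars for one album; the last sale is an outlier.
sold_prices = [3.50, 4.00, 4.00, 5.00, 5.50, 60.00]

print(f"mean:   {mean(sold_prices):.2f}")
print(f"median: {median(sold_prices):.2f}")
```

Here the single $60 sale pulls the mean up to roughly $13.67, while the median stays at $4.50, much closer to what a typical copy sold for.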

To calculate how the different factors relate to price, a statistical technique called multiple regression is used. The method looks at all the variables at the same time and figures out which ones best explain the sold price.

It is important to realize that in addition to determining which variables help explain differences in price, the analysis also shows which variables do not.

The regression model relates sold price to demand imbalance (want/have ratio) and long-term availability (number of issues), along with other variables representing album reputation, listener engagement, album age, and artist.

Results

The statistical analysis evaluates how each collected variable relates to sold price.

Of the factors examined, two clearly dominate in their relationship to sold price: the want/have ratio from Discogs and the number of issues from RYM. In statistical terms these variables are highly significant, meaning that relationships this strong would be very unlikely to arise from random variation alone. These variables are proxies for demand (perhaps better described as demand imbalance) and long-term availability. Other evaluated variables that are also proxies for demand or availability are related to sold price, but more weakly.

The higher the want/have ratio, the higher the sold price. While this was expected, the data clearly demonstrates the strength of the relationship.

The higher the number of issues reported by RYM, the lower the sold price. This likely occurs because more popular albums are released in higher numbers, resulting in higher supply that offsets any price premium popularity might otherwise generate.

A graphical representation of these relationships is shown below.

Figure
Figure

Multiple regression analysis shows that both of these variables are independently related to the sold price and remain statistically significant when considered together.

To illustrate what the graph shows in dollar terms, the table below gives approximate prices at three availability and demand levels rounded to the nearest dollar.

Figure

All the other variables either show no meaningful relationship to price or show a weaker relationship than the two main variables. This suggests they act as proxies for the same underlying factors, but with higher variability.

Discussion

The correlations identified here represent statistical relationships and should not be interpreted as proof of cause and effect.

The analysis reveals that CD prices generally follow the law of supply and demand. Here supply is best reflected by the long-term release history (the number of documented issues) while demand is captured by the want/have ratio. There is a key distinction to make between short-term availability which is shown by copies for sale, and long-term availability represented by number of issues. The latter likely better reflects the total number of copies in existence that can enter the market as demand increases. A graphical illustration of the relationship between RYM issues and copies for sale is shown in the appendix.

Other factors examined, including album rating, artist, age of the release, and other measures of popularity, do not emerge as independent drivers of price once demand and long-term availability are taken into account.

So why is album rating not emerging as a consistent factor? More popular albums were manufactured in higher quantities. The greater original supply appears sufficient to offset any price premium that would otherwise result for more highly rated albums.

Prices would be expected to deviate from the overall trend if an individual album’s desirability increased substantially over time, which is a less common occurrence and was not picked up in this study.

There are a small number of albums which deviate from the overall trend. These exceptions generally involve releases with additional mystique attached, or unusual release histories, such as Prince’s Black Album, which was in print for only a few weeks.

Conclusion

This analysis demonstrates that used CD prices are strongly associated with availability and demand. Albums with stronger demand and fewer documented releases tend to command higher selling prices.

In contrast, rating of an individual album is not shown to be a factor influencing selling price. The most likely explanation for this is that more highly rated albums were manufactured in larger quantities, depressing any price premium. The data shows that some individual releases deviate from the overall trend and it is suggested that this could be related to unusually low production numbers or other atypical factors not captured in the data.

The study was limited to two artists active primarily in the 1980s-1990s, and covered only studio album releases. Data was obtained in December 2025.

Suggestions for future work

Future analysis could extend this approach to additional artists, other types of releases (for example live recordings), different eras, and a broader range of musical genres. The results could also change over time, so periodically repeating the analysis might be warranted. In addition, a similar analysis could be made for other music formats, for example vinyl, for which similar data is available. My informal observation of the vinyl market suggests results might differ, with popular albums selling for relatively high prices despite higher production numbers.

Appendix

Details of the statistical analysis

The analysis was done using JASP, an open-source statistical software package.

An Excel file was created containing the dataset described above, consisting of 50 CD titles.

The analysis was done on log transformed data for two reasons. First, most variables span a wide numerical range, and the log transformation prevents large values from dominating the results. Second, logarithmic scaling often makes curved relationships more linear, allowing the statistical methods used here to work better.
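To make the effect of the transformation concrete, the short sketch below applies a base-10 logarithm to four median sold prices taken from the data table. The base is inferred from the table itself (for example, 15 issues appears there as 1.176), not stated by the analysis software output.

```python
import math

# Median sold prices (dollars) for four albums taken from the data table.
prices = {
    "Automatic for the People": 2.0,
    "Purple Rain": 4.0,
    "One Nite Alone...": 87.5,
    "Gold Nizza": 202.94,
}

# Base-10 logs rounded to three decimals, matching the table's precision.
log_prices = {title: round(math.log10(p), 3) for title, p in prices.items()}
for title, value in log_prices.items():
    print(f"{title}: {value}")
```

Prices that differ by a factor of about 100 end up within roughly two units of each other on the log scale, so the most expensive titles no longer dominate the fit.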

This analysis uses linear regression, and variables are included in the model based on how well they predict the response (in this case the median sold price of the CD). Following standard statistical practice, a variable is included in the model if its p-value is below 0.05. This means that if there were no real relationship then the probability of observing a relationship this strong due to random variation alone would be less than 5%. Such a result is described as statistically significant. In this study, the two variables retained in the model have p-values below 0.001, indicating the probability of observing a result of this magnitude due to random variation is extremely small.

Several methods of adding and subtracting variables from the model were attempted with the same final result each time.
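For readers who want to check the final two-variable model without JASP, the sketch below fits it by ordinary least squares in plain Python. It is not the JASP workflow: it does not perform variable selection or compute p-values, and the helper names `solve` and `fit_ols` are my own. The three columns (RYM issues, want/have ratio, median sold price) are copied from the data table, and base-10 logs are assumed as in that table. If those are the values used in the final model, the coefficients should land close to the reported ones, though small rounding differences are possible.

```python
import math

# (RYM issues, want/have ratio, median sold price) copied from the data table.
ROWS = [
    (15, 0.666, 9), (19, 0.453, 6), (18, 0.79, 7), (17, 0.532, 3.83),
    (36, 0.614, 5), (48, 0.541, 4), (32, 0.305, 5), (22, 0.383, 5.99),
    (35, 0.617, 7.495), (20, 0.357, 18.345), (17, 0.218, 4.55), (19, 0.232, 4.97),
    (12, 0.148, 4.35), (20, 0.134, 3), (8, 0.125, 3.76), (1, 1.388, 202.94),
    (12, 0.186, 4.65), (10, 0.165, 6.5), (9, 0.125, 5.25), (7, 0.068, 4.48),
    (3, 0.255, 7.31), (4, 0.115, 10.5), (6, 0.091, 2.825), (2, 0.526, 136.76),
    (8, 0.123, 9.725), (2, 0.628, 87.5), (2, 0.283, 37.765), (9, 0.084, 2.3),
    (8, 0.09, 2.26), (7, 0.08, 3.12), (1, 0.06, 5), (6, 0.13, 5.25),
    (5, 0.155, 7.25), (5, 0.124, 5.96), (4, 0.156, 6),  # last Prince row
    (23, 0.3, 3), (21, 0.385, 3.85), (24, 0.195, 3.49), (21, 0.216, 3.92),
    (26, 0.171, 3.53), (29, 0.103, 2.3), (33, 0.108, 2.24), (27, 0.144, 2),
    (24, 0.059, 1.98), (21, 0.055, 2.375), (19, 0.05, 2.14), (18, 0.054, 2),
    (1, 0.673, 37.5), (11, 0.077, 3.49), (17, 0.046, 6.5),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows):
    """Least-squares fit of log10(price) on log10(ratio) and log10(issues)."""
    X = [[1.0, math.log10(ratio), math.log10(issues)] for issues, ratio, _ in rows]
    y = [math.log10(price) for _, _, price in rows]
    k = 3  # intercept plus two predictors
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    n, p = len(y), k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


coef, adj_r2 = fit_ols(ROWS)
print("intercept, ratio slope, issues slope:", [round(c, 3) for c in coef])
print("adjusted R squared:", round(adj_r2, 3))
```

Solving the normal equations directly is adequate for a model this small; for larger or more collinear problems a dedicated statistics package would be the more robust choice.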

The key output is shown below:

Figure

This shows the model is:

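log10(median sold price) = 1.924 + 0.689 × log10(want/have ratio) − 0.641 × log10(RYM issues)

Logs are base 10, as can be confirmed from the data table (for example, 15 issues corresponds to 1.176).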
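The model can be turned back into dollars by raising 10 to the predicted log price. The sketch below does that with the coefficients reported above; the function name `predict_price` is hypothetical, and base-10 logs are assumed because they reproduce the log columns in the data table.

```python
import math


def predict_price(want_have_ratio, rym_issues):
    """Predicted median sold price in dollars from the fitted log-log model."""
    log_price = (1.924
                 + 0.689 * math.log10(want_have_ratio)
                 - 0.641 * math.log10(rym_issues))
    return 10 ** log_price


# Purple Rain in the data table: ratio 0.541, 48 issues, observed median $4.00.
print(round(predict_price(0.541, 48), 2))

# Effect of doubling each input while holding the other fixed.
print(round(predict_price(1, 2) / predict_price(1, 1), 3))  # issues doubled
print(round(predict_price(2, 1) / predict_price(1, 1), 3))  # ratio doubled
```

For Purple Rain the model predicts roughly $4.60 against an observed median of $4.00. Because the model is linear in logs, the effects are multiplicative: doubling the number of issues lowers the predicted price by about 36%, and doubling the want/have ratio raises it by about 61%.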

The adjusted R squared of the model is 0.754, meaning the model explains about 75% of the variability in the log of the sold price.

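The low VIF (variance inflation factor) values indicate that the two variables in the model are not strongly correlated with each other, so their separate effects on price can be estimated reliably.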
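For a model with exactly two predictors, the VIF has a simple closed form: both predictors share the value 1 / (1 - r²), where r is the correlation between them. The sketch below evaluates that formula for a few illustrative correlations; these are not values from the study, and the function name is hypothetical.

```python
def vif_two_predictors(r):
    """VIF shared by both predictors in a two-predictor model with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Illustrative correlations only; these are not values from the study.
for r in (0.0, 0.3, 0.6, 0.9):
    print(f"r = {r:.1f} -> VIF = {vif_two_predictors(r):.2f}")
```

A VIF of 1 means the predictors are uncorrelated, and the value grows quickly as the correlation approaches 1. Common rules of thumb treat values above roughly 5 to 10 as a warning sign.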

A graph of residuals vs. predicted values shows no trends indicating problems with the model:

Figure

Data table

Columns RYM Issues through Album Age are from Rate Your Music, columns Total have through Median sold price are from Discogs, and the last three columns are the log transformations used in the final model.

Album | Artist | RYM Issues | RYM rating | RYM Rating Count | RYM Review Count | Release Date | Album Age | Total have | Total want | Ratio want/have | Copies for sale | Median sold price | Log RYM Issues | Log Ratio want/have | Log Median sold price
For You | Prince | 15 | 3.22 | 4237 | 88 | 1978 | 47 | 891 | 593 | 0.666 | 26 | 9 | 1.176 | -0.177 | 0.954
Prince | Prince | 19 | 3.5 | 6240 | 80 | 1979 | 46 | 2127 | 963 | 0.453 | 60 | 6 | 1.279 | -0.344 | 0.778
Dirty Mind | Prince | 18 | 3.81 | 10126 | 155 | 1980 | 45 | 1250 | 987 | 0.79 | 27 | 7 | 1.255 | -0.103 | 0.845
Controversy | Prince | 17 | 3.65 | 6655 | 90 | 1981 | 44 | 2597 | 1381 | 0.532 | 132 | 3.83 | 1.23 | -0.274 | 0.583
1999 | Prince | 36 | 3.92 | 14023 | 184 | 1982 | 43 | 2879 | 1768 | 0.614 | 108 | 5 | 1.556 | -0.212 | 0.699
Purple Rain [motion picture soundtrack] | Prince | 48 | 4.14 | 30633 | 371 | 1984 | 41 | 3978 | 2153 | 0.541 | 129 | 4 | 1.681 | -0.267 | 0.602
Around the World in a Day | Prince | 32 | 3.68 | 7278 | 138 | 1985 | 40 | 1770 | 539 | 0.305 | 53 | 5 | 1.505 | -0.516 | 0.699
Parade [motion picture soundtrack] | Prince | 22 | 3.77 | 6373 | 121 | 1986 | 39 | 1534 | 588 | 0.383 | 67 | 5.99 | 1.342 | -0.416 | 0.777
Sign "☮︎" the Times | Prince | 35 | 4.04 | 16118 | 237 | 1987 | 38 | 1775 | 1096 | 0.617 | 64 | 7.495 | 1.544 | -0.209 | 0.875
The Black Album | Prince | 20 | 3.28 | 2309 | 43 | 1987 | 38 | 2054 | 734 | 0.357 | 58 | 18.345 | 1.301 | -0.447 | 1.264
Lovesexy | Prince | 17 | 3.53 | 3968 | 82 | 1988 | 37 | 2496 | 543 | 0.218 | 90 | 4.55 | 1.23 | -0.662 | 0.658
Batman [motion picture soundtrack] | Prince | 19 | 3.11 | 3087 | 70 | 1989 | 36 | 3148 | 731 | 0.232 | 146 | 4.97 | 1.279 | -0.634 | 0.696
Graffiti Bridge [motion picture soundtrack] | Prince | 12 | 2.96 | 1921 | 52 | 1990 | 35 | 2571 | 381 | 0.148 | 207 | 4.35 | 1.079 | -0.829 | 0.638
Diamonds and Pearls | Prince | 20 | 3.42 | 3256 | 51 | 1991 | 34 | 4105 | 551 | 0.134 | 377 | 3 | 1.301 | -0.872 | 0.477
O(+> [a.k.a. Love Symbol Album] | Prince | 8 | 3.48 | 2751 | 61 | 1992 | 33 | 3304 | 412 | 0.125 | 249 | 3.76 | 0.903 | -0.904 | 0.575
Gold Nizza | Prince | 1 | 2.56 | 255 | 3 | 1993 | 32 | 201 | 279 | 1.388 | 4 | 202.94 | 0 | 0.142 | 2.307
Come | Prince | 12 | 3.29 | 1864 | 46 | 1994 | 31 | 1605 | 299 | 0.186 | 109 | 4.65 | 1.079 | -0.73 | 0.667
The Gold Experience | Prince | 10 | 3.73 | 3486 | 56 | 1995 | 30 | 2267 | 375 | 0.165 | 188 | 6.5 | 1 | -0.781 | 0.813
Chaos and Disorder | Prince | 9 | 3.01 | 1289 | 45 | 1996 | 29 | 2760 | 345 | 0.125 | 187 | 5.25 | 0.954 | -0.903 | 0.72
Emancipation | Prince | 7 | 3.25 | 1613 | 56 | 1996 | 29 | 3321 | 227 | 0.068 | 251 | 4.48 | 0.845 | -1.165 | 0.651
Crystal Ball | Prince | 3 | 3.19 | 878 | 21 | 1998 | 27 | 842 | 215 | 0.255 | 18 | 7.31 | 0.477 | -0.593 | 0.864
New Power Soul | Prince | 4 | 2.85 | 429 | 14 | 1998 | 27 | 1188 | 137 | 0.115 | 86 | 10.5 | 0.602 | -0.938 | 1.021
Rave Un2 the Joy Fantastic | Prince | 6 | 2.71 | 971 | 20 | 1999 | 26 | 2420 | 220 | 0.091 | 206 | 2.825 | 0.778 | -1.041 | 0.451
Rave In2 the Joy Fantastic [remixes] | Prince | 2 | 2.94 | 250 | 5 | 2001 | 24 | 589 | 310 | 0.526 | 20 | 136.76 | 0.301 | -0.279 | 2.136
The Rainbow Children | Prince | 8 | 3.43 | 1280 | 41 | 2001 | 24 | 2906 | 357 | 0.123 | 161 | 9.725 | 0.903 | -0.911 | 0.988
One Nite Alone... Solo Piano and Voice by Prince | Prince | 2 | 3.28 | 589 | 8 | 2002 | 23 | 597 | 375 | 0.628 | 7 | 87.5 | 0.301 | -0.202 | 1.942
N·E·W·S | Prince | 2 | 3 | 743 | 21 | 2003 | 22 | 2081 | 588 | 0.283 | 45 | 37.765 | 0.301 | -0.549 | 1.577
Musicology | Prince | 9 | 3.35 | 1861 | 59 | 2004 | 21 | 3485 | 294 | 0.084 | 323 | 2.3 | 0.954 | -1.074 | 0.362
3121 | Prince | 8 | 3.34 | 1737 | 52 | 2006 | 19 | 1553 | 140 | 0.09 | 115 | 2.26 | 0.903 | -1.045 | 0.354
Planet Earth | Prince | 7 | 2.93 | 1153 | 32 | 2007 | 18 | 1322 | 106 | 0.08 | 117 | 3.12 | 0.845 | -1.096 | 0.494
Lotusflow3r / MPLSound / Elixer | Prince | 1 | 3.07 | 493 | 12 | 2009 | 16 | 3429 | 207 | 0.06 | 125 | 5 | 0 | -1.219 | 0.699
Art Official Age | Prince | 6 | 3.13 | 1158 | 19 | 2014 | 11 | 933 | 121 | 0.13 | 20 | 5.25 | 0.778 | -0.887 | 0.72
Plectrumelectrum | Prince | 5 | 2.88 | 748 | 8 | 2014 | 11 | 788 | 122 | 0.155 | 12 | 7.25 | 0.699 | -0.81 | 0.86
HITnRUN Phase One | Prince | 5 | 2.75 | 717 | 10 | 2015 | 10 | 1090 | 135 | 0.124 | 36 | 5.96 | 0.699 | -0.907 | 0.775
HITNRUN Phase Two | Prince | 4 | 3.42 | 821 | 17 | 2015 | 10 | 1523 | 237 | 0.156 | 77 | 6 | 0.602 | -0.808 | 0.778
Murmur | R.E.M. | 23 | 3.88 | 20751 | 359 | 1983 | 42 | 2777 | 834 | 0.3 | 63 | 3 | 1.362 | -0.522 | 0.477
Reckoning | R.E.M. | 21 | 3.82 | 11341 | 175 | 1984 | 41 | 1828 | 703 | 0.385 | 45 | 3.85 | 1.322 | -0.415 | 0.585
Fables of the Reconstruction | R.E.M. | 24 | 3.66 | 7812 | 146 | 1985 | 40 | 2727 | 533 | 0.195 | 54 | 3.49 | 1.38 | -0.709 | 0.543
Lifes Rich Pageant | R.E.M. | 21 | 3.82 | 9593 | 157 | 1986 | 39 | 2507 | 542 | 0.216 | 63 | 3.92 | 1.322 | -0.665 | 0.593
Document | R.E.M. | 26 | 3.72 | 10914 | 165 | 1987 | 38 | 4399 | 751 | 0.171 | 132 | 3.53 | 1.415 | -0.768 | 0.548
Green | R.E.M. | 29 | 3.55 | 9080 | 172 | 1988 | 37 | 6308 | 650 | 0.103 | 179 | 2.3 | 1.462 | -0.987 | 0.362
Out of Time | R.E.M. | 33 | 3.51 | 12443 | 229 | 1991 | 34 | 9834 | 1059 | 0.108 | 314 | 2.24 | 1.519 | -0.968 | 0.35
Automatic for the People | R.E.M. | 27 | 3.86 | 22613 | 375 | 1992 | 33 | 9831 | 1412 | 0.144 | 341 | 2 | 1.431 | -0.843 | 0.301
Monster | R.E.M. | 24 | 3.36 | 7953 | 218 | 1994 | 31 | 8817 | 516 | 0.059 | 454 | 1.98 | 1.38 | -1.233 | 0.297
New Adventures in Hi-Fi | R.E.M. | 21 | 3.65 | 6794 | 157 | 1996 | 29 | 5407 | 298 | 0.055 | 190 | 2.375 | 1.322 | -1.259 | 0.376
Up | R.E.M. | 19 | 3.25 | 4850 | 133 | 1998 | 27 | 3436 | 171 | 0.05 | 132 | 2.14 | 1.279 | -1.303 | 0.33
Reveal | R.E.M. | 18 | 3.14 | 4340 | 120 | 2001 | 24 | 3927 | 212 | 0.054 | 167 | 2 | 1.255 | -1.268 | 0.301
r.e.m.IX [remixes] | R.E.M. | 1 | 2.32 | 60 | 2 | 2002 | 23 | 101 | 68 | 0.673 | 7 | 37.5 | 0 | -0.172 | 1.574
Around the Sun | R.E.M. | 11 | 2.58 | 3374 | 120 | 2004 | 21 | 1959 | 151 | 0.077 | 79 | 3.49 | 1.041 | -1.113 | 0.543
Accelerate | R.E.M. | 17 | 3.24 | 3879 | 157 | 2008 | 17 | 2721 | 125 | 0.046 | 86 | 6.5 | 1.23 | -1.338 | 0.813