Abstract - Millions of users are exposed to password-strength meters/checkers at highly popular web services that use user-chosen passwords for authentication. Recent studies, such as Egelman et al. (CHI 2013) and Ur et al. (USENIX Security 2012), have found evidence that some meters actually guide users to choose better passwords, which is a fairly rare bit of good news in password research. However, these meters are mostly based on ad-hoc design. At least, as we found, most vendors do not provide any explanation of their design choices, sometimes making their meters appear to be black boxes. We analyze password meters deployed on selected popular websites by measuring the strength labels assigned to common passwords from several password dictionaries. From this empirical analysis with millions of passwords, we report prominent characteristics of popular meters. We shed light on how the server end of some meters functions, and provide examples of highly inconsistent strength outcomes for the same password in different meters, along with examples of many weak passwords being labeled as strong or even very strong. These weaknesses and inconsistencies may confuse users in choosing a stronger password, and thus may undermine the purpose of these meters. On the other hand, we believe these findings may help improve existing meters and make them a more effective tool in the long run.