Testing Protocol Google Lighthouse
Testing Platforms (CLI & Chrome)
Completed 2019-07-24 by FSS Development Team
CLI Lighthouse Version: 5.1.0
Plugin Lighthouse Version: 4.3.1
PageSpeed Insights (PSI) reports on the performance of a page on both mobile and desktop devices, and provides suggestions on how that page may be improved. PSI provides both lab and field data about a page. Lab data is useful for debugging performance issues, as it is collected in a controlled environment. However, it may not capture real-world bottlenecks. Field data is useful for capturing true, real-world user experience, but has a more limited set of metrics. See How To Think About Speed Tools for more information on the two types of data.
At the top of the report, PSI provides a score which summarizes the page's performance. This score is determined by running Lighthouse to collect and analyze lab data about the page. A score of 90 or above is considered fast, 50 to 89 is considered average, and below 50 is considered slow.
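The score-to-label mapping above can be sketched as a small function (a minimal illustration of the thresholds stated in this document, not PSI's actual implementation):

```python
def score_label(score: int) -> str:
    """Map a Lighthouse performance score (0-100) to PSI's label."""
    if score >= 90:
        return "fast"
    if score >= 50:
        return "average"
    return "slow"

print(score_label(92))  # fast
print(score_label(67))  # average
print(score_label(41))  # slow
```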
When PSI is given a URL, it will look it up in the Chrome User Experience Report (CrUX) dataset. If available, PSI reports the First Contentful Paint (FCP) and the First Input Delay (FID) metric data for the origin and potentially the specific page URL.
PSI also classifies field data into three buckets, describing experiences deemed fast, average, or slow. PSI sets the following thresholds for fast / average / slow, based on Google's analysis of the CrUX dataset:
Metric   Fast            Average             Slow
FCP      [0, 1000ms]     (1000ms, 2500ms)    over 2500ms
FID      [0, 50ms]       (50ms, 250ms)       over 250ms
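A minimal sketch of how a single field metric value maps onto these buckets, using the threshold values quoted above (the fast bound is inclusive, per the `[0, 1000ms]` notation):

```python
# Field-metric thresholds from the table above (all values in milliseconds).
FCP_FAST_MAX, FCP_AVG_MAX = 1000, 2500
FID_FAST_MAX, FID_AVG_MAX = 50, 250

def classify(value_ms: float, fast_max: float, avg_max: float) -> str:
    """Bucket one observed metric value as fast, average, or slow."""
    if value_ms <= fast_max:
        return "fast"
    if value_ms <= avg_max:
        return "average"
    return "slow"

print(classify(800, FCP_FAST_MAX, FCP_AVG_MAX))   # fast
print(classify(120, FID_FAST_MAX, FID_AVG_MAX))   # average
print(classify(3000, FCP_FAST_MAX, FCP_AVG_MAX))  # slow
```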
Generally speaking, fast pages are roughly in the top 10%, average pages are in the next 40%, and slow pages are in the bottom 50%. The numbers have been rounded for readability. These thresholds apply to both mobile and desktop and have been set based on human perceptual abilities.
Distribution and selected value of FCP and FID
PSI presents a distribution of these metrics so that developers can understand the range of FCP and FID values for that page or origin. This distribution is also split into three categories: Fast, Average, and Slow, denoted with green, orange, and red bars. For example, seeing 14% within FCP's orange bar indicates that 14% of all observed FCP values fall between 1,000ms and 2,500ms. This data represents an aggregate view of all page loads over the previous 30 days.
Above the distribution bars, PSI reports the 90th percentile First Contentful Paint and the 95th percentile First Input Delay, presented in seconds and milliseconds respectively. These percentiles are selected so that developers can understand the most frustrating user experiences on their site. These field metric values are classified as fast/average/slow by applying the same thresholds shown above.
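The percentile and bar-share calculations described above can be illustrated with a small sketch. The sample values below are hypothetical stand-ins for 30 days of observed page loads, and the nearest-rank percentile here is only an approximation of whatever aggregation CrUX actually uses:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical FCP observations in milliseconds (not real CrUX data).
fcp_samples = [600, 800, 900, 1100, 1300, 1500, 1800, 2200, 2600, 3400]

p90 = percentile(fcp_samples, 90)
# Share of observations falling in the orange (average) bar, (1000ms, 2500ms).
share_average = sum(1000 < v < 2500 for v in fcp_samples) / len(fcp_samples)
print(p90)            # 2600
print(share_average)  # 0.5
```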
An overall label is calculated from the field metric values: the page is labeled Fast only when both metrics are fast, Slow when either metric is slow, and Average otherwise.
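Assuming the commonly documented combination rule (Fast only when both field metrics are fast, Slow when either is slow, Average otherwise), the overall label could be sketched as:

```python
def overall_label(fcp_bucket: str, fid_bucket: str) -> str:
    """Combine per-metric buckets ("fast"/"average"/"slow") into one page label."""
    buckets = {fcp_bucket, fid_bucket}
    if buckets == {"fast"}:
        return "Fast"
    if "slow" in buckets:
        return "Slow"
    return "Average"

print(overall_label("fast", "fast"))     # Fast
print(overall_label("fast", "average"))  # Average
print(overall_label("average", "slow"))  # Slow
```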
The difference between the field data in PSI and the Chrome User Experience Report on BigQuery is that PSI's data is updated daily and covers the trailing 30-day period, while the dataset on BigQuery is updated only monthly.
PSI uses Lighthouse to analyze the given URL, generating a performance score that estimates the page's performance on different metrics, including: First Contentful Paint, First Meaningful Paint, Speed Index, First CPU Idle, Time to Interactive, and Estimated Input Latency.
Each metric is scored and labeled with an icon: fast (green), average (orange), or slow (red).
Lighthouse separates its audits into three sections: Opportunities, Diagnostics, and Passed Audits.
What device and network conditions does Lighthouse use to simulate a page load? Currently, Lighthouse simulates a page load on a mid-tier device (Moto G4) on a mobile network.
Why do the field data and lab data contradict each other? The Field data says the URL is slow, but the Lab data says the URL is fast! The field data is a historical report about how a particular URL has performed, and represents anonymized performance data from users in the real-world on a variety of devices and network conditions. The lab data is based on a simulated load of a page on a single device and fixed set of network conditions. As a result, the values may differ.
Why is the 90th percentile chosen for FCP and the 95th percentile for FID? The goal is to make sure that pages work well for the majority of users. By focusing on the 90th and 95th percentile values for these metrics, PSI ensures that pages meet a minimum standard of performance under the most difficult device and network conditions.
Why does the FCP in v4 and v5 have different values? v5 FCP reports the 90th percentile, while v4 FCP reports the median (50th percentile).
What is a good score for the lab data? Any green score (90+) is considered good.
Why does the performance score change from run to run? I didn't change anything on my page! Variability in performance measurement is introduced via a number of channels with different levels of impact. Several common sources of metric variability are local network variability, client hardware variability, and client resource contention.