Oncometrics.org

How We Measure R (relative) Portion

The R (relative) portion of YCR is currently calculated by an independent implementation of the RCR algorithm. This metric was originally designed by the team of B. Ian Hutchins, Xin Yuan, James M. Anderson, and George M. Santangelo in 2016 to evaluate relative performance of scientific publications. More about RCR is here.

Important changes in our implementation:

  1. Journal impact factors (IFs) are calculated from open data, which makes the calculation transparent. The original RCR relies on IFs imported from a commercial provider.
  2. Calculations take into account all relevant cross-linked articles available in open data. The original RCR relies on PubMed-indexed articles only.

Overview of algorithms and methods to calculate RCR for the Oncology field, based on the OpenAlex dataset:

  1. How we define our cluster of the Oncology field papers
  2. How we calculate RCR metric based on OA dataset
  3. Our ongoing development of RCR modifications
  4. How we are going to measure the score of paper belonging to the Oncology field. Under development

How we define our cluster of Oncology field papers

We prepared a list of 588 onco-terms in total (single words and phrases) using the following methods:

  1. By using our common sense and the medical background of some of our team members
  2. By asking LLMs, then carefully checking their suggestions manually
  3. By extracting all article keywords from the cluster defined by onco-topics in the OpenAlex (OA) dataset (more about this below)
  4. By combining the results, then manually checking and cleaning the final list

Then we included all the articles from the OpenAlex (OA) dataset that contain any of these onco-terms in the title, in the abstract, or in the keywords (generated by OA).

Additionally, we selected all the OA topics that contain any of these onco-terms in the topic fields display_name, description, or keywords. Then we manually checked all these topics and prepared a final list of 412 topics that belong to the Oncology field.

Using the prepared list of onco-topics, we included all the articles that have one of these topics as their primary topic (which topic is primary is determined by OpenAlex and is shown in their dataset).

So finally we define our cluster of Oncology field articles (18,085,165) by either of 2 rules:

  1. The article contains any of the onco-terms in its title, abstract, or keywords
  2. OR the article has a primary topic from our list of onco-topics
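The two rules above can be sketched as a small membership test. This is only an illustration: the article dictionary layout, the `primary_topic_id` field name, and the choice of case-insensitive whole-word matching are assumptions and are not taken from our production code.

```python
import re

def build_term_pattern(onco_terms):
    """Compile one case-insensitive regex that matches any onco-term.

    Whole-word matching is an assumption of this sketch.
    """
    escaped = sorted((re.escape(t) for t in onco_terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def in_onco_cluster(article, term_pattern, onco_topic_ids):
    """Return True if the article satisfies rule 1 OR rule 2.

    `article` is a hypothetical dict with optional keys:
    title (str), abstract (str), keywords (list of str), primary_topic_id.
    """
    text_fields = [article.get("title") or "", article.get("abstract") or ""]
    text_fields.extend(article.get("keywords") or [])
    # Rule 1: any onco-term in title, abstract, or keywords
    if any(term_pattern.search(field) for field in text_fields):
        return True
    # Rule 2: primary topic is one of the curated onco-topics
    return article.get("primary_topic_id") in onco_topic_ids
```

An article passes if either rule matches, so an article with a non-oncology primary topic is still included when its title mentions an onco-term.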

How we calculate RCR metric based on OA dataset

To calculate the RCR metric we used the method described in the article https://doi.org/10.1371/journal.pbio.1002541 by the authors of RCR: B. Ian Hutchins, Xin Yuan, James M. Anderson, George M. Santangelo. To understand the code and algorithms we studied the scripts from their GitHub repository https://github.com/NIHOPA/Relative... .

First of all we calculate impact factors (IFs) for all the journals of OA. For that we use the formula of classic 2-year impact factor:

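the number of citations made in the IF year to articles published in the 2 previous years, divided by the total number of articles published in those 2 previous years:

$$\mathrm{IF}_{y}(J) = \frac{C_{y}(J,\, y-1) + C_{y}(J,\, y-2)}{N_{y-1}(J) + N_{y-2}(J)}$$

Here $C_{y}(J, t)$ is the number of citations made in year $y$ to articles of journal $J$ published in year $t$, and $N_{t}(J)$ is the number of articles that journal $J$ published in year $t$.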

We used OpenAlex data about citations of articles and the primary journal of each article, so we can count all the articles of each journal for a specific year and all their citations. We used journals only (OpenAlex sources with type “journal”, not “book” and others) here.
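A minimal sketch of this 2-year impact factor calculation is shown below. The input layout (a dictionary of articles and a list of citation pairs) is a simplification assumed for the example; it is not the actual OpenAlex schema.

```python
from collections import Counter

def two_year_impact_factors(articles, citations, if_year):
    """Compute the classic 2-year impact factor per journal.

    articles:  dict article_id -> (journal_id or None, publication_year)
    citations: iterable of (citing_id, cited_id) pairs
    Both structures are hypothetical simplifications for this sketch.
    """
    window = {if_year - 1, if_year - 2}

    # Denominator: articles each journal published in the 2 previous years
    published = Counter()
    for journal_id, year in articles.values():
        if journal_id is not None and year in window:
            published[journal_id] += 1

    # Numerator: citations made in the IF year to those articles
    cited = Counter()
    for citing_id, cited_id in citations:
        citing = articles.get(citing_id)
        target = articles.get(cited_id)
        if citing is None or target is None or target[0] is None:
            continue
        if citing[1] == if_year and target[1] in window:
            cited[target[0]] += 1

    return {journal: cited[journal] / n for journal, n in published.items()}
```

Journals with no articles in the 2-year window are omitted from the result, which avoids a division by zero.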

Then we define 579,082 “etalon” (benchmark) articles across all publication years. For that we used the iCite dataset and kept only articles with RCR between 0.97 and 1.03, matched by quantity estimation to the initial R01-funded etalons described in the RCR paper.

For each of these etalon articles we define a co-citation network: we take the articles that cite the etalon and then take all of their references.

Then for each paper of the co-citation network we took a precalculated OA-based impact factor of its primary journal for the year of the paper's publication. After that we define Field Citation Rate (FCR) of each etalon work by taking an average of all IFs of articles from a co-citation network.

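$$\mathrm{FCR} = \frac{1}{|N|} \sum_{p \in N} \mathrm{IF}_{y(p)}\big(J(p)\big)$$

Here $N$ is the co-citation network of the article, $J(p)$ is the primary journal of paper $p$, and $y(p)$ is its year of publication.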
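The co-citation network and FCR steps can be sketched as follows. The lookup dictionaries are hypothetical stand-ins for the OpenAlex citation data. The text above does not say whether each network paper is counted once or once per co-citation, nor whether the focal article itself is excluded; this sketch counts unique papers, keeps the focal article (since it is among the references of its citers), and skips papers without a known IF.

```python
def co_citation_network(article_id, cited_by, references):
    """Collect all references of every article that cites `article_id`.

    cited_by:   dict article_id -> iterable of citing article ids
    references: dict article_id -> iterable of referenced article ids
    """
    network = set()
    for citing_id in cited_by.get(article_id, ()):
        network.update(references.get(citing_id, ()))
    return network

def field_citation_rate(network, journal_of, year_of, impact_factor):
    """Average the journal IFs of the network papers.

    impact_factor: dict (journal_id, year) -> IF, precalculated per journal
    and year. Papers with no matching IF are skipped (an assumption).
    """
    ifs = []
    for paper_id in network:
        key = (journal_of.get(paper_id), year_of.get(paper_id))
        if key in impact_factor:
            ifs.append(impact_factor[key])
    return sum(ifs) / len(ifs) if ifs else None
```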

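Then for each etalon article we take its Citations Per Year (CPY): the number of its citations from the year of publication until this year (2024), divided by the number of years from the year of publication until this year (2024).

$$\mathrm{CPY} = \frac{C}{Y}$$

Here $C$ is the number of citations the article received from its year of publication until 2024, and $Y$ is the number of years from its year of publication until 2024.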
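A short sketch of the CPY calculation follows. The text does not state whether the year of publication itself is counted in the number of years; the `inclusive` flag below is an assumption introduced for illustration, and the default counts both end years so that articles published in 2024 do not cause a division by zero.

```python
def citations_per_year(citation_count, publication_year,
                       current_year=2024, inclusive=True):
    """Citations Per Year (CPY) of one article.

    `inclusive` is a hypothetical switch: when True, both the publication
    year and the current year are counted in the time span.
    """
    years = current_year - publication_year + (1 if inclusive else 0)
    if years <= 0:
        raise ValueError("publication year gives a non-positive time span")
    return citation_count / years
```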

Then we split all the etalon articles by year of publication, and for each year we calculate Quantile Regression coefficients (QR coeffs) from FCR and CPY. For that we used the function “quantreg” from the Python library statsmodels, assuming:

[formula: CPY expressed through FCR and the two QR coefficients a and b]

So for each year we obtain 2 coefficients, a and b, which we then use to calculate RCR for the non-etalon articles of the corresponding year.
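The per-year fit can be sketched with `statsmodels.formula.api.quantreg`. The DataFrame column names (`year`, `fcr`, `cpy`) and the choice of the median (q = 0.5) are assumptions of this sketch, and it returns the pair as (slope, intercept) because the text above does not say which of a and b plays which role.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_yearly_coefficients(etalons, quantile=0.5):
    """Fit CPY ~ FCR by quantile regression, separately per publication year.

    etalons: DataFrame with hypothetical columns 'year', 'fcr', 'cpy'.
    Returns dict year -> (slope, intercept).
    """
    coeffs = {}
    for year, group in etalons.groupby("year"):
        result = smf.quantreg("cpy ~ fcr", group).fit(q=quantile)
        coeffs[int(year)] = (
            float(result.params["fcr"]),        # slope on FCR
            float(result.params["Intercept"]),  # constant term
        )
    return coeffs
```

Fitting one model per year means the expected citation rate of an article is always compared against etalons of the same age.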

Then, to calculate RCR for any article, we follow very similar steps:

  1. define the co-citation network of the article
  2. calculate the CPY of the article and the FCR of its co-citation network using OA-based IFs
  3. calculate the Expected Citations Rate (ECR) from the article's FCR and the QR coeffs of its publication year:
     [formula: ECR expressed through FCR and the QR coefficients a and b]
  4. calculate the RCR of the article as the ratio of its CPY to its ECR:
     $$\mathrm{RCR} = \frac{\mathrm{CPY}}{\mathrm{ECR}}$$
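The last two steps can be sketched as below, assuming the regression is linear in FCR with one slope and one intercept per publication year. The parameter names `slope` and `intercept` are illustrative, since the mapping to the coefficients a and b is not spelled out above, and returning `None` for a non-positive ECR is a choice made for this sketch only.

```python
def expected_citation_rate(fcr, slope, intercept):
    """ECR predicted from FCR with the QR coefficients of the publication year."""
    return slope * fcr + intercept

def relative_citation_ratio(cpy, fcr, slope, intercept):
    """RCR = CPY / ECR; returns None when ECR is not positive."""
    ecr = expected_citation_rate(fcr, slope, intercept)
    if ecr <= 0:
        return None
    return cpy / ecr
```

Under this reading, an article whose CPY equals the rate expected for its field and year gets an RCR of 1.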

This way we calculate classic RCR for all articles from our cluster of Oncology field papers and save their values to our database.

If you search for the RCR of an article outside our cluster of Oncology field papers, it will be calculated on the fly by this algorithm using data from the full OpenAlex dataset.

Our ongoing development of RCR modifications

In addition to the classic RCR metric, we calculate our own modifications of it, using different approaches to defining the impact factor and the co-citation network.

We calculate classic RCR with a 5-year impact factor used for FCR, and also with both 2-year and 5-year impact factors that use only citations of articles from our cluster of the Oncology field.

Then we try different variations of defining a co-citation network, using only articles from our cluster of the Oncology field together with these different IFs.

Instead of simply including or excluding articles based on our cluster of the Oncology field, we also try weighting them by how strongly they belong to Oncology.
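One possible form of such weighting is a weighted average of journal IFs over the co-citation network. This is purely an illustrative sketch and not our final method: the weight scale (0 to 1) and both input dictionaries are assumptions.

```python
def weighted_field_citation_rate(network_ifs, oncology_weights):
    """Weighted average of IFs over a co-citation network.

    network_ifs:      dict paper_id -> journal IF for that paper
    oncology_weights: dict paper_id -> weight in [0, 1]; papers without
                      a weight are treated as weight 0 (an assumption)
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for paper_id, journal_if in network_ifs.items():
        weight = oncology_weights.get(paper_id, 0.0)
        weighted_sum += weight * journal_if
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None
```

With weights restricted to 0 and 1 this reduces to the plain include-or-exclude variant.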

More details about our ongoing RCR modifications will follow later. This work is in development.

How we are going to measure the score of paper belonging to the Oncology field. Under development.