MyWiki:Request a query/Archive 6


Editors who would become extended confirmed upon their next edit

For statistical purposes, I'm wondering if I could get a count of editors who would become extended confirmed upon their next edit? That is, editors with 500 edits, an account age of more than 30 days, who are not and have never previously been extended confirmed? Best, HouseBlaster (talk • he/they) 02:03, 17 August 2025 (UTC)

122. List at quarry:query/96352. —Cryptic 02:21, 17 August 2025 (UTC)
Would editors with more than 500 edits also qualify? How about those with exactly 499 edits (assuming that no pages they have edited get deleted before their next edit)? Certes (talk) 10:41, 17 August 2025 (UTC)
More than 500 might, if they hadn't edited since before extendedconfirmed existed; I don't know offhand whether qualified users were promoted en masse at the time, and can see arguments either way for whether they should have. Will investigate later. No, also ones who made their 501st edit before their account was 30 days old, but none since. So, yeah, likely lots.
499 edits wouldn't, since autopromotion happens on the 501st edit. Deletion doesn't matter either, since user_editcount is only incremented when you make an edit, and not ever decremented. —Cryptic 12:04, 17 August 2025 (UTC)
So, looking like 27736, unless I missed something else. The first autopromotions happened at 23:17, 5 April 2016. (That seems way late to me - I'd have guessed closer to 2010 - but it turns out to be right.) 903 of these accounts have edited since then (page ten of quarry:query/96352), so 26833 are entirely left over from before the group existed. —Cryptic 12:34, 17 August 2025 (UTC)
Thanks, Cryptic and Certes :) HouseBlaster (talk • he/they) 16:25, 17 August 2025 (UTC)
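For reference, the criteria discussed above translate roughly into the following against the standard MediaWiki replica schema. This is a sketch, not necessarily the same SQL as quarry:query/96352:

```sql
-- Editors who would become extended confirmed on their next edit:
-- 500+ edits, account older than 30 days, never in the group before.
SELECT COUNT(*) AS eligible
FROM user
WHERE user_editcount >= 500          -- autopromotion fires on the 501st edit
  AND (user_registration IS NULL     -- very old accounts have no timestamp
       OR user_registration < DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s'))
  AND NOT EXISTS (SELECT 1 FROM user_groups
                  WHERE ug_user = user_id AND ug_group = 'extendedconfirmed')
  AND NOT EXISTS (SELECT 1 FROM user_former_groups
                  WHERE ufg_user = user_id AND ufg_group = 'extendedconfirmed');
```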

1RR pages missing editnotice

Could I get a list of pages in Category:Wikipedia pages subject to a one-revert restriction for which the corresponding editnotice (Template:Editnotices/Page/[basename]) does not exist? Feel free to leave out the ones in category talk, template talk, etc. Thanks in advance. Extraordinary Writ (talk) 02:14, 10 September 2025 (UTC)

How broad is "etc."? Do you just want mainspace-talk pages, then? —Cryptic 02:45, 10 September 2025 (UTC)
Never mind, quarry:query/97051 has all of them, sorted by the categorized page's namespace. First 2000 and some are for mainspace pages. —Cryptic 02:59, 10 September 2025 (UTC)
Great, thanks. Extraordinary Writ (talk) 05:35, 10 September 2025 (UTC)
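A sketch of one way to express the article-talk portion of this, assuming the category members are the talk pages and the classic categorylinks.cl_to column (the actual quarry:query/97051 covered every namespace):

```sql
-- Talk pages under 1RR whose per-page editnotice is a redlink.
-- Talk:Foo shares its title with article Foo, so the talk title
-- doubles as the editnotice basename.
SELECT p.page_title
FROM categorylinks cl
JOIN page p ON p.page_id = cl.cl_from
           AND p.page_namespace = 1            -- Talk:
LEFT JOIN page en ON en.page_namespace = 10    -- Template:
                 AND en.page_title = CONCAT('Editnotices/Page/', p.page_title)
WHERE cl.cl_to = 'Wikipedia_pages_subject_to_a_one-revert_restriction'
  AND en.page_id IS NULL
ORDER BY p.page_title;
```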

Need a table breaking down total sidebar usage by type

Please see Wikipedia:Village pump (proposals)#Remove non-ordinal series templates for context. Could someone please help build a query that counts ordinal sidebars (such as historical chronologies or other numbered lists) versus the rest of the topic-based sidebars, as well as the transclusion counts for each and the respective totals? Thanks in advance. Raladic (talk) 16:37, 10 September 2025 (UTC)

Bearing in mind that we can't see the templates' wikitext (see top of this page), how do you propose to distinguish between ordinal and non-ordinal sidebars? For that matter, it's not immediately obvious to me how to programmatically distinguish them even if we could see the wikitext. (And what the heck is a trans liaison count?) —Cryptic 16:46, 10 September 2025 (UTC)
And what the heck is a trans liaison count? - “transclusions”, autocorrect on mobile, sorry about that. Unfortunately sidebars are currently only grouped by topical categories, so we likely would need to use Category:Sidebar templates by topic and count all sidebars in the categories outside of Category:History and events sidebar templates, which likely has most chronological sidebars. Won't be an exact science, but yes, that is part of the problem, other than the obvious hunch that chronologies/numbered sidebars are a small minority. Hoping maybe someone else has some idea to refine this. — Preceding unsigned comment added by Raladic (talk • contribs) 16:55, 10 September 2025 (UTC)
4268 non-redirect templates in the second tree, 3479 in the first but not the second. —Cryptic 17:12, 10 September 2025 (UTC)
Wait, you wanted total transclusion counts too. quarry:query/97072. —Cryptic 17:32, 10 September 2025 (UTC)
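For one category (rather than the whole subtree, whose subcategories would need to be enumerated separately), the per-template transclusion count looks roughly like this, assuming the current templatelinks/linktarget schema:

```sql
-- Non-redirect sidebar templates in one category, with transclusion counts.
SELECT p.page_title,
       COUNT(tl.tl_from) AS transclusions
FROM categorylinks cl
JOIN page p ON p.page_id = cl.cl_from
           AND p.page_namespace = 10           -- Template:
           AND p.page_is_redirect = 0
LEFT JOIN linktarget lt ON lt.lt_namespace = 10
                       AND lt.lt_title = p.page_title
LEFT JOIN templatelinks tl ON tl.tl_target_id = lt.lt_id
WHERE cl.cl_to = 'History_and_events_sidebar_templates'
GROUP BY p.page_id
ORDER BY transclusions DESC;
```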

Plus or minus an 'h'

Because of confusion on two pages, I was wondering how easy it would be to write a query that returns all articles whose title, with a single 'h' removed, is also the title of an existing article. For example, Sigma Phi Alpha -> Sigma Pi Alpha. Yes, the majority of these would probably be Greek-letter organizations like the example, but I'm curious as to both the results and the sql. Naraht (talk) 11:59, 10 September 2025 (UTC)

Hrm. I don't think sql is the right tool for this. It's possible, but awkward, to express replacing only the first h, or replacing only the second, or replacing only the third; and I can't think of a way around including each of those cases separately in the query, up to however many h's are in the title of the page with the most of them. (I guess that's When the pawn hits the conflicts he thinks like a king what he knows throws the blows when he goes to the fight And he'll win the whole thing 'fore he enters the ring And he'll win the whole thing 'fore he enters the ring with 26, at least for lowercase h's only.)
The right way to go about this is to download the dump of titles in mainspace and run it through a more sequential language like perl or python. Which I might be willing to do if you have a more productive reason to ask this than just mere curiosity. —Cryptic 15:21, 10 September 2025 (UTC)
(Yes, mere curiosity.) I was thinking that the right way from a database standpoint would be to create a temp table (not sure if it would be instantiated or not) with article_name, article_name stripped of all h's (article_name_stripped), and a count of the number of h's in article_name (h_count). Then a self join could be done on this table where t1.article_name_stripped = t2.article_name_stripped and t1.h_count = t2.h_count + 1. That result could then be tested in another way. Yes, that could give cases where it wouldn't be true (like two articles WhXhYZ and WXYhZ) but would give a *much* smaller universe to look at. Naraht (talk) 15:54, 10 September 2025 (UTC)
You'd have to instantiate and index the temp table to have any hope of the query completing, yes. The WhXhYZ/WXYhZ example can be dealt with by looking for a Levenshtein distance of 1. Sql Server, Oracle, and Postgres all have native functions to compute that, but Mysql/Mariadb does not, nor AFAIK any other function to compute edit distance; while I can google up some implementations of it in mysql, we don't have permissions to create user functions on the toolserver replicas, so you'd still have to download the page titles and do it locally. —Cryptic 17:57, 10 September 2025 (UTC)
Yeah, the h count being off by one and the Levenshtein distance = 1 should be enough. I *think* a perl library could be used for that. Also, since h would never be the first letter, each of the 26 letters could be its own run. (Yes, I know the English-language Wikipedia allows articles which start with either modified Latin letters or characters from other alphabets; however, combined they are tiny.) Oh well. Thank you for the ideas. Naraht (talk) 20:19, 10 September 2025 (UTC)
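For the record, Naraht's self-join idea can be written without an explicit temp table by using a CTE (MariaDB 10.2+). As discussed above, it would likely still be far too slow on the replicas, it handles lowercase h only, and the candidate pairs would still need a local Levenshtein-distance check:

```sql
-- Candidate pairs of mainspace titles differing by one 'h'.
WITH stripped AS (
  SELECT page_title,
         REPLACE(page_title, 'h', '') AS title_stripped,
         LENGTH(page_title) - LENGTH(REPLACE(page_title, 'h', '')) AS h_count
  FROM page
  WHERE page_namespace = 0 AND page_is_redirect = 0
)
SELECT a.page_title AS with_h,
       b.page_title AS without_h
FROM stripped a
JOIN stripped b ON a.title_stripped = b.title_stripped
               AND a.h_count = b.h_count + 1;
```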

Completed

List of new WikiProject pages

I'd like a query that will find pages meeting these criteria:

  • Begin with Wikipedia:WikiProject
  • Are not subpages
  • Have been created recently (e.g., last few months, this calendar year)

WhatamIdoing (talk) 01:20, 9 September 2025 (UTC)

I recently fixed the one at Database Reports that basically does this but only looks at the past 30 days (from recentchanges). Check this out: Wikipedia:Database reports/New WikiProjects. If you need to go back further, I could rewrite it. Let me know. Stefen 𝕋ower's got the power!!1! GabGruntwerk 02:15, 9 September 2025 (UTC)
Thanks! I looked back through the history to find some others. It looks like in the last year or so, about 10 pages have been created, and probably more than half of those could be deleted at MFD. I'll take the list to WT:COUNCIL. WhatamIdoing (talk) 02:51, 9 September 2025 (UTC)
Looking at the database report, I'd add to my original criteria: "4. Not a redirect (including 'not a soft redirect')" and "5. Not already deleted." WhatamIdoing (talk) 02:59, 9 September 2025 (UTC)
In the previous report, it included a deleted one only because I asked for it to be deleted after the report was generated. Anyway, I think these criteria are mostly doable, although I'm not sure offhand if the database supports checking for soft redirects. I can probably look at this tomorrow. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:17, 9 September 2025 (UTC)
WhatamIdoing: Check out this query. It goes back six months, and should keep out both hard and soft redirects (removing soft redirects was easier than I initially thought). Also, there should never be any deleted ones in the list. Let me know if this works for you. Stefen 𝕋ower's got the power!!1! GabGruntwerk 04:25, 10 September 2025 (UTC)
Yes, that looks good. WhatamIdoing (talk) 23:08, 10 September 2025 (UTC)
(Wouldn't using the page creation log be easier and faster? Plus, it wouldn't limit you to 30 days' age. —Cryptic 03:05, 9 September 2025 (UTC))
Perhaps. I didn't challenge myself to make the existing report better like I did with "WikiProjects by human changes". I literally just fixed the database bug in it. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:10, 9 September 2025 (UTC)
Hrm. Neither of these are going to find pages created somewhere else and later moved to Wikipedia:WikiProject Whatever, like Wikipedia:WikiProject Sustainable Development was in July. I found that by starting at page and working back to a creation timestamp, but that wouldn't work either for a page created long ago and only recently moved. And the move log is miserable to work with, though since we only care about the current location, it's at least feasible. —Cryptic 03:23, 9 September 2025 (UTC)
That looks like the only one, which isn't really surprising. I didn't attempt to look more than one move back in the query, but there were few enough to check manually; all the other moves this year were either from other WP:WikiProject* titles or were reverts of moves away from those. —Cryptic 03:43, 9 September 2025 (UTC)
In that case, I could check for "Wikiproject" as well as "WikiProject". Since we show redirects, we would pick up where it originally came from. That may not resolve all cases, but resolving all cases might be harsh, performance-wise. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:45, 9 September 2025 (UTC)
The case I expected to find (and didn't initially) was a page created in userspace or draft:, edited a bit, then moved to WP:WikiProject*. Going back a bit further to the start of 2023, that's happened at (at least) Wikiproject Africa environment, Draft:Wikipedia:WikiProject Australian Transport, User:LoomCreek/sandbox/Wikiproject Housing and Tenant Rights, and User:Zazpot/WikiProject Risk. There aren't enough pages currently at Wikipedia:WikiProject% that querying them is time-intensive; the main issue is that it's very difficult to look more than one move back to omit the innocuous moves like WP:WikiProject Ireland->Baldonnel, County Dublin->WP:WikiProject Ireland but not ones like [[(some hypothetical title in userspace)]]->WP:WikiProject Abandoned Articles->WP:WikiProject Abandoned articles. —Cryptic 03:58, 9 September 2025 (UTC)
Based on my current understanding (and I could be mistaken), this is resolved in the database with rev_parent_id=0 after a page move. So if you're looking for when a WikiProject enters the project namespace, you know that it has become a "new" WikiProject. When it's a draft or in user space, it's in development (or just a user's brainstorming), and we don't care about those for our purposes. Stefen 𝕋ower's got the power!!1! GabGruntwerk 09:16, 10 September 2025 (UTC)
rev_parent_id isn't completely robust; it's not updated when history is merged or split. So, for example, there are three revisions of 2024 Minas Gerais road crash with rev_parent_id=0 (1264606347, 1264609936, 1264650842) and none of Home library service. The way to get around that is to JOIN revision ON rev_id = (SELECT rev_id FROM revision WHERE /*something with a composite index with rev_timestamp, usually rev_page=some page id*/ ORDER BY rev_timestamp ASC LIMIT 1), which looks like it's less efficient than looking at rev_parent_id but is actually much more so, and always finds the earliest visible revision at a given page (or other constraints).
That's admittedly rare, but it does happen. The more serious problem with your query 97050 is that a page that has more than a six-month delay between creation and their move to a WikiProject title will never show up on it. —Cryptic 15:03, 10 September 2025 (UTC)
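Spelled out with the WikiProject constraint, the earliest-revision pattern described above looks roughly like this (a sketch):

```sql
-- First visible revision of each non-subpage WP:WikiProject* page.
-- The correlated subquery uses the (rev_page, rev_timestamp) index and,
-- unlike rev_parent_id = 0, survives history merges and splits.
SELECT p.page_title, r.rev_timestamp, r.rev_actor
FROM page p
JOIN revision r ON r.rev_id =
     (SELECT r2.rev_id
      FROM revision r2
      WHERE r2.rev_page = p.page_id
      ORDER BY r2.rev_timestamp ASC, r2.rev_id ASC
      LIMIT 1)
WHERE p.page_namespace = 4                  -- Wikipedia:
  AND p.page_title LIKE 'WikiProject\_%'
  AND p.page_title NOT LIKE '%/%';          -- exclude subpages
```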
Since we're talking about WikiProjects, I'm not sure why we would focus on article examples. Given that WikiProjects go through these types of changes, are they of concern to us here? If a WikiProject is merged or split, that is occurring normally per a community decision, likely well after they are "new", and a new WikiProject created from a split would show up as a new page.
Per the last point, I had to pick a cut-off point per the request. It can easily be extended to longer periods. Also, we're looking at new pages starting with 'WikiProject'. If there is a move to a WikiProject title during the six months, it should be captured. I don't see the serious problem here, but if you can rework the query to demonstrate the issue, that would show me what I'm missing here. I am teachable. :) Stefen 𝕋ower's got the power!!1! GabGruntwerk 19:19, 10 September 2025 (UTC)
Also, I've written a subquery like you show for a different report to find an original creator of an article, like when it was a draft or in their user space (so I can show the true creator), but since we're only looking at new pages in the project space, this case seems awfully rare. Stefen 𝕋ower's got the power!!1! GabGruntwerk 19:34, 10 September 2025 (UTC)
Wikipedia:WikiProject Risk was in userspace for more than six years before being moved to its current title. It's conceivable that some day, a page created in 2002 could be moved to a WikiProject title. What you're querying for is pages currently at a WP:WikiProject title whose first revision was within the last six months. That's inevitably going to eventually miss newly-blue WikiProject titles no matter what time cutoff you pick. Since there's generally going to be few results, I'd think it would be more preferable to have false positives (by including all recent moves, even if they were quickly reverted like the one from WP:WikiProject Ireland above; or even if merely from one WikiProject title to another, like the one from Wikipedia:WikiProject Abandoned Articles, also above) than false negatives.
I picked the anomalous rev_parent_id examples because those were the first illustrative ones I found going through my own deletion log and looking for "history merge" and "history split". If you want examples more relevant to the question at hand, then Wikipedia:WikiProject Finance & Investment, Wikipedia:WikiProject Germany, Wikipedia:WikiProject Protected areas, and Wikipedia:WikiProject Women Do News all have two revisions each with rev_parent_id = 0; and Wikipedia:WikiProject Rosblofnari has none. —Cryptic 00:08, 11 September 2025 (UTC)
Per the first point, I don't see significant damage from ignoring these rare cases. After all, we'll see these projects pop up in the "WikiProjects by human changes" and "WikiProjects by changes" reports. But if you want to modify the query I wrote that WhatamIdoing requested, feel free.
Per the second point, I will review the implications for the New WikiProjects report and revise the SQL if necessary. Note this wasn't originally my code, and my effort was focused on making sure it didn't fail due to database changes, rather than making it smarter. But I will look at it. Thank you for your ideas! Stefen 𝕋ower's got the power!!1! GabGruntwerk 00:20, 11 September 2025 (UTC)

Flooding AFD

@Czarking0 has an idea about how to measure AFD nominations, specifically focused on how many editors open a disproportionate number of AFD nominations. The context is a discussion about AI/LLM users potentially flooding AFD in the future.

This would require knowing the name of the editor (or some sort of unique identifier that's consistent within the set) who created any given AFD subpage + the creation date for that subpage (e.g., I created Wikipedia:Articles for deletion/Theopathy on July 3rd, so the needed information would be something like "WhatamIdoing – 20250703012900").

But rather than doing Czarking0's original analytical plan, which sounds like a lot of work, could you look at a random sample of weeks in a year, and tell us what the most number of AFD subpages created by any individual editor is? If nobody's normally creating more than 10 or 20 in one week, I think we can safely skip the "hard work" part and propose a limit based on common sense. WhatamIdoing (talk) 01:07, 9 September 2025 (UTC)

I think there is a lot of value in common-sense stats. Out of professional pride, I will note there are a ton of caveats for estimating maximums which I would have to re-familiarize myself with to do a true statistical study. In terms of sampling, I think we could draw meaningful results from eight random weeks in each of the last five years. That is also a small enough amount of data that I could do it by hand if needed. Czarking0 (talk) 01:18, 9 September 2025 (UTC)
Five years takes us back to August/September of 2020, which was during the Covid lockdowns. Editing patterns were off that year. Could we do 10 random non-overlapping weeks in the last four years? WhatamIdoing (talk) 01:22, 9 September 2025 (UTC)
I think I can save you some steps? quarry:query/97012 has, for each week of the year, everyone who's created more than 10, along with the total afds created that week, starting in 2020. I don't think a random sampling of weeks would have given you a good picture, whether or not you picked one of the weeks where a single user created a full quarter of the afds (there were several). But if you really want the full datasets, let me know. —Cryptic 02:46, 9 September 2025 (UTC)
It looks like there are 1,661 records showing an editor creating 10+ AFDs in a single (calendar) week.
The highest in that dataset is 135 in a single week, which means an average of 20 per day should have no effect on anyone. Only nine editors ever listed more than 70 in a single week, which means that an average of 10 per day should have almost no effect on anyone. (The underlying goal is to establish a "no flooding" rule before anyone gets the idea to turn an LLM-based bot loose on AFD.)
@Czarking0, I wonder if you think that more evaluation would really be helpful, or if we should just go with a simple "max 70 noms per week" proposal. WhatamIdoing (talk) 06:46, 14 September 2025 (UTC)
I think this sort of data gets you to something you could build consensus around regardless of my opinion. It really comes down to what does the community consider flooding? 10 in a week does not seem like flooding to me (I think I have done that). 70 in a week does seem like flooding. I agree that additional analysis is mostly academic but I am unsure what the community would think. Czarking0 (talk) 16:21, 14 September 2025 (UTC)
I've asked people what they think 'flooding' is at Wikipedia:Village pump (idea lab)#Quick straw poll about AFD flooding. (Neither of us have done 10 in the same week during the last five years, but I agree with you that 10 in a week feels very reasonable. I might even accept 10 in a day.) WhatamIdoing (talk) 19:55, 14 September 2025 (UTC)
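The per-week numbers above come from a query shaped roughly like the following (a sketch; quarry:query/97012 may differ in details, and rev_parent_id = 0 only approximates page creation, since it is not updated on history merges):

```sql
-- AFD subpage creations per editor per calendar week, since 2020,
-- keeping only editors with more than 10 in a week.
SELECT YEARWEEK(r.rev_timestamp) AS week,
       a.actor_name,
       COUNT(*) AS afds_created
FROM page p
JOIN revision r ON r.rev_page = p.page_id
               AND r.rev_parent_id = 0      -- approximates the creating edit
JOIN actor a ON a.actor_id = r.rev_actor
WHERE p.page_namespace = 4
  AND p.page_title LIKE 'Articles\_for\_deletion/%'
  AND r.rev_timestamp >= '20200101000000'
GROUP BY week, a.actor_name
HAVING COUNT(*) > 10
ORDER BY week, afds_created DESC;
```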

Query to find articles using both Template:use dmy dates and Template:use mdy dates (and similar)

This is a request for a query to find articles using both Template:use dmy dates and Template:use mdy dates, motivated by this conversation. I know that it is possible to use "hastemplate" searches for this, but I'd like to incorporate it into a database report.

I would also like to include similar conflicts, such as articles that contain both {{Use British English}} and {{Use American English}}, so a general-purpose query that lists pairs of such templates would be helpful, so that I can expand the report as more conflicting pairs of templates are identified.

Please let me know if this request is not specified well enough. – Jonesey95 (talk) 21:01, 15 September 2025 (UTC)

quarry:query/97196. —Cryptic 22:03, 15 September 2025 (UTC)
Super. And thanks for tweaking Wikipedia:Database reports/Multiple "Use" templates. I was making the exact same edit after reading the documentation at {{Database report}}. – Jonesey95 (talk) 22:47, 15 September 2025 (UTC)
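The single-pair form of such a query looks roughly like this (a sketch assuming the current templatelinks/linktarget schema; quarry:query/97196 generalizes it over a list of conflicting pairs):

```sql
-- Articles transcluding both {{Use dmy dates}} and {{Use mdy dates}}.
SELECT p.page_title
FROM page p
JOIN templatelinks t1 ON t1.tl_from = p.page_id
JOIN linktarget l1 ON l1.lt_id = t1.tl_target_id
                  AND l1.lt_namespace = 10
                  AND l1.lt_title = 'Use_dmy_dates'
JOIN templatelinks t2 ON t2.tl_from = p.page_id
JOIN linktarget l2 ON l2.lt_id = t2.tl_target_id
                  AND l2.lt_namespace = 10
                  AND l2.lt_title = 'Use_mdy_dates'
WHERE p.page_namespace = 0;
```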

Hot articles

I'm trying to come up with a query for the most edited pages with a talk page category (say Category:Horse breeds task force articles) in the last 7 days. This is for use in Module:Database reports/Hot articles which currently supports providing a PageAssessments-recognized project as input but not a category. (Prior art: https://github.com/kaldari/hotarticles/blob/master/runbot.php, but seems a bit obfuscated.) – SD0001 (talk) 15:36, 14 September 2025 (UTC)

The github version jumps through a lot of hoops trying to make things more efficient, but it doesn't matter for the common case and I don't... think... it would help much for cases with many pages in the category and very, very many revisions in the articles. Especially since it only looks at recentchanges anyway, so can only ever see revisions from the last 30 days. (Also, it can go by talk pages linked from a template instead of in a category, but that difference is even smaller now.) quarry:query/97157 for a version that looks at revision, so it can go back further than 30 days; the change for the template version is trivial - categorylinks to templatelinks and cl_* to tl_* - and if it really does get bad performance with ancient articles and you don't need to look past the recentchanges period, that's straightforward too. —Cryptic 16:42, 14 September 2025 (UTC)
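A minimal revision-based sketch of the by-category approach just described (quarry:query/97157 may differ in details; assumes the classic categorylinks.cl_to column):

```sql
-- Most-edited articles in the last 7 days whose talk page is in the
-- given category. Talk:Foo shares its title with article Foo.
SELECT art.page_title,
       COUNT(r.rev_id) AS edits
FROM categorylinks cl
JOIN page talk ON talk.page_id = cl.cl_from
              AND talk.page_namespace = 1
JOIN page art ON art.page_namespace = 0
             AND art.page_title = talk.page_title
JOIN revision r ON r.rev_page = art.page_id
WHERE cl.cl_to = 'Horse_breeds_task_force_articles'
  AND r.rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 7 DAY, '%Y%m%d%H%i%s')
GROUP BY art.page_id
ORDER BY edits DESC
LIMIT 10;
```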
Here is a simplified version that goes by the talk page category, but I have been using a more sophisticated approach for my version of Hot Articles that still goes by the PageAssessments project name. Stefen 𝕋ower's got the power!!1! GabGruntwerk 16:49, 14 September 2025 (UTC)
Hunh. I had thought FETCH FIRST 10 ROWS ONLY was an Oracleism; no idea that Mariadb supported it. —Cryptic 17:13, 14 September 2025 (UTC)
This was something I discovered when hunting down a way to do proper ranking, not wanting to leave out articles that have the same number of edits at the end of the list. I guess it can be called a "10+" result. Stefen 𝕋ower's got the power!!1! GabGruntwerk 17:24, 14 September 2025 (UTC)
I'd probably have flailed around with something based on DENSE_RANK(). This is a lot more elegant. —Cryptic 17:27, 14 September 2025 (UTC)
(Which oracle?) WhatamIdoing (talk) 19:56, 14 September 2025 (UTC)
Oracle Database by Oracle Corporation. Stefen 𝕋ower's got the power!!1! GabGruntwerk 20:18, 14 September 2025 (UTC)
Also here is an implementation of my version of Hot Articles which includes the specific query in production (trimmed down a bit from the one in Quarry). Stefen 𝕋ower's got the power!!1! GabGruntwerk 17:28, 14 September 2025 (UTC)
Might be worthwhile to go with a recentchanges-based query for the convenient access to rc_bot. You can sort of get at it from revision, but only by looking at the editor's current groups; and that's not guaranteed to be accurate as of when the edit was made. —Cryptic 18:09, 14 September 2025 (UTC)
Thanks all. I set up Module:Database reports/Hot articles by category based on Cryptic's query. I wanted to add it as an option within Module:Database reports/Hot articles but that proved too difficult due to the different number of columns – the by-category listing doesn't contain rating. – SD0001 (talk) 04:40, 24 September 2025 (UTC)

Admins using the block button

Can you tell me how many admins blocked anyone (registered or otherwise) during the last year? We have 827 admins, and I have heard that most of them use their admin tools only occasionally, so I would expect the number who use the block button specifically to be low – but maybe you'll surprise me. The total number (no need for individual names) is enough for me. WhatamIdoing (talk) 20:37, 6 October 2025 (UTC)

@WhatamIdoing: According to this query, 414 of them made at least one block in the past year, out of which 41 made more than 1000 blocks each (~50% and ~5% of all admins, respectively). NguoiDungKhongDinhDanh 21:03, 6 October 2025 (UTC)
quarry:query/97803 has totals for different uses of the block button too. Do you need me to filter accounts that currently aren't admins anymore? I can think of at least one where that's the case. —Cryptic 21:04, 6 October 2025 (UTC)
No, that's good enough. Thanks! WhatamIdoing (talk) 21:35, 6 October 2025 (UTC)
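The underlying count is shaped roughly like this (a sketch; the queries linked above also break the totals down by the different uses of the block button):

```sql
-- Current admins who performed at least one block in the past year.
SELECT COUNT(DISTINCT l.log_actor) AS blocking_admins
FROM logging l
JOIN actor a ON a.actor_id = l.log_actor
JOIN user_groups ug ON ug.ug_user = a.actor_user
                   AND ug.ug_group = 'sysop'
WHERE l.log_type = 'block'
  AND l.log_action IN ('block', 'reblock')
  AND l.log_timestamp >= DATE_FORMAT(NOW() - INTERVAL 1 YEAR, '%Y%m%d%H%i%s');
```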

Why is this query so slow?

This query of mine has been running for about 24 hours. Its content is as follows:

SELECT
	CONCAT('Thảo luận Thành viên:', page.page_title) as 'Page name',
    COUNT(revision.rev_id) as 'Revisions'
FROM
	page
JOIN
	revision ON revision.rev_page = page.page_id
JOIN
	actor ON actor.actor_id = revision.rev_actor
WHERE
	page.page_namespace = 3 AND
    page.page_is_new = 0 AND
    actor.actor_name = 'TuanminhBot' AND
    is_ipv4(page_title)
GROUP BY
	page.page_id
HAVING
	COUNT(DISTINCT revision.rev_actor) = 1 AND
    COUNT(revision.rev_id) > 1
ORDER BY
 	COUNT(revision.rev_id) DESC

I expect this to return something between 10,000 and 300,000 rows, give or take. An almost identical query, whose intention is to return only the number of pages, has also been running for 21 hours. NguoiDungKhongDinhDanh 08:41, 7 October 2025 (UTC)

The actor view was introduced in a 2019 downgrade. It joins the actual actor table, which is hidden, to eight other tables, which slows down many queries dramatically. Using the actor_revision view instead may undo some of the damage. Certes (talk) 10:49, 7 October 2025 (UTC)
@Certes: quarry:query/97845 (actor) took 105 seconds to return 100 rows, compared to quarry:query/97844 (actor_revision)'s 117 seconds. Am I using it wrong? NguoiDungKhongDinhDanh 11:23, 7 October 2025 (UTC)
The benefit in using actor_revision over actor is that, as Certes says, it simplifies the actually-executed query, which in some cases makes the optimizer choose a better query plan. Most of the time, it won't matter, and it probably won't here.
I wrote a big long thing here about using revision_userindex instead of revision - that choice of view does make a difference, since it allows use of indices including rev_actor - but since TuanminhBot has an edit count of some 34.5 million, it's still more selective to look at page first. Query plan (either for revision or revision_userindex) looks like:
where you can see that it had to fall back on a full table scan of page with an estimate of 15.6 million rows to look at. (All the entries with select_type "SUBQUERY" are for resolving the actor view; using actor_revision will cut them down to just a single entry, the one currently numbered 14.) Only thing I can think of to improve that is to add another condition on page_title so that it only looks at page titles starting with digits (since is_ipv4(page_title) isn't indexable) - adding AND page.page_title BETWEEN '1' AND ':' to the end of your WHERE clause gets the estimate row count on page down to 7805216, about half.
Which gets me to stop and think again, because 15 million isn't all that high to start with - I was thinking in terms of enwiki numbers, where it's around 58 million. Running SHOW ANALYZE for the amended query shows that it gets through all those rows in page reasonably quickly, then hangs. What I'm guessing is happening (and I'm running right up against the edge of my competence here, and am probably already over it) - is that either it's stuck counting and sorting those results (for the HAVING clause), or that there are a lot - lot - of results, enough to overwhelm output. Quarry can't really deal with more than 100k rows in a result, though that shouldn't affect me at toolserver.
Next thing I'd try is to find a couple pages that you know will be output, for example "User talk:193.42.1.7", and restricting page_title to something like "page_title LIKE '193.42.1.%'" to see if you get results that way. I'm out of time to look at this; I can investigate some more tomorrow if what I've written so far doesn't help. —Cryptic 14:04, 7 October 2025 (UTC)
Also, this query won't do what the title says it does. The WHERE clause is dealt with well before HAVING, so informally, you're getting a bunch of pages, then all their revisions, then removing revisions not made by TuanminhBot, then checking to see if there was only one user among what's left; and that last part will always be true. —Cryptic 14:24, 7 October 2025 (UTC)
@Cryptic: Thanks. I'm {{User SQL-0.5}}, so I unfortunately don't understand everything you said. Here's what I could make of it:
  • Add AND page.page_title BETWEEN '1' AND ':' to limit the number of indexed pages.
  • Try adding page_title LIKE '193.42.1.%' and see if the query runs faster.
  • The actor.actor_name = 'TuanminhBot' condition renders HAVING COUNT(DISTINCT revision.rev_actor) = 1 useless.
I came up with quarry:query/97847 (~64,000 rows in ~50 seconds). That's much faster than before, but the pages returned are not guaranteed to be created by TuanminhBot. How do I factor that into the query? NguoiDungKhongDinhDanh 17:53, 7 October 2025 (UTC)
That's all correct. The way you properly limit it to pages where all the edits were made by TuanminhBot (instead of just all the edits were made by the same user) is to stuff all that into a subquery, then select the pages where the single user was the bot, as in quarry:query/97857. Turns out all but one of those pages are; the only exception was vi:User talk:1.52.1.184.
Since you're getting that many results for 1.52.*, you're probably going to have to run separate queries for each pair of digits (0-255).(0-255).* - if the query had returned 640,000 results instead of 64,000, Quarry wouldn't have been able to deal with it. I was hoping you'd be able to get by on page_title LIKE '1.%', another query for page_title LIKE '2.%', and so on. —Cryptic 00:51, 8 October 2025 (UTC)
page.page_title LIKE '1.%' returns ~258,000 rows in ~211 seconds. I suppose that's good enough. Thanks! NguoiDungKhongDinhDanh 06:06, 8 October 2025 (UTC)
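For the archive, the subquery shape described above (collect single-actor pages first, then check that the one actor is the bot) looks roughly like this sketch of quarry:query/97857, restricted to titles starting with '1.':

```sql
-- User-talk pages edited more than once, always by the same actor,
-- where that single actor is TuanminhBot.
SELECT page_name, revisions
FROM (
    SELECT CONCAT('Thảo luận Thành viên:', p.page_title) AS page_name,
           COUNT(r.rev_id) AS revisions,
           MIN(r.rev_actor) AS only_actor   -- well-defined: one distinct actor
    FROM page p
    JOIN revision r ON r.rev_page = p.page_id
    WHERE p.page_namespace = 3
      AND p.page_title LIKE '1.%'
    GROUP BY p.page_id
    HAVING COUNT(DISTINCT r.rev_actor) = 1
       AND COUNT(r.rev_id) > 1
) single_actor
JOIN actor ON actor_id = only_actor
WHERE actor_name = 'TuanminhBot'
ORDER BY revisions DESC;
```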

Inactive rollbackers

Could someone create/update a query to tell me how many of the 6,978 non-admins who have the MediaWiki rollback user right haven't edited for 3+ years? (There won't be any admins.) This should be similar to Wikipedia talk:Autopatrolled#RfC: Inactivity requirements for autopatrolled. WhatamIdoing (talk) 21:40, 11 October 2025 (UTC)

The same query as in Wikipedia:Database reports/Inactive users with advanced permissions can do this by editing the constants at the start. I've done a one-time run at quarry:query/97992. —Cryptic 01:11, 12 October 2025 (UTC)
That says we have 2,889 inactive accounts with MediaWiki rollback that haven't edited for three years, which is 41%. Thanks. I'm going to make the same proposal at Wikipedia talk:Rollback as we did at WT:Autopatrolled earlier this year. WhatamIdoing (talk) 02:18, 12 October 2025 (UTC)
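The shape of that report's query, roughly (a sketch; the Database reports version linked above is the canonical one):

```sql
-- Non-admin rollbackers with no edits in the last three years.
SELECT COUNT(*) AS inactive_rollbackers
FROM user u
JOIN user_groups rb ON rb.ug_user = u.user_id
                   AND rb.ug_group = 'rollbacker'
JOIN actor a ON a.actor_user = u.user_id
LEFT JOIN user_groups adm ON adm.ug_user = u.user_id
                         AND adm.ug_group = 'sysop'
WHERE adm.ug_user IS NULL
  AND NOT EXISTS (
      SELECT 1
      FROM revision_userindex r        -- view indexed on rev_actor
      WHERE r.rev_actor = a.actor_id
        AND r.rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 3 YEAR, '%Y%m%d%H%i%s'));
```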

Undercategorized pages

Hi. Is it possible to generate a list of articles by category count? This would need to exclude hidden cats. I'm specifically interested in biographical articles beyond those already identified as needing additional categories. So I guess something like ... sits within Category:Births by year or has Category:Living people, and the total number of categories is x (I would suggest starting with two). Thanks!! Jevansen (talk) 23:27, 25 October 2025 (UTC)

Yes, it's possible. I already have a per-project query like this here. It shows articles with the fewest categories (4 or less; number can be changed), and excludes hidden cats and stub cats. If you need me to refine it, let me know. Stefen 𝕋ower's got the power!!1! GabGruntwerk 23:52, 25 October 2025 (UTC)
Excellent, I'll play around with that. Thanks Stefen. Jevansen (talk) 23:59, 25 October 2025 (UTC)
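The hidden-category exclusion is the only non-obvious part; one way to sketch it (untested, and assuming the categorylinks table still carries category titles in cl_to rather than the newer link-target IDs) is to filter each category row against the hiddencat page property:

```sql
-- Articles in Category:Living people with at most two visible
-- (non-hidden) categories. Sketch only.
SELECT p.page_title, COUNT(*) AS visible_cats
FROM page p
JOIN categorylinks cl ON cl.cl_from = p.page_id
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND EXISTS (SELECT 1 FROM categorylinks cl2
              WHERE cl2.cl_from = p.page_id
                AND cl2.cl_to = 'Living_people')
  AND NOT EXISTS (                    -- drop hidden categories
        SELECT 1 FROM page cp
        JOIN page_props pp ON pp.pp_page = cp.page_id
        WHERE cp.page_namespace = 14
          AND cp.page_title = cl.cl_to
          AND pp.pp_propname = 'hiddencat')
GROUP BY p.page_id, p.page_title
HAVING COUNT(*) <= 2;
```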

Non-IP user-talk pages with "Blanked IP talk"

Today I came across a still-present mistaken bot edit from several years ago. It made that same edit twice to that same user-talk in a several-day timeframe, so I am wondering if there are any other such edits that were made and still have not been corrected. User:BD2412bot has been inactive for a few years, but appears to have been making many thousands of edits in the timeframe of that bad edit so I can't easily scan its contributions-list by eye. Is it possible to generate a list of all user-talk pages of non-IP editors that have {{Blanked IP talk}}? DMacks (talk) 15:31, 2 November 2025 (UTC)

I think quarry:query/98734 should be what you're looking for? I outputted the pages as full URLs but can change it to bare titles if preferable. J11csd (talk) 16:26, 2 November 2025 (UTC)
Messed up the link but fixed now. J11csd (talk) 16:29, 2 November 2025 (UTC)
Thanks! No problem for me to convert among formats, once I had the actual information in some format. DMacks (talk) 22:07, 2 November 2025 (UTC)
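The general shape of such a query (not necessarily what quarry:query/98734 does) could look roughly like this untested sketch, using the templatelinks/linktarget schema and a crude title-based test for IP names:

```sql
-- User talk pages transcluding Template:Blanked IP talk whose base
-- title doesn't look like an IP address. Sketch only.
SELECT p.page_title
FROM page p
JOIN templatelinks tl ON tl.tl_from = p.page_id
JOIN linktarget lt ON lt.lt_id = tl.tl_target_id
WHERE p.page_namespace = 3                 -- User talk
  AND lt.lt_namespace = 10                 -- Template
  AND lt.lt_title = 'Blanked_IP_talk'
  AND p.page_title NOT RLIKE '^[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+'  -- IPv4
  AND p.page_title NOT LIKE '%:%';         -- crude IPv6 exclusion
```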

Simple (I think) category query

Could someone create a query that does the following:

  • Looks for articles + drafts in Category:CS1 maint: url-status
  • Checks if the page matches a regex that I give it: ( \{\{(?=\s*cite)(?=.*?url-status\s*=\s*live)(?!.*?archive-url).*?\}\} )
  • Sorts the pages by the number of matches it has, descending, then limits the number of rows to 900.
  • The output will be one column with the article name, another with the number of matches

This'll be a page that I can run Citation bot on to fix the common CS1 error. EatingCarBatteries (contributions, talk) 06:05, 4 November 2025 (UTC)

The database replicas do not have access to page content, so queries which require checking wikitext cannot be answered with database queries. (Also, that regex doesn't do what you probably think it does. Try testing against "{{cite}}url-status=live" and "{{cite url-status=live}}archive-url".) —Cryptic 06:54, 4 November 2025 (UTC)
Good to know, thank you
Yeah I'm obviously by no means an expert. That regex is AI-generated as I'm still learning, but I have successfully run it on dumps in AutoWikiBrowser. I knew there were some misses, but I figured that it got most of the stuff that it needed. EatingCarBatteries (contributions, talk) 07:13, 4 November 2025 (UTC)

How many page creations a day by IPs and non-autoconfirmed accounts in draftspace?

In phab:T409366, we discovered that temporary accounts can't create drafts due to a bug. I'd be interested in figuring out how many drafts this group created on a random sample day from before temporary accounts, to figure out the impact of the bug. For example:

SELECT COUNT(*)
FROM revision
WHERE rev_parent_id = 0 /* first revision of a page */
AND rev_timestamp > 20251001000000
AND rev_timestamp < 20251001235900
AND (
  /* revision author is an IP */
  OR /* revision author is not autoconfirmed */
)

But that doesn't use the right indexes and the revision table has a billion rows, so needs optimization to work. –Novem Linguae (talk) 01:04, 6 November 2025 (UTC)

Unless you want to go back a very long time, the creation log is better than searching revision - it's smaller, has more appropriate indexes, it keeps the entry even if the page gets deleted, and it shows where the page was originally created rather than where it is now. The last is a major win for drafts that got accepted - you wouldn't be able to distinguish them just with revision - but it's going to have some false negatives for e.g. userspace drafts that got moved to draftspace.
The harder part will be verifying that a user wasn't autoconfirmed when they made the draft. It's not difficult to verify that the account was registered less than four days before the creation log entry, but the most straightforward way to check whether they had ten previous edits is going to be very, very slow when it runs across a user who has a couple million edits. —Cryptic 01:24, 6 November 2025 (UTC)
If you accept an alternate-world definition of "autoconfirmed" where you have to have 10 currently non-deleted edits, quarry:query/98850 has all results for your October 1 test date, and quarry:query/98851 has counts for each day in 2025. Those are upper bounds; it's likely very common for such users to have deleted edits, so these both will include creations by users that actually were autoconfirmed at the time. If you need accurate numbers, I'd suggest going through the October 1 hits manually - there shouldn't be any false negatives other than the moved pages I mentioned above. —Cryptic 02:41, 6 November 2025 (UTC)
...or at least, there wouldn't have been if I hadn't accidentally omitted creations by users with fewer than ten currently-live edits, whether from before or after the page creation in question. When corrected, that approximation had 177 hits on October 1, not the 74 I found before.
Current versions of both queries check for deleted edits properly too now, unless I did something else similarly dumb. (It's been a very long day.) I can't say I'm at all proud of how I went about it, either. And it doesn't even make a whole lot of difference; including the deleted edit check only removed five draft creations from October 1, and similar numbers from other days in 2025. —Cryptic 04:34, 6 November 2025 (UTC)
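For reference, the creation-log approach described above (minus the hard part, the ten-prior-edits check) might be sketched roughly like this; it is untested, the namespace number assumes Draft:, and user_registration can be NULL for very old accounts:

```sql
-- Draft creations on 2025-10-01 by IPs/temporary accounts or by
-- accounts registered fewer than four days earlier. Sketch only;
-- does NOT check the ten-edit half of autoconfirmed.
SELECT COUNT(*)
FROM logging_userindex l
JOIN actor a ON a.actor_id = l.log_actor
LEFT JOIN user u ON u.user_id = a.actor_user
WHERE l.log_type = 'create'
  AND l.log_namespace = 118            -- Draft:
  AND l.log_timestamp BETWEEN '20251001000000' AND '20251001235959'
  AND (a.actor_user IS NULL            -- IP or temporary account
       OR u.user_registration > '20250927000000');
```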
FYI, bug fixed. Thanks @Novem Linguae for it. :) – robertsky (talk) 08:23, 6 November 2025 (UTC)

Could someone please run the following query for me? I am unable to do so myself. Thank you!

SELECT DISTINCT page_title FROM page JOIN externallinks ON el_from = page_id WHERE page_namespace = 0 AND (el_to_domain_index LIKE 'com.twitter.%' OR el_to_domain_index LIKE 'com.x.%'); ~2025-32654-48 (talk) 22:30, 10 November 2025 (UTC)

No results for that; el_to_domain_index always begins with a protocol. quarry:query/99001 has what you meant. (Spoiler: there's lots.) —Cryptic 22:39, 10 November 2025 (UTC)

List of transclusions

I am looking for the list of pages that transclude {{Annual readership}}, or one of its redirects. The count tool reports a tally of 53,098, but the 'What links here?' tool (linked as 'sorted list' at Special:WhatLinksHere/Template:Annual_readership) never returns. (I don't think the problem is a volume issue, or flooding the browser; I tried the 'sorted list' link from a template that has 50 transclusions, and it never returned, either.) Can you provide a query that will produce the sorted list? There are half a dozen redirects; if that complicates the solution, it's fine to just drop redirects. Adding interested party Andrew Davidson. Thanks! Mathglot (talk) 20:57, 8 November 2025 (UTC)

petscan:40332113. —Cryptic 21:11, 8 November 2025 (UTC)
Thank you! And I even knew about Petscan, but forgot I could use it for that. Thanks again, Mathglot (talk) 22:41, 8 November 2025 (UTC)
Yes, and thanks from me too -- it's a useful addition to my toolkit. Andrew🐉(talk) 23:20, 8 November 2025 (UTC)
Also can get it from the command line with wikiget: wikiget -b "Template:Annual readership" -tt > out.txt .. -- GreenC 17:11, 12 November 2025 (UTC)

Refreshing my article-creation list

Thanks to {{Database report}}, I'd like to automate my new-page archive, which hasn't been updated in all of 15 years. (Yes, count 'em--15.) Problem is, I have little experience with SQL (and what little I've garnered comes as a QGIS user), and I haven't encountered anyone else doing what I'm planning to in userspace. (It's saying a lot when DPL4, featured on select Miraheze wikis like mine, gets the job done with simpler and less intimidating syntax.)

Objective: Select all articles created by this requesting user, from newest to oldest (excluding redirects and deleted material), including page and prose size in bytes (and possibly article class and ref/category count)--thus replicating XTools. Automatic updates every three months. --Slgrandson (How's my egg-throwing coleslaw?) 07:28, 17 November 2025 (UTC)

quarry:query/99145 as a prototype. I've included hidden categories in the category count (solely because that's a little less effort). Prose size isn't available. Neither is reference count, though an approximately-correct relative ordering can be found by counting external links; that could be refined by excluding e.g. links to web.archive.org, but it's still going to count links in external links sections instead of references, the 'expand this' link in stub templates, searches from notability tags, etc., and will still exclude non-externally-linked references like books or Harvard refs. —Cryptic 21:23, 17 November 2025 (UTC)
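A stripped-down version of that report (not the actual quarry:query/99145, which adds the external-link count and other refinements) could look something like this untested sketch; the account name is the requester's, and moved or adopted pages may be misattributed since rev_parent_id = 0 only marks the page's first revision:

```sql
-- Non-redirect articles created by a given user, newest first, with
-- byte size and (hidden-inclusive) category count. Sketch only.
SELECT p.page_title,
       p.page_len,
       (SELECT COUNT(*) FROM categorylinks
        WHERE cl_from = p.page_id) AS category_count
FROM revision_userindex r
JOIN actor a ON a.actor_id = r.rev_actor
JOIN page p ON p.page_id = r.rev_page
WHERE a.actor_name = 'Slgrandson'
  AND r.rev_parent_id = 0              -- first revision of the page
  AND p.page_namespace = 0
  AND p.page_is_redirect = 0
ORDER BY r.rev_timestamp DESC;
```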

Could someone please modify the query at quarry:query/99001? I need to count citations to twitter.com or x.com, but only based on page versions from June 2023. This would likely require joining with the `revision` table and filtering on `rev_timestamp` to be between '20230601000000' and '20230630235959'. I understand that extracting dates from the citation text itself isn't possible through Quarry. Thank you! ~2025-32977-37 (talk) 11:27, 12 November 2025 (UTC)

SQL query request for new citations to twitter.com/x.com in August 2023

Could someone please run the following query for me? I am unable to do so myself. The query is intended to count the number of new citations to 'twitter.com' or 'x.com' for each day of August 2023. Thank you!

-- This query counts the number of new citations to 'twitter.com' or 'x.com'
-- for each day of August 2023.
-- NOTE: It cannot extract the publication date from the citation text. Instead,
-- it uses the timestamp of the earliest revision of the page containing the
-- link as a proxy for the addition date.

SELECT
    first_appearance_day,
    COUNT(*) AS new_citations_count
FROM (
    -- Subquery to find the first revision timestamp for each unique external link
    SELECT
        el.el_id,
        DATE_FORMAT(MIN(rev.rev_timestamp), '%Y-%m-%d') AS first_appearance_day
    FROM
        externallinks AS el
    JOIN
        revision AS rev ON el.el_from = rev.rev_page
    WHERE
        -- Filter for twitter.com and x.com links using the indexed domain column
        (el.el_to_domain_index LIKE 'http://com.twitter.%' OR el.el_to_domain_index LIKE 'https://com.twitter.%' OR el.el_to_domain_index LIKE 'http://com.x.%' OR el.el_to_domain_index LIKE 'https://com.x.%')
    GROUP BY
        el.el_id
) AS link_creations
WHERE
    -- Filter the results to include only links first appearing in August 2023
    first_appearance_day BETWEEN '2023-08-01' AND '2023-08-31'
GROUP BY
    first_appearance_day
ORDER BY
    first_appearance_day ASC;

SQL query request for new citations to twitter.com/x.com in June 2023

Could someone please run the following query for me? It is based on the query for August 2023 but modified for June 2023. Thank you!

-- This query counts the number of new citations to 'twitter.com' or 'x.com'
-- for each day of June 2023.
-- NOTE: It cannot extract the publication date from the citation text. Instead,
-- it uses the timestamp of the earliest revision of the page containing the
-- link as a proxy for the addition date.

SELECT
    first_appearance_day,
    COUNT(*) AS new_citations_count
FROM (
    -- Subquery to find the first revision timestamp for each unique external link
    SELECT
        el.el_id,
        DATE_FORMAT(MIN(rev.rev_timestamp), '%Y-%m-%d') AS first_appearance_day
    FROM
        externallinks AS el
    JOIN
        revision AS rev ON el.el_from = rev.rev_page
    WHERE
        -- Filter for twitter.com and x.com links using the indexed domain column
        (el.el_to_domain_index LIKE 'http://com.twitter.%' OR el.el_to_domain_index LIKE 'https://com.twitter.%' OR el.el_to_domain_index LIKE 'http://com.x.%' OR el.el_to_domain_index LIKE 'https://com.x.%')
    GROUP BY
        el.el_id
) AS link_creations
WHERE
    -- Filter the results to include only links first appearing in June 2023
    first_appearance_day BETWEEN '2023-06-01' AND '2023-06-30'
GROUP BY
    first_appearance_day
ORDER BY
    first_appearance_day ASC;

~2025-33047-06 (talk) 16:04, 12 November 2025 (UTC)

Not sure what you are asking. Are you unable to use Quarry and run it yourself? -- GreenC 17:07, 12 November 2025 (UTC)
This isn't going to do what you want. If it completed (it wouldn't), it would show, of the pages that currently have an external link to twitter or x, the ones that were created in August 2023. There's no guarantee that the links were present since creation, and no reason to think that no links were newly-added to existing pages in that month.
To find when the links were actually added, you'd have to use the text table, and compare the wikitext between past revisions. That's not practical, even if you only look at pages that currently have a matching external link; and the public replicas don't have text anyway. Best bet is to download a consecutive pair of database dumps from around then, extract a list of pages containing external links from each, and compare them. —Cryptic 18:25, 12 November 2025 (UTC)
Nonetheless, quarry:query/99051 has what your query above would've resulted in, if it worked (unlikely) and if it didn't double-count pages for each link they have matching one of those formats. —Cryptic 19:27, 12 November 2025 (UTC)

SQL query to count total URL citations to twitter.com and x.com

Could someone please run the following query for me? I am unable to do so myself. This query counts the total number of URL citations to twitter.com and x.com. Thank you!

SELECT COUNT(*)
FROM page
JOIN externallinks ON el_from = page_id
WHERE page_namespace = 0
  AND (el_to_domain_index LIKE 'http://com.twitter.%'
       OR el_to_domain_index LIKE 'https://com.twitter.%'
       OR el_to_domain_index LIKE 'http://com.x.%'
       OR el_to_domain_index LIKE 'https://com.x.%'); ~2025-34523-00 (talk) 15:13, 18 November 2025 (UTC)
224820. (Be aware that if a single page has more than one identical link to a matching url, it'll only be counted once.)
Creating a non-temporary account here logs you into quarry: as well, so you won't have to wait on someone else running these for you. —Cryptic 20:57, 18 November 2025 (UTC)

SQL query for new citations to twitter.com/x.com on August 3rd, 2023

Could someone please run the following query for me? I am unable to do so myself. This query is intended to count the number of new citations to 'twitter.com' or 'x.com' on August 3rd, 2023. Thank you!

SELECT COUNT(*) FROM externallinks AS el JOIN revision AS r ON el.el_rev_id = r.rev_id JOIN page AS p ON r.rev_page = p.page_id WHERE p.page_namespace = 0 AND r.rev_timestamp BETWEEN '20230803000000' AND '20230803235959' AND (el.el_to LIKE 'http://twitter.com/%' OR el.el_to LIKE 'https://twitter.com/%' OR el.el_to LIKE 'http://x.com/%' OR el.el_to LIKE 'https://x.com/%');

~2025-34472-43 (talk) 11:43, 19 November 2025 (UTC)

How are you coming up with these queries? —Cryptic 18:19, 19 November 2025 (UTC)

Hello, I am trying to find the number of *revisions* on English Wikipedia (enwiki) that added an external link to 'twitter.com' or 'x.com' for each day in August 2023. I have been unsuccessful in finding an existing query on Quarry that meets these specific requirements. I also attempted to create a new query myself, but I was unable to log in. Based on previous discussions on this page, I understand that directly querying the wikitext to determine when a link was added is not feasible with the public database replicas. Could someone please assist me in obtaining this data? Perhaps there is an alternative method, such as analyzing database dumps. Any help would be greatly appreciated. Thank you! ~2025-34883-99 (talk) 16:04, 19 November 2025 (UTC)

This is the same problem as recently asked at Wikipedia:Village pump (technical)/Archive 225#Find articles that have recently had a specific word added?. —Cryptic 18:30, 19 November 2025 (UTC)

Querying "Number of page watchers who visited in the last 30 days" on a list of pages

I don't think this is doable, but I figured I would ask anyway, just in case. In the page information for each page, there is a value shown for "Number of page watchers who visited in the last 30 days". Can this be looked up by page in a database query somehow? I believe I can look up the number of watchers, but not this value, unless I'm missing something. Stefen 𝕋ower's got the power!!1! GabGruntwerk 07:21, 24 November 2025 (UTC)

You can't do either. They're queries against the watchlist table, which isn't in the public replicas. includes/Watchlist/WatchedItemStore.php, look for countWatchers() and countVisitingWatchers(). —Cryptic 15:33, 24 November 2025 (UTC)

Single-article users

This query would be for articles created by accounts with no other edits. In other words, the account's only edits were to a single article, which it created, and the output would be a list of such pages. RanDom 404 (talk) 17:24, 25 November 2025 (UTC)

This turns out to be somewhat difficult and very slow due to the way the indexes are set up. So before I spend a couple hours on it, some clarification please:
  • By "single article", you mean a page currently in the main namespace, correct? There's going to be a lot of users who've created a single user page, draft, or talk page, and edited no other page.
  • By "no other edits", do you mean no edits to other articles, or no edits to any other page at all?
  • How about deleted edits? If a user has fifty deleted edits to Deleted Subject A, but all their non-deleted edits are to Existing Subject B which they originally created, should they still be listed? (They probably will even if you answer no; the query's going to be slow enough already, and it'd be prohibitively difficult to correctly match deleted edits to Draft:Deleted Subject C whose later recreation by this user was moved to Existing Mainspace Title D.)
—Cryptic 04:38, 27 November 2025 (UTC)
1. Yes, just in the mainspace
2. No other edits to any page at all, regardless of namespace. I imagine this could narrow results.
I don't know how many cases of the last one there are, and since you say it is required to run it that way, the query can be run counting articles where the user had deleted edits. "Articles created by users with no other edits" I guess (except to that article). RanDom 404 (talk) 16:06, 27 November 2025 (UTC)
Split up between quarry:query/99361, 99362, 99363, and 99365. Allowing edits to other non-mainspace pages gave about half again as many results in my tests with articles starting with 'X'. —Cryptic 17:55, 27 November 2025 (UTC)
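Ignoring the performance problems discussed above (which are exactly why the real work was split across four queries), the shape of such a query might be roughly this untested sketch:

```sql
-- Registered accounts whose every (non-deleted) edit is to a single
-- non-redirect mainspace page whose first revision they made.
-- Illustrative only: a full scan like this is far too slow in practice.
SELECT a.actor_name, MIN(p.page_title) AS page_title
FROM actor a
JOIN revision_userindex r ON r.rev_actor = a.actor_id
JOIN page p ON p.page_id = r.rev_page
WHERE a.actor_user IS NOT NULL         -- registered accounts only
GROUP BY a.actor_id, a.actor_name
HAVING COUNT(DISTINCT r.rev_page) = 1
   AND MIN(p.page_namespace) = 0
   AND MIN(p.page_is_redirect) = 0
   AND MIN(r.rev_parent_id) = 0;       -- one of their edits created the page
```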
And there are a total of 139112 results when adding them together. By the way, would an additional query be possible? It would be the same, but with the additional condition: in Category:All articles lacking sources. There wouldn't be as many results to run, since there are only about 40,000 articles in the category. RanDom 404 (talk) 18:00, 27 November 2025 (UTC)
quarry:query/99370. —Cryptic 18:53, 27 November 2025 (UTC)
These should run pretty quickly, so I might as well ask here. What about one for Category:All orphaned articles, and one for articles in both Category:All orphaned articles and Category:All articles lacking sources? RanDom 404 (talk) 17:26, 28 November 2025 (UTC)
quarry:query/99444 and 99445 respectively. —Cryptic