I just witnessed LLM (ab)use from one graduate student (not the first to do it and definitely not the last): they submitted a conference paper draft for their coauthors and advisors to review shortly before a deadline, with completely regurgitated material plus hallucinations backed up by multiple non-existent citations.
The problem is every coauthor wants to increase submissions, LLMs are great at making something that looks OK at first glance, and people have low(er) expectations for a conference paper. A recipe for disaster.
Extrapolate a bit and there are LLM-written papers being peer reviewed by LLMs. But fear not: even if they are accepted, they will not be cited, because LLMs hallucinate citations that better support their arguments! And then there is the poor researcher, just a beginner, writing a draft of simple but honest material that gets lost in all this line noise, or worse yet, feeds it.
Anecdotally: I tried by hook or by crook to get the best flagship model at the time (Opus) to help with technical writing for a submission.
First, these models are not good at technical writing at all. They have no sense of the weight of a single sentence, they just love to blather.
Second, they can't keep the core technical story consistent throughout their completions. In other words, they can't "keep the main thing the main thing".
I had an early draft with AI writing, but by the time we submitted our work there was not a single piece of AI writing left in the paper. And not for lack of trying: I did several iterations of carefully crafting context, giving the model a sense of the world model in which it needed to evaluate its additions, yada yada.
For clear and concise technical communication, it's a waste of time right now.
I'm so happy I have pre-LLM publications and blog posts to prove that my blathering isn't because I'm lazy and used Claude, it's just how I write (i.e., badly).
poorly…
It would be v. funny if I got that wrong, but I do feel the need to point out that "badly" is indeed grammatically correct here because this is HN and pedantry is always on topic.
People over-correct and feel like they can't use "badly" because there is "feeling badly" discourse [0], but that pertains to "feeling" being a linking verb. "Write" is just your bog standard verb for which "badly", an adverb, is a totally valid modifier.
[0] https://www.merriam-webster.com/grammar/do-you-feel-bad-or-f...
This is just a by the by, but in British English "feeling poorly" mostly means that you are ill. Amusingly it's become slightly euphemistic, so if someone is "a bit poorly" they probably have sniffles or a minor fever. If they are "very poorly" then you probably heard it from a hospital and they're just about dead.
Thus "I feel badly" ... "ok, what did you do?" vs. "I feel poorly" ... "ok, I'll get a bucket."
Honest question: why not charge a fee per submission or per review?
Or if the problem is bad papers, a fee that is returned unless it’s a universal strong reject.
Or if you don’t want to miss the best papers, a fee only for resubmitted papers?
Or a fee that is returned if your paper is strong accept?
Or a fee that is returned if your paper is accepted.
There has to be some model that is fair (not a financial burden to those writing good papers) and still limits the rate of submissions.
Thoughts?
It would be a disproportionate blow to researchers in countries with fewer resources and/or more bureaucratized systems (which e.g. demand to see a "result" if you have paid a fee), who just wouldn't submit.
The people using AI are already paying OpenAI or whoever to create those fake papers.
Fees mean very little.
Go for the jugular. Impact the career of people putting out substandard papers.
Come up with a score for "citation strength" or something.
Any given bad actor with too many substandard papers to his/her credit begins to negatively impact the "citation strength" of any paper on which they are a co-author. Maybe it even spreads to papers that cite papers authored or co-authored by the bad actor in question?
If, say, the major journals had a system like this in place, you'd see everyone perk up and get a whole lot less careless.
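Purely as an illustration, here is a minimal sketch of how such a "citation strength" penalty could be computed. Everything in it (the flagging of authors, the weights, the data layout) is made up for the example; this is not an existing metric:

    # Hypothetical sketch of the "citation strength" idea above: authors flagged for too many
    # substandard papers drag down the score of every paper they co-author and, more weakly,
    # of papers citing those papers. All names, weights and thresholds are invented.

    def citation_strength(paper, papers, flagged_authors):
        """Start at 1.0; subtract penalties for flagged co-authors and for citing their papers."""
        score = 1.0
        score -= 0.2 * sum(1 for a in paper["authors"] if a in flagged_authors)
        for ref_id in paper["cites"]:
            if any(a in flagged_authors for a in papers[ref_id]["authors"]):
                score -= 0.05
        return max(score, 0.0)

    papers = {
        "p1": {"authors": ["alice", "bob"], "cites": []},
        "p2": {"authors": ["carol"], "cites": ["p1"]},
    }
    flagged = {"bob"}  # e.g. flagged for a pile of never-cited or retracted papers
    print({pid: citation_strength(p, papers, flagged) for pid, p in papers.items()})
    # {'p1': 0.8, 'p2': 0.95}

The interesting design question is the second-order penalty: whether citing a tainted paper should cost you anything at all, and how to keep that from punishing people who cite bad work in order to refute it.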
It doesn't address the core issue. It's credential inflation.
Not sure that enough people understand that the vast vast majority of research papers are written in order to fulfil criteria to graduate with a PhD. It's all PhD students getting through their program. That's the bulk of the literature.
There was a time when nobody went to school. Then everyone did 4 years of elementary school to learn reading, writing and basic arithmetic. Then everyone did 8 years, which included more general knowledge. Then it became the default to do 12 years and get a high school diploma. Then it became the default to do a bachelor's to get even simple office jobs. Then it's a master's. Now, to stand out the way a BSc or MSc once made you stand out, you need a PhD. PhD programmes are ballooning, just as the undergrad model had to change quite a bit when CS went from 30 highly motivated nerds starting per year to 1000. These are massive systems: the tens or hundreds of thousands of PhD students must somehow be pushed through like clockwork. For one conference alone you get tens of thousands of authors submitting a similar number of papers, and tens of thousands of reviewers.
You can't simply halt such a huge machine with a few little clever ideas.
Actually, the problem is pricing. If we could identify and correctly value new concepts, then we could dispense with citations and just use the correct sum of concept valuations. Perhaps a correctly designed futures market would not only get the right PhD students the right jobs, but also bring a lot of speculative capital into fundamental research?
That's a very economics-minded approach. Also, I'm not quite sure what the futures would be about. That a paper will... get N citations? get a job for the first author? Achieve N stars on GitHub? N likes on social media? Be patented and put in a product? Turn X USD in profit? Bet on retraction? Bet on acceptance? On awards? Or replicability?
The first question is what scientific research is actually for. Is it merely for profitable technological applications? The Greek, the humanistic, or the Enlightenment ideal wasn't just that. Fundamental research can be its own endeavor, simply to understand something more clearly. We don't do astronomy, for example, only to build some better contraption, and understanding evolution wasn't only about producing better medicine. But it's much harder to quantify the elegance or aesthetics of an idea and its impact.
And if you say that this should only be a small segment, and most of it should be tech optimization, I can accept that, but currently science also runs on this kind of aesthetic, idealist prestige. In the overall epistemic economy of society, science fills a certain role. It's distinct from "mere" engineering. The Ph in PhD stands for philosophy.
I think that's basically Impact Factor / PageRank...
There are two main problems. First, the field is too saturated -- too many people publishing papers. Second, the reviewers are the same people who publish papers -- it's a zero-sum game (regardless of acceptance criteria).
Submission numbers this year have been absolutely crazy. I honestly don't think it can be solved.
The exponential scaling messes with the previous, more honor-based, gentlemanly, reputation-based, everyone-knows-everyone situation. That model had its own problems, but different ones. Today it's more cutthroat and grind-competitive, and nobody has proper incentives.
It's like working long years in a family-sized company vs job hopping between megacorps.
The actual science takes the back seat. Nobody has time to just think; you must pump out the next paper and somehow get it through peer review. As a reviewer, you don't get much out of reviewing. It used to be exciting to look at new developments from across the field in one's review stack. Today it's mostly filled with the nth resubmission of something by someone in an anxious hurry to just produce something to tick a box. There is no cost to just submitting massive amounts of papers. Anyway, so it's not fun as a reviewer either: you get no reward for it and you take time away from your own research. So people now have to be forced to review in order to submit. These forced reviews do as good a job as you'd expect. The better case is when the reviewers are merely uninterested. The worse case is when they feel you are a dangerous competitor. Or they only try to assess whether you toiled "hard enough" to deserve the badge of a published paper. Intellectual curiosity etc. have taken the back seat. LLMs just make it all worse.
Nobody is truly incentivized to change this. It's a bit of a tragedy-of-the-commons situation. Just extract as much as you can, and fight like in a war.
It's also like moving from a small village where everyone knows everyone to a big metropolis. People are all just in transit there; they want to take as much as possible before moving on. Wider impacts don't matter. Publish a bunch of stuff, then get a well-paying job. Who cares that these papers are not all that scientifically valuable? Nobody reads them anyway. In 6 months they're obsolete either way. But in the meantime they increased the citation count of the PI, the department can put them into their annual report, use the numbers for rankings and for applying for state funding, the numbers look good when talking to the minister of education, they can also be pushed in press releases to do PR for the university, which increases public reputation, etc. The conferences rise on the impact tables because of the immense cross-citation numbers. The more papers, the more citations, the higher the impact factor. And this prestige then moves on to the editors, area chairs, etc., and it looks good on a CV.
It mirrors a lot of other social developments where time horizons have shrunk, trust is lower, incentives are perverse and nobody quite likes how it is but nobody has unilateral power to change things in the face of institutional inertia.
Organizing activity at such scale is a hard problem, especially because research is very decentralized by tradition. It's largely independent groups of 10-20 people centered around one main senior scientist. The network between these is informal. It's very different from megacorps. Megacorps can go sclerotic with admin bloat and get paralyzed by middle-manager layers. But in the distributed model there is minimal coordination; it's an undifferentiated soup of these tiny groups, each holding on to their similar ideas and rushing to publish first.
Unfortunately, research is not like factory production, even if bureaucrats and bean counters wish it were. Simply throwing more people at it can have a negative impact, analogous to the mythical man-month.
And yet, submission numbers approximately follow N/p, where N is the number of genuinely new papers and p is the acceptance rate. The difference between a 20% and a 35% acceptance rate is ~5N vs. ~3N papers in the submission pool.
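A quick back-of-the-envelope version of that claim (my reading of it, assuming each genuinely new paper is simply resubmitted until accepted, so the expected number of submissions per paper is 1/p):

    # If every new paper is resubmitted until accepted with acceptance probability p,
    # the expected number of submissions per paper is 1/p, so the pool is roughly N/p.
    for p in (0.20, 0.35):
        print(f"acceptance rate {p:.0%}: ~{1 / p:.1f}N submissions")
    # acceptance rate 20%: ~5.0N submissions
    # acceptance rate 35%: ~2.9N submissions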
As pointed out in the comments, conferences have physical limits on the number of presenters (even in poster sessions), so the number of accepted papers at top conferences will likely stop growing at some point. (Perhaps it should have been capped already.) This will probably lead to new conferences appearing to satisfy the "demand", but more conferences of varying quality is probably better than reviewing at top conferences being random at best.
I'm on an academic foray to an R1 right now. One of the big items the PIs are drilling in is the one-paper-per-year mantra.
Many papers today represent what would have happened via open-source repos in yesteryear. Meaning there is a lot of work that is useful to someone, and having peer-reviewed benchmarks etc. is useful for understanding whether those people should care. The weakness is that some of this work is the equivalent of shovelware.
Creating a new, first-grade conference from scratch is hard.
NeurIPS, CVPR and ICML are solid brands that took decades to build.
ICLR managed it quite fast.
I really think the conference model is super detrimental to science. It's not like journals are perfect either, but revise and resubmit and desk rejects are a much better filter than continually resubmitting to the same few conferences over and over again. Not to mention that peer review in conferences is probably much lower quality than what you get in most journals (this is my impression anyhow, I don't know how one could quantify such a thing).
I'm in CS and I submit both to conferences and journals (the former because it's what people actually read, the latter because of evaluation requirements in my country). And I can tell you that (IMO of course) the conference model is immensely better, and idealization of journals in the CS community is a clear case of "grass is always greener".
Revise and resubmit is evil. It gives the reviewers a lot of power over papers that ends up being used for coercion, sometimes subtle, sometimes quite overt. For most papers I have submitted to journals (and I'm talking prestigious journals, not MDPI or the like), I have been pressured to cite specific papers that didn't make sense to cite, very likely by the reviewers themselves. And one ends up doing it, because not doing it can result in rejection and losing many months (the journal process is also slower), with the paper maybe even becoming obsolete along the way. Of course, the "revise and resubmit" process can also be used to pressure authors into changing papers in subtler ways (to not question a given theory, etc.)
The slowness of the process also means that if you're unlucky with the reviewers, you lose much more time. There is a fact that we should all accept: the reviewing process always carries a huge random factor due to subjectivity. And being able to "reroll" reviewers is actually a good thing. It means that a paper that a good proportion of the community values highly will eventually get in, as opposed to being doomed because the initial very small sample (n=3) is from a rejecting minority.
Finally, in my experience reviewing quality is the other way around... there is a small minority of journals with good review quality, but with the majority (including prestigious ones) it's a crapshoot, not to mention when the editor desk-rejects for highly subjective reasons. In the conferences I typically submit to (*ACL) the review quality is more consistent than in journals, and the process is more serious, with rejections always accompanied by reasons.
I agree there are tons of problems with journals as well; I think an entirely different system could probably be better. Even preprints with some sort of public-facing moderated comments could be more effective.
However, I think this notion of a paper becoming "obsolete" if it isn't published fast enough speaks to the deeper problems in ML publishing; it's fundamentally about publicizing and explaining a cool technique rather than necessarily reaching some kind of scientific understanding.
>In the conferences I typically submit to (*ACL) the review quality is more consistent than in journals
I've got to say, my experience is very different. I come from linguistics and submit to both *ACL and linguistics/cognition journals, and I think journals are generally better. One of my reviews for ACL was essentially "Looks great, learnt a lot!" (I'm paraphrasing, but it was about 3 sentences long; I'm happy for a positive review, but it was hardly high quality).
Even in *ACL I find TACL better than what I've gotten for the ACL conferences. I just find with a slow review process a reviewer can actually evaluate claims more closely rather than review in a pretty impressionistic way.
That being said, there are plenty of journals with awful reviewing and editorial boards (cough, cough Nature).
The post suggests why review quality suffers. Because of the system, there is too much reviewing going on. People get tired and produce worse reviews. Those receiving these low-quality reviews become less motivated, and in turn put less effort into reviewing as well. Bad reviewing makes the system less predictable, so you have to spray and pray with as many papers as possible if you want to keep up with publication expectations. This adds even more papers into the system, making it worse.
There are just too many negative eventualities reinforcing each other in different ways.
not sure how many conferences you have been to, but (1) abstracts are filtered, since conferences probably get 100s of abstracts for every slot they have available, some of which end up being converted to workshops or breakout panel discussions to accommodate interesting topics, and (2) I've seen presentations where "lively" discussions have happened between the presenter(s) and the audience.
I was thinking mostly with my experiences with *ACL conferences, it may be different elsewhere.
The real issue is careers valuing where papers are published over what they contribute, and incentives won’t change until hiring and funding reward depth over volume.
> valuing where papers are published over what they contribute
And who is the arbiter of that? This is an imperfect but easy shorthand. Like valuing grades and degrees instead of what people actually took away from school.
In an ideal world we would see all this intangible worth in people's contributions. But we don't have time for that.
So the PhD committee decides, on exactly that measure, whether there are enough published articles for a cumulative dissertation. What exactly is the alternative? Calling in fresh reviews to weigh the contributions?
Avoiding the problem altogether is just throwing up your hands and saying "this is too hard so I'm not going to even try".
We already know there is some way to do it, because researchers do salami slicing: they take one paper and split it up into multiple papers to get more numbers out of the same work. So one could, for example, look at a paper and ask how many papers one could get out of it with salami slicing, to get at least some measure of this to start with.
Depends on who is doing the "careers valuing" and how closely they're looking. At a coarse level, especially for jobs in industry, venue is a pretty simple (but obviously imperfect) indicator for quality. If you've managed to publish one or more papers at the most selective venues (esp. as main author), then I would assume there's a decent chance you are good at research, even if I don't know anything about the subfield you work on. As a further indicator, the number of citations is also a noisy but easy to check proxy for "impact".
But for academic or other high-level research jobs, whoever is doing the valuing is going to look at a lot more than just the venue.
> But for academic or other high-level research jobs, whoever is doing the valuing is going to look at a lot more than just the venue.
Depends on where. In some countries (e.g. mine, Spain), the notion that evaluation should be "objective" leads to it degenerating into a pure bean-counting exercise: a first-quartile JCR-indexed journal paper is worth 10 points, a top-tier (according to a specific ranking) conference paper is worth 8 points, etc. In some calls/contexts there is some leeway for evaluators to actually look at the content and e.g. subtract points for salami slicing or for publishing in journals that are known to be crap in spite of a good quartile, but in others that's not even allowed (you would face an appeal for not following the official scoring scale).
This has truth to it but is not the full story. It gets your foot in the door, but I think junior researchers overinflate the importance of the numbers. At the end of the day, when you interview, your future boss will ask about your research, and it doesn't matter all that much how many papers there are and how many are at top venues. Yes, you should have some top conference papers, but one great arXiv paper that is actually used by others can have more value than 4 meh papers that slipped through the reviews.
Remember that nobody is a passive actor in the system. Everyone sees the state of conferences and review randomness and the gaming of the system. Senior researchers are area chairs and program chairs. They are well aware and won't take paper counts as the only signal.
Again, papers are needed, but it's really not the only thing.
It is difficult to judge the impact for most papers in the first few years. Additionally, impact is influenced by visibility, and when there’s a deluge of papers, guess what’s a really good way to make your paper stand out? Yup, publication in a good venue.
If there are more good papers due to more activity, doesn't that allow for the possibility of more conferences and journals to publish in, which increases the acceptance rate?
I don't think that the solution has to be that existing conferences and journals accept more.
The problem is that the external environment (namely hiring) creates pressure to publish in a handful of prestigious venues. Nobody wants to be the odd one out whose papers are in some less well-known conferences.
There could be more top-tier conferences, though. If the field grows, it makes perfect sense for the number of top-tier venues to grow as well (for whatever percentile we set as a threshold to be "top-tier"). And creating more venues is also good for people who might not have funding for long trips across continents (although lately, accommodation costs tend to be increasingly dominant with respect to flight costs in most destinations I'm traveling to).
This has to be organic and is not guaranteed to happen. You can't just arbitrarily declare a conference "top tier".
In general you're right, but I believe there are ways to do it. For example, if an existing top-tier conference split into several (say NeurIPS decided to hold 3 conferences a year: NeurIPS-America, NeurIPS-Europe and NeurIPS-Asia, or whatever), in practice they would probably be top tier from the get-go.
Many research organizations use formal conference/journal rankings. These are usually calculated by following h-index values for papers published there. A new conference would start as unclassified for a few years and could not be used when you need to hit some formal criteria in academia.
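For context, an h-index-style venue ranking (roughly how Google Scholar's h5-index for venues works, as I understand it) boils down to: the largest h such that h papers published at the venue have at least h citations each. A rough sketch, with made-up citation counts:

    def h_index(citation_counts):
        """Largest h such that at least h papers have >= h citations each."""
        h = 0
        for i, c in enumerate(sorted(citation_counts, reverse=True), start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    # e.g. citation counts of a small venue's papers over the ranking window
    print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3 (three papers with at least 3 citations each)

Which also explains the "unclassified for a few years" problem: a brand-new venue has no citation history to compute such a score from.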
I agree that rejecting a paper that has been recommended for acceptance by _all_ reviewers (something that routinely happens at, say, NeurIPS) is nonsense. However, in-person conferences have physical limits. In the case of, again, NeurIPS, you may get accepted and _not_ present the paper to an audience. That is also a bit of a travesty.
The community would be better off working with established journals so that they take reviews from A* conferences as an informal first round, giving authors a clear path to publication. Even though top conferences will always have their appeal, the writing is on the wall that this model is unsustainable.
The NeurIPS scoring system is inherently subjective. People will have wildly different interpretations of, say, 3 vs 4, or 4 vs 5. You can get lucky and draw only reviewers that, on average, "overrate" papers in their batch. The opposite can happen too, obviously. 4444 vs 3444 is just noise.
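A toy simulation of that noise argument (my own illustration with invented noise levels, not NeurIPS data): give each reviewer a personal bias plus per-paper disagreement, and papers of identical underlying quality end up spread over a wide range of average scores, easily straddling a 3444-vs-4444 split.

    import random

    random.seed(0)

    def review(true_score, n_reviewers=4):
        """Scores from n reviewers, each with their own bias plus per-paper noise, on a 1..6 scale."""
        scores = []
        for _ in range(n_reviewers):
            bias = random.gauss(0, 0.5)   # reviewer systematically over-/under-rates
            noise = random.gauss(0, 0.5)  # disagreement about this particular paper
            scores.append(round(min(6, max(1, true_score + bias + noise))))
        return scores

    means = [sum(review(3.8)) / 4 for _ in range(10_000)]
    print(f"identical papers, mean reviewer score ranges from {min(means)} to {max(means)}")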
Papers tend to roll downhill. Failure to get accepted by the big conferences pushes them to lower-tier venues.
My general experience as a PhD student makes me feel that we should probably look at the system as a whole. I get the feeling most universities are abusing publications as a way of assessing their own students. It means people will tend to publish slop for the sake of publication instead of genuinely tackling hard problems. I personally have seen my university get away with forming defence committees with zero knowledge of the field they're assessing. If the university can't assess its students' work, then it should not be teaching.
I think both conferences and journals are broken in this regard. It doesn't help that a professor's primary job these days is to be a social media influencer and attract funding. How the funding is used doesn't seem to matter or impact their careers. What we need is more accountability from senior researchers. They should at the very least be assessing their own students' work before stamping their name on it.
On the flip side it isn't untrue that there are major breakthroughs happening daily at this point in many fields. We just don't have the bandwidth to handle all the information overload.
I used to be in (molecular biology) research. At some point my supervisors were already working towards a paper in their mind, while I was still doubting (the statistical significance of) my findings.
In the end I left my PhD track before actually finishing it. My conclusion is that I like research(ing stuff) as a verb, but I don't like research as an institution.
So research is also now a race to the bottom? Come join me at my private research co-op, where we emphasize quality over quantity and don't give a shit about what's perceived as cool...
It's part of the wider credential inflation trend and increased grind-competition overall.