How should I state my peer review experience in the CV?

I have frequently been invited to review manuscripts for two journals in my field. I noticed that in the first couple of cases my supervisor was the associate editor who selected me, but for the rest I was assigned to the papers by other editors (not from our network).

So, I assume my name is on a list somewhere, from which editors may choose me when they see a paper that matches my keywords, my previous review record, and so on.

Now I want to state this in my CV, in a list separate from my ad-hoc review experience, but I do not know what key phrase would be suitable.

For instance, can I use the following?

Member of the review board for Journals X and Y

By the way, I work in the field of computer science.


2 Answers

This is the sort of thing that varies from field to field, and you may want to talk to other people in your field, but my general inclination is not to do so. Unless you have been specifically told that there is some review board you are on, or have been invited to join one, it may not even exist. Aside from the ethics, if this is a prominent journal and one of its editors sees your CV, they could react extremely negatively. If you are going to list the journals you have reviewed for, a better approach is to state how many articles you have reviewed for each journal when you have reviewed more than one.

– JoshuaZ

  • It is probably enough to just say "a reviewer for ACM SIGPLAN" or whatever. – Buffy, Mar 16, 2019 at 23:23
  • Referee for reputed international journals. – Alchimista, Mar 17, 2019 at 9:42
  • Joshua and @Erwan: Why would the editor be upset that I mention having reviewed for this journal, especially since I have spent the effort and time to do the peer reviews without any real benefit to me? Shouldn't it at least boost my reputation somehow by being reflected in my CV? – Alireza M. Kamelabad, Aug 17 at 12:09
  • @AlirezaM.Kamelabad They may see it as jeopardizing referee anonymity. – JoshuaZ, Aug 17 at 16:43
  • @AlirezaM.Kamelabad It may vary from subfield to subfield, but in general editors will ask you to referee things close to your own narrow interest, and when they move outside it, one is more likely to just say no. Listing journals doesn't always give away who refereed a paper, but it substantially increases the chance that someone will work it out. – JoshuaZ, Aug 17 at 23:41
Can I use, for instance, "member of the review board" for journals X and Y?

I agree with Joshua: it's a bit risky to present it this way unless you are sure that the journal has something it calls a "review board" and that you are on it (it might not exist or might have a different name). Personally, I just have a section called "Reviewing activities" with a list of the journals/conferences I have reviewed for, ordered by year.

– Erwan




Kevin Duff, Sid E. O'Bryant, Holly James Westervelt, Jerry J. Sweet, Cecil R. Reynolds, Wilfred G. van Gorp, Daniel Tranel, Robert J. McCaffrey, On Becoming a Peer Reviewer for a Neuropsychology Journal, Archives of Clinical Neuropsychology , Volume 24, Issue 3, May 2009, Pages 201–207, https://doi.org/10.1093/arclin/acp031


The peer-review process is an invaluable service provided by the professional community, and it provides the critical foundation for the advancement of science. However, there is remarkably little systematic guidance for individuals who wish to become part of this process. This paper, written from the perspective of reviewers and editors with varying levels of experience, provides general guidelines and advice for new reviewers in neuropsychology and outlines the benefits of participation in this process. It is hoped that the current information will encourage individuals at all levels to become involved in peer-reviewing for neuropsychology journals.

The peer-review process of professional journal publishing is as important to the scientific enterprise as developing reliable and valid measures, well-characterized samples, and appropriate statistical techniques. However, many professionals have limited involvement in reviewing manuscripts for scientific publication. Whether one is well grounded by scientific training or not, beginning involvement in the review of journal manuscripts typically occurs with little to no guidance. In addition, empirical evaluation of the peer-review process has in some instances revealed a disappointing level of agreement between peer reviewers (e.g., Rothwell & Martyn, 2000 ). Probable factors in producing low reviewer agreement are the general lack of direction provided to reviewers, who are often left to “figure it out” on their own, as well as a disproportionate number of reviewers who are less experienced and more junior in their careers. The current article will review some benefits of being an ad hoc reviewer for journals, and outline some points to keep in mind while conducting reviews.

There are many reasons to serve as an ad hoc reviewer for scientific journals. It may seem a truism that volunteerism is its own reward, but nevertheless the following is a list of specific benefits that can accrue to reviewers:

Staying current with the literature . By evaluating manuscripts, the reviewer is exposed to the most recent, cutting edge research in the field, even before it is published. As one reads a manuscript, one takes in concise summaries of relevant literature in the Introduction section, new procedures detailed in the Methods section, and the latest findings in the Results section. Although not all submissions will be published, reviewers are privy to data that few others have seen.

Developing your professional diversity . If your work primarily involves conducting clinical assessments or interventions, then reviewing manuscripts might provide an outlet for your “researcher side.” If you primarily conduct basic science research, then reviewing clinically relevant findings might widen the scope of your own work or viewpoint. If you primarily teach, then reviewing might expand your breadth of the basics for what you teach, as well as offer teaching opportunities for your students as co-reviewers (discussed subsequently).

Shaping the field . By commenting on the work of your peers, you can sometimes guide the focus of a particular paper. Your editorial suggestions could improve methodologies and points of view within neuropsychology.

Taking advantage of an opportunity to provide service to the scientific community . At some academic institutions, one way of demonstrating scientific productivity and involvement in your discipline is through service in peer review of manuscripts related to your areas of investigation. In fact, being asked to be a peer reviewer represents recognition of your own work, which has attracted the attention of the journal's editorial board. Frequent involvement in publishing your own work and peer reviewing manuscripts for the same journal can lead to an invitation to serve on the editorial board, which is widely considered to be a distinctive recognition of your own scientific efforts. This is true whether one works in an academic setting or in a private practice setting. Interestingly, clinical neuropsychology is one of the few healthcare specialties in which private practitioners are relatively frequently involved in peer-reviewed journal publishing.

Giving back to the field . Many neuropsychology journals are affiliated with professional organizations. For example, Archives of Clinical Neuropsychology is the official journal of the National Academy of Neuropsychology, The Clinical Neuropsychologist is the official journal of the American Academy of Clinical Neuropsychology, The Journal of the International Neuropsychological Society is the official journal of the International Neuropsychological Society, and Applied Neuropsychology is the official journal of the American College of Professional Neuropsychology. By participating in the review process, you are sharing your individual skills and professional perspective with peers in your specialty, including those who constitute a much broader readership than the membership organization, such as psychologists who are not neuropsychologists and individuals outside our discipline altogether, such as physicians.

Improving the quality of your own work . Just as reading recently published studies provides new avenues for one's own studies, reviewing manuscripts submitted for publication can provide ideas about the methods being used and research questions being investigated by peers with similar interests to one's own. Though reviewers must be careful to avoid intellectual plagiarism, there is potential for learning about new techniques that might apply to your existing line of research.

If you are interested in reviewing manuscripts for a journal, but do not know where to begin, there are many ways to become involved in the process. These fundamental ideas may require patience on your part, but will improve your chances of being given an opportunity.

Directly contact journal editors . A fact known all too well to journal editors is that it can be quite difficult at times to find suitable reviewers who will be able to provide a timely review of a manuscript, in part because the most skilled and experienced reviewers are often the busiest. Editors therefore need a long list of potential reviewers, and often are looking for names to add to that list. This remains true, even in the age of electronic databases to which editors have access. A brief email stating your name, basic qualifications, and areas of interest is usually sufficient. Most editors will be very receptive to anyone expressing interest in reviewing articles and excited at the prospect of another resource!

Talk to your peers . It is likely that some of your colleagues are already involved in the review process and can provide you with contact information. Alternatively, your peers can suggest your name as a potential reviewer to the editor. Finally, you could offer to co-review a paper with a peer who is already an ad hoc reviewer. This could give you a chance to “prove yourself” to the editor. If you want to co-review a manuscript, then you should contact the editor about this in advance, as submitted manuscripts are confidential outside the review process and a co-reviewer would usually receive an acknowledgement in one of the issues of the journal.

Publish . If you produce research, then eventually someone will “cold call” you and ask for your opinion on a manuscript. However, this latter method may take some time and is not very efficient.

If you are interested in reviewing, have contacted an editor, and have received your first manuscript for review, how should you approach the assignment? As with writing a paper or grant, evaluating a patient, or teaching a course, there is no single right or wrong way to review a paper, but there are some basic guidelines one might follow.

Before thinking of specific review suggestions, we suggest first taking a mental step backward to reflect on the bigger picture of the task at hand. It may help to consider that a colleague has taken a great deal of time and energy to conceptualize a problem, identify data that can address the problem, gather and analyze those data, and finally write down all the relevant information related to the specific project at hand. Whether or not all of these steps were effective and led to a publishable paper, each individual who is willing to undertake this research activity does so with the belief that new and important information is being gathered for sharing with peers. It is therefore the general goal of a peer reviewer to be helpful to the author(s), even if the review identifies numerous issues that may ultimately prevent publication. Ideally, the tone set in the reviewing process is devoid of bias, competition or envy, arbitrariness, and harshness. Stated more positively, as it was first established by the Royal Society of London in the seventeenth century, effective peer review as a component of scientific journal publishing has been conceptualized as a professional consultation on an intellectual matter, delivered in a respectful and timely manner, with the process being confidential (cf. Moore, 2005). The peer reviewer serves as a consultant to the author(s) and to the editor, and is essential to the integrity of the scientific publishing process in that an editor cannot be a content expert in all fields (Moore, 2005). Therefore, the duties of a peer reviewer are not to be viewed lightly.

Some reviewers prefer to start with a brief introductory paragraph that summarizes the article and its major findings. This section might also highlight any broad strengths and weaknesses of the manuscript. Although not required, this approach has the advantage of assuring the editor and author that you read the article and understand its main points. Alternatively, other reviewers prefer to skip this step and get right to the critical comments. The key throughout is to evaluate the manuscript, and not simply annotate it. The authors, in particular, already know what they did and what their study was about, and long detailed summaries are not helpful. The reviewer's comments should be evaluative.

Reviews can be structured around the sections of a typical manuscript (e.g., Introduction, Methods, Results, Discussion), with comments grouped under those sections. Alternatively, reviews can lay out comments in order of importance (e.g., most important to least important), perhaps with headings that identify "major" and "minor" points for the authors to consider. Some form of structure, if only presenting comments in the order of the manuscript pages, is better than presenting them haphazardly. More structure usually allows the editor and author to follow your reasoning and act accordingly (e.g., make a reasoned decision on the manuscript).

Below are some suggestions to consider when reviewing specific sections of a manuscript.

Does the introduction begin generally, and then become more focused? Bem (1987) noted that an empirical article often has an hourglass shape, starting broadly, but narrowing its scope as the Introduction moves to the Methods section. The Results section is also narrow, but gradually widens its scope throughout the Discussion section.

Is the relevant and most recent literature cited and reviewed? Although classic studies in neuropsychology might set the stage, more contemporary studies usually provide more relevant information. For example, a 1975 article using the WAIS might have been relevant in a prior era, but a 2005 article using the WAIS-III is more relevant to the present reader.

Are all critical topics and/or questions adequately covered in this section? Do gaps exist in the authors' line of reasoning? Does this section “flow”?

Is the length appropriate? Is it too long and does it cover unnecessary background information? Many inexperienced authors write lengthy, dissertation style introductions, which take up valuable journal space and thereby may unwittingly put the article's acceptance into some jeopardy. A reviewer can help the author by recommending the introduction be tightened and condensed. Conversely, some Introductions can be too brief, leaving an uninformed reader without any context.

Is a specific purpose of the study stated?

Are specific hypotheses clearly stated? And are the hypotheses properly motivated by the background provided in the Introduction?

Methods sections are generally quite specific and detailed, and it is difficult to comment on all the issues and possible problems that a reviewer might encounter. This is where identifying key issues is quite important. Some guidelines for reviewing this section of the paper:

Is the sample appropriately/adequately described? To utilize research findings, readers need to know who the participants were, what the recruitment procedures were, and how participants were assigned to groups. Some information about demographic characteristics should be reported (e.g., means, standard deviations, ranges). Age, education, and sex might be most relevant in some studies, whereas Glasgow Coma Scale score and length of loss of consciousness might be most relevant in others. Given the increasing diversity of the population, information regarding race/ethnicity and primary language is oftentimes necessary.

Was approval received from the local Institutional Review Board? Was informed consent obtained from each participant?

What were the methods of data collection? Are methods adequately described so that the study could be replicated? For example, it is more informative to indicate that “age-corrected standard scores from the test manual of the California Verbal Learning Test – II were used” than “the California Verbal Learning Test – II was used.”

If an intervention was used, is it also adequately described so that readers can understand what was done? Sometimes a citation to another published article is sufficient; sometimes it is not.

Which statistical analyses were used? Are the dependent variables clearly stated? Are the statistics appropriate for the questions? Are relevant covariates considered? Would non-parametric tests be more appropriate? Are there sample-size/power concerns (e.g., too many analyses and/or no alpha correction)? Would other analytic techniques better answer the same question? Should the author(s) drop/add any analyses?
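As a concrete illustration of the alpha-correction concern above, a reviewer might check whether a set of reported p-values would survive a simple Bonferroni adjustment. This is a minimal sketch; the function name and example p-values are invented for illustration, not taken from the article:

```python
# Minimal sketch of a Bonferroni correction for multiple comparisons.
# The function name and example p-values are illustrative only.
def bonferroni_significant(p_values, alpha=0.05):
    """Return, for each p-value, whether it survives Bonferroni correction."""
    corrected_alpha = alpha / len(p_values)  # divide alpha by the number of tests
    return [p < corrected_alpha for p in p_values]

# Five tests at a nominal alpha of .05: each p-value must now beat .01.
print(bonferroni_significant([0.001, 0.02, 0.04, 0.009, 0.30]))
# [True, False, False, True, False]
```

Two of the five nominally significant results survive the correction, which is exactly the kind of discrepancy a reviewer would flag.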

Are the necessary results presented? Are the relevant statistical values and degrees of freedom reported, along with p -values? If appropriate, are effect sizes reported?
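Where effect sizes are missing, a reviewer can often gauge their likely magnitude from the reported group means and standard deviations. A minimal sketch of Cohen's d for two independent groups (the data are invented for illustration, not from the article):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)  # sample variance
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Invented memory scores for a patient group and a control group.
print(round(cohens_d([10, 12, 14], [8, 9, 10]), 2))  # 1.9
```

A d near 0.2 is conventionally considered small, 0.5 medium, and 0.8 large, so a value like this would indicate a large group difference regardless of the p-value.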

Is the presentation of the results understandable to someone who does not do research in this area? Within the body of the text, subheadings might make results easier to read and understand. Within tables, clear column and row headings can increase the value of the table.

Are figures and tables appropriately titled? Do the authors use notes that are clear and informative? Are all abbreviations used within figures and tables defined in notes, such that readers will not have to search back through text to grasp their meaning?

Do the authors include analyses that were not discussed in the Introduction and Method sections? Conversely, are analyses missing that were mentioned earlier in the paper?

Do the authors commit some of the “deadly sins” outlined by Millis (2003 )? In this paper, Millis highlights some common statistical errors made in neuropsychology manuscripts (e.g., multiple comparisons, low power, ignoring missing data), as well as provides guidance for correcting these errors.

Does the manuscript need an expert statistical review? Some statistical analyses can be quite challenging for the average reviewer, and it is appropriate to let the editor know that you do not have the expertise to fully comment on results. Consider that if these results are too complex for you as the reviewer, then they may also be too complex for the typical journal reader, in which case the authors must do a better job of explaining their analyses.

In the Discussion section, the results are often summarized, integrated, and put into context with the existing literature.

Although the primary sections of the manuscript associated with the preceding points are most important to evaluate when reviewing a manuscript, reviewers are also expected to comment on other components of the manuscript, if they are relevant to the question of publishability. The following are common points on which the journal editor will appreciate guidance from a reviewer.

Are the conclusions supported by the findings? Some authors tend to over-interpret their findings. In a manner of speaking, data are specific to that single study, and they may have limited relevance outside of that single study. As such, data should not be stretched to topics or levels of meaning for which they are not relevant. Reviewers should offer constructive suggestions to the authors for correcting over-interpretation of data.

Are the findings generalized to the appropriate populations and/or settings? For example, a finding that symptom validity testing incorrectly classified children with Learning Disorders should not necessarily be extended to adults with Learning Disorders or even children with other developmental disorders.

Are the results put into context with the existing body of literature? If they stand apart from other studies, do the authors discuss why this might be?

Do the authors simply restate the Results section? Although authors often highlight certain findings in this section, they should move beyond a restating of the findings and actually discuss and integrate their findings.

Are new results and/or data presented that have not been discussed elsewhere? Typically, results should be presented in the Results section, not the Discussion section.

Do the authors provide a conceptual framework and/or presentation for how and when to use the findings? Finding a statistically significant result does not necessarily answer the “so what?” question. What is gained by the completion and reporting of this study?

Are the appropriate caveats and limitations to the article mentioned and discussed?

Are future research directions provided?

Has the manuscript been carefully prepared in the appropriate style of the journal (e.g., APA writing style)?

Does the theme/content of the manuscript fit with the aims of the journal?

Does the manuscript address something new and add something to the existing literature?

How adequate is the general writing style (e.g., grammar, punctuation, and readability)? When recommendations are made for correcting or improving the writing, it is best to be very specific. That is, rather than suggest that the authors improve their writing and remove errors, it is much better to identify the page and line in which the problems can be found, and to suggest the solution.

Is there an Acknowledgment section? Does it mention the funding agency? What was the role of the funding agency in the project? Were possible conflicts of interest of the authors noted to the reader?

Is the Reference section correctly formatted for the specific journal? Are all references mentioned in text provided in the reference section and vice versa? Are there errors in specific references? If references are recommended by a reviewer, it is best to provide the author the complete citation, to avoid confusion. If at all possible, avoid suggesting that the authors cite your own work; this is viewed as self-promotion and to be avoided, unless the citation in question is truly seminal and the omission would substantially weaken the instructive nature of the article for the reader.

Now that you have completed your initial review, you might wonder how to hone your new craft. Below are some suggestions for becoming a more refined ad hoc reviewer.

Peer reviewers, the bedrock of medical journal objectivity, require more training and experience. One simple solution might be for editors to provide them with a short review of best practices along with the checklist of core elements to consider . ( Ray, 2002 , p.772)

Provide critiques in a timely manner. Try to adhere to deadlines. If you are late on a review, it slows down the process and prevents the editors from achieving a timely decision on the manuscript, and the authors from receiving timely feedback. If you will be late with the review, contact the editor to give an estimate of when it will be completed.

Reviewing takes practice. Similar to writing papers, interviewing patients, and preparing lectures, do not expect that your initial review will be your best work. However, it is likely that reviewing more papers will make you better at it.

Readily ask for advice/guidance. You do not need to be an expert on every topic to be a good reviewer. Know your areas of expertise within neuropsychology (e.g., specific tests, specific disorders) and research (e.g., study design, statistics). When the manuscript exceeds those areas, do not be afraid to consult with others. You could ask colleagues for their thoughts (without divulging the entire manuscript). You could refer to the literature to see how others have addressed similar problems. You can let the editor know that certain aspects of the manuscript fall outside your knowledge base, and, in some instances, you should decline the solicitation to be a reviewer if the topic is not sufficiently within your knowledge and expertise. The boundary on when to decline a topic based on lack of relevant knowledge is perhaps best illuminated by asking yourself the question, “If I was the author, would I want a reviewer who has only my degree of knowledge to pass judgment on my hard work?”

Learn from the other reviewers. Once you submit your review, most journals will allow you to see the other completed reviews. Use this as an opportunity to see whether your specific comments and ultimate recommendation for the disposition of the paper align with the reviews of potentially more experienced reviewers, and learn from issues brought up by other reviewers that you may have missed.

Cover the entire manuscript. Try to find strengths and weaknesses in all components of the submission. Resist the urge to stop reviewing the manuscript because you found a “fatal flaw” in the Introduction or Methods sections. Your feedback to the authors may lead the editor to reject the manuscript, but the feedback itself can still be very valuable to the authors in their future research endeavors. It is easy to be critical; it is better but more challenging to help the authors improve their paper by adopting a constructive tone to your critique.

Provide helpful comments to the author(s). Reviewers can help shape manuscripts (and ultimately the field). By providing constructive and concrete comments, a reviewer can provide direction that assists the author in building a better manuscript. Unclear suggestions (e.g., “statistical analyses are wrong”) provide no real instruction to the authors when revising their work. In a survey of corresponding authors of a psychology journal, Nickerson (2005) found authors want specific information on problems in their manuscripts and concrete suggestions to improve those problems.

At all costs, avoid tirades and unnecessary negativity in reviews. There is no place in a review for personal attacks or vendettas of any sort. If you disagree with the author, then state so in a constructive manner. Try to help the author.

Avoid statements about your recommendations for acceptance or rejection of the manuscript. Most editors will prefer that the reviewers' comments to the authors focus on the strengths and weaknesses of the submission, rather than indicating whether the reviewer thinks the paper should be published. If one reviewer is recommending publication and others are not, this can be confusing and frustrating to the authors. You can indicate your enthusiasm/degree of concern about the work, but reserve comments such as “I think this paper should definitely be published” for the section of confidential communication to the editor.

Unpublished manuscripts are confidential. Until a manuscript is accepted and “in press,” you cannot reference it. Refrain from contacting authors or letting them know in any manner that you reviewed their work. Do not share the findings with your peers. A corollary of this point is that you, as the reviewer, also have anonymity. Editorial staff will not divulge who reviewed specific manuscripts.

Signing reviews remains a controversial practice in the review process. In favor of this practice, it makes the process more transparent (e.g., authors know how specific individuals feel about their work). This may carry more weight in the revision process, especially if the reviewers are more senior members of the field. Conversely, confidentiality is removed, which may affect critiques. Reviewers should check with journal editors about their preferences for signing reviews.

The authors of this article hold editorial positions at several neuropsychology journals, including Archives of Clinical Neuropsychology (KD, HW, RJM), Applied Neuropsychology (CRR), Journal of Clinical and Experimental Neuropsychology (WGvG, DT), and The Clinical Neuropsychologist (JJS).

We would like to gratefully acknowledge the contributions of the Guest Action Editor, Arthur MacNeill Horton, Jr., Ed. ABPP, ABPN, and the anonymous reviewers of this manuscript.

Keywords: neuropsychology, peer review

What to Expect in Peer Review

The ASHA Journals Peer Review Model

Manuscripts submitted to the ASHA journals go through an editorial board peer review model. In this model, an editor-in-chief (EIC) is responsible for assigning each manuscript to an editor who has the appropriate content expertise. The editor typically assigns two to three reviewers, who may be editorial board members (EBMs), ad hoc reviewers, or any combination thereof. Reviewers submit lists of strengths and weaknesses in a number of categories appropriate for the type of manuscript, as well as any brief additional comments. Upon receipt of the reviews, the editor is not expected to provide additional detailed comments. Instead, in a decision letter, the editor helps the author identify the most important changes, particularly when EBMs or ad hoc reviewers disagree. The editor is free to recruit additional reviewers as needed, for example for specialized statistical review.

This is a change from the previous peer review model in which an editor rendered a decision after two to three reviews were submitted to an associate editor, who made a decision recommendation. Also, review comments were not structured.

Review Policies

ASHA journals perform single-anonymized reviews, which means that the reviewer knows the author’s name, but the authors do not know the reviewers’ identities unless reviewers choose to include their names in the review. On rare occasions, authors do request a double-anonymized review (please see the Anonymized Review Policies page for additional information). Our standard review process is outlined below.

Original Submission Review

Using the ASHA Journals Editorial Manager system, you will upload a properly formatted manuscript and answer a series of disclosure questions (see our guide on Manuscript Submission for more information). The manuscript will then be assigned by the editor-in-chief to an editor with the right subject matter expertise. The editor will typically then assign the manuscript to at least two editorial board members (EBMs) or ad hoc reviewers, or some combination thereof, for reviews. The EBMs or ad hoc reviewers submit comments using a structured peer review template, along with a decision recommendation, to the editor. The editor then reads the reviews in depth, considers the recommendations, and renders a decision.

Author Revision and Submission

If your manuscript requires a revision, as is most typically the case, then you will be given up to 6 weeks to revise and resubmit the manuscript.

Revised Submission Review

After receiving your revised manuscript, the journal editor will typically then assign at least two EBMs or ad hoc reviewers, or some combination thereof, to review the revised version of the manuscript. The reviewers will submit comments and recommendations, and then the editor will render a revision decision.

Second Author Revision and Submission

If your manuscript requires a second revision for acceptance, you will be given up to 3 weeks to submit a revised manuscript.

Overall Estimated Time From Submission to Decision

Assuming two rounds of review (one round for the original submission and one round for the revised manuscript), time from submission to final decision in the editorial board peer review model can take as little as approximately 4 months. But again, the overall time from submission to final decision of a manuscript depends largely on the number of rounds of review and how long authors take to complete revisions. Authors following submission instructions and submitting revisions that thoroughly address review comments help peer review maintain a swift pace.
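As a rough illustration, the timeline arithmetic above can be sketched in Python. Only the 6-week and 3-week author revision windows come from the text; the per-round review duration is an assumption chosen so that two rounds plus one revision land near the quoted 4 months:

```python
# Illustrative sketch of the submission-to-decision timeline described above.
# The review-round duration is an assumption, not an official ASHA figure.

REVIEW_ROUND_WEEKS = 6      # assumed time for one round of editorial-board review
FIRST_REVISION_WEEKS = 6    # authors get up to 6 weeks for the first revision
SECOND_REVISION_WEEKS = 3   # authors get up to 3 weeks for a second revision

def estimated_weeks(review_rounds: int) -> int:
    """Estimate total weeks from submission to final decision."""
    total = review_rounds * REVIEW_ROUND_WEEKS
    if review_rounds >= 2:
        total += FIRST_REVISION_WEEKS   # a second round implies a first revision
    if review_rounds >= 3:
        total += SECOND_REVISION_WEEKS  # a third round implies a second revision
    return total

# Two rounds (original + one revision): 6 + 6 + 6 = 18 weeks, roughly 4 months.
print(estimated_weeks(2))  # → 18
```

Under these assumptions, each extra round of review adds both a review and a revision window, which is why the text stresses that total time depends largely on the number of rounds.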

If Accepted

What's Next?

If your article is accepted, it will begin the journal production process. During the production process, you will be asked to provide some answers to author queries and make some basic revisions, but most of the process will be handled by the ASHA Journals production staff at this point.

If Rejected

There are a number of reasons a manuscript may be rejected for publication in the ASHA Journals. They can range from the manuscript not being a good fit for the scope and mission of the journal to which it was submitted, to concerns over the overall quality.

Authors may disagree with the decision of the editors of ASHA journals and may wish to challenge and appeal those decisions.

All appeals concerning decisions of an editor are first directed to the editor. In many cases, author-editor disagreements can be resolved directly through discussions between these parties. If no resolution is achieved, the author may file an appeal with the chair of the Journals Board.

The Journals Board chair discusses the disagreement with both parties to determine whether the dispute involves matters of scientific or technical opinion. If the dispute solely concerns such differing opinions, the appeal is not considered further and the original editorial decision is upheld. The chair then notifies the author and editor of the decision.

If the chair concludes that the issue could be the result of personal bias and/or capriciousness in an editorial decision, the chair then convenes an ad hoc Journals Board Appeals Committee. This committee is made up of two voting members of the Journals Board and the Journals Board chair. This committee is charged with the task of determining whether the author’s appeal has merit. This decision will be determined by majority vote.

If the decision is that there is no merit to the appeal, the chair of the Journals Board notifies the editor and the author of the decision.

If the committee determines that the appeal has merit, the editor is given an opportunity to reconsider the final decision.

If the editor maintains the original decision, the chair of the Journals Board may assign a new guest editor for the manuscript. New editorial board member reviewers would then be solicited and the review process re-initiated.
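The appeal procedure described above is essentially a small decision tree. A minimal sketch follows; all function and outcome names are hypothetical, not ASHA terminology:

```python
# Hypothetical sketch of the appeals decision tree described above.
# Parameter and outcome names are illustrative only.

def resolve_appeal(resolved_with_editor: bool,
                   dispute_is_only_scientific_opinion: bool,
                   committee_finds_merit: bool,
                   editor_maintains_decision: bool) -> str:
    """Walk the appeal path from first contact with the editor to final outcome."""
    if resolved_with_editor:
        return "resolved directly with the editor"
    if dispute_is_only_scientific_opinion:
        # The Journals Board chair does not consider the appeal further.
        return "original decision upheld"
    if not committee_finds_merit:
        # The ad hoc Journals Board Appeals Committee votes; no merit found.
        return "original decision upheld"
    if editor_maintains_decision:
        # The chair may then assign a new guest editor and restart review.
        return "new guest editor assigned; review re-initiated"
    return "editor reconsiders the decision"

print(resolve_appeal(False, False, True, True))
```

Note that, per the text, the chair "may" assign a new guest editor in the final branch; the sketch treats that optional step as the outcome for simplicity.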

Quick Facts

  • Number of journals: 5
  • Editors-in-Chief: 10
  • Editors: 59
  • Editorial Board Members (EBMs): 352
  • Time from submission to decision: approximately 4 months
  • Average acceptance rate: 52%
  • Growth in amount published since 2010: 58%


About the ASHA Journals

ASHA publishes four peer-reviewed scholarly journals and one peer-reviewed scholarly review journal pertaining to the general field of communication sciences and disorders (CSD) and to the professions of audiology and speech-language pathology. These journals are the  American Journal of Audiology ;  American Journal of Speech-Language Pathology ;  Journal of Speech, Language, and Hearing Research ;  Language, Speech, and Hearing Services in Schools ; and Perspectives of the ASHA Special Interest Groups . These journals have the collective mission of disseminating research findings, theoretical advances, and clinical knowledge in CSD.


All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of an article published by MDPI, including figures and tables. For articles published under an open access Creative Commons CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess.

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.


Mobile Ad Hoc Networks: Recent Advances and Future Trends


Special Issue Information


A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section " Networks ".

Deadline for manuscript submissions: closed (31 October 2021)

Special Issue Editor

Dear Colleagues,

Mobile ad hoc networks are a key enabling technology for the next-generation Internet of Things and cyber-physical systems. They show potential for disaster management services, commercial environments, public environments, coverage extension in crowded areas, game theory, military battlefields, remote hilly areas, and deep-sea regions. Ad hoc network concepts have since expanded to FANET, VANET, SANET, WSN, UAC, UAVs, etc. Alongside major design challenges such as power management and the lack of fixed infrastructure, new research trends focus on the multi-hop nature of these networks, device heterogeneity, network scalability, security, topology management, location management, device discovery, and interoperability with newly emerging ad hoc networks.

For this Special Issue, we invite submissions from all areas relating to the applications and challenges of mobile ad hoc networking, recent advances, and future trends. Contributions must relate to at least one of the following topics of interest:

  • System design for mobile ad hoc networks;
  • Internet of drones;
  • Internet of mobile things;
  • Mobile social networking;
  • Intelligent disaster management;
  • Smart city and smart mobility;
  • Autonomous intelligent systems;
  • Interoperability with underwater communications;
  • Interoperability with unmanned aerial vehicles (UAVs);
  • MANET with multi-hop wireless;
  • Integration among MANET, FANET, VANET, SANET, WSN, etc.;
  • Coexistence with cellular-V2X (C-V2X) and 5G.

Prof. Dr. KyungHi Chang, Guest Editor

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website . Once you are registered, click here to go to the submission form . Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords: safety and disaster management; machine learning; Internet of drones; smart mobility; multi-hop wireless.

Benefits of publishing in a Special Issue:

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (2 papers)



Classification and comparison of ad hoc networks: A review

  • December 2022
  • Egyptian Informatics Journal 24(1)

Authors: Reeya Agrawal (GLA University), Neetu Faujdar (GLA University), Oshin Sharma (SRM Institute of Science and Technology)

Abstract and Figures

Characteristics of Wireless Ad Hoc Network.


Research Article | Open Access | Peer-reviewed

Ad hoc digital communication and assessment during clinical placements in nursing education; a qualitative research study of students’, clinical instructors’, and teachers’ experiences

Authors: Edel Jannecke Svendsen, Randi Opheim, Bjørg Elisabeth Hermansen, and Camilla Hardeland

* E-mail: [email protected]

Affiliations: Department of Nursing and Health Promotion, Oslo Metropolitan University, Oslo, Norway; Department of Research, Sunnaas Rehabilitation Hospital, Nesoddtangen, Norway; Institute of Health and Society, Faculty of Medicine, University of Oslo, Oslo, Norway; Oslo University Hospital, Oslo, Norway; Department of Health and Social Studies, Østfold University College, Halden, Norway


Table 1

Background

There was concern about a shortage of nurses resulting from the COVID-19 pandemic. Universities and university colleges were therefore instructed to continue educating nursing professionals but were challenged by social distancing and the limited availability of clinical placements and clinical-field instructors. Clinical placement is essential to students' development of practical skills and knowledge. Transitioning to digital communication between the universities/colleges and the clinical practice sites for student follow-up thus became necessary.

Aim

To obtain knowledge about the experiences of university/college teachers, students, and clinical-field instructors regarding the transition to a digital learning environment that resulted from the COVID-19 pandemic.

Methods

Qualitative individual digital interviews were conducted for data collection across three nursing education programs at three Norwegian university/university college sites. Five students, four clinical-field instructors, and nine university/college teachers participated (n = 18).

Results

The inductive analyses identified two main themes: (1) Efficiency compromising pedagogical quality, and (2) Digital alienation.

Conclusions

Students and university/college teachers were worried about fluctuating pedagogical quality in the digital format. There were concerns that students educated during this period would have reduced clinical competencies.

Citation: Svendsen EJ, Opheim R, Hermansen BE, Hardeland C (2023) Ad hoc digital communication and assessment during clinical placements in nursing education; a qualitative research study of students’, clinical instructors’, and teachers’ experiences. PLoS ONE 18(7): e0287438. https://doi.org/10.1371/journal.pone.0287438

Editor: Yaser Mohammed Al-Worafi, University of Science and Technology of Fujairah, YEMEN

Received: March 17, 2022; Accepted: June 6, 2023; Published: July 21, 2023

Copyright: © 2023 Svendsen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: There are ethical restrictions on sharing the de-identified data set since permission was not obtained for this purpose and since they contain potentially identifying or sensitive information, the restriction is imposed by Research Ethics Committee. For request about data please contact Roger Markgraf-Bye at the institution of University of Oslo by email: [email protected] or by telephone: +47 90 82 28 26

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Over the last three decades, many universities and university colleges have used a variety of digital platforms to modernize teaching and improve efficiency. The digitalization of nursing education has typically involved E-learning initiatives and virtual simulations [ 1 ], but digitally hosted lectures and evaluation meetings with students in clinical practice were less common pre-pandemic. A review concluded that for an E-learning-based program to be integrated into a curriculum, staff and students require appropriate training [ 2 ]. When the pandemic started, there was little time for training in how to use digital tools. However, Langegård, Kiani [ 3 ] identified how the use of digital tools in higher education had increased over the last decade. Digitalization has thus introduced a new dimension to teachers' pedagogical skills and competences, which can be referred to as pedagogical digital competence. This competence comprises the ability to plan, conduct, evaluate, and revise information and communication technology (ICT)-supported teaching, as well as to use current research to support students' learning in the best possible way [ 4 ]; it became especially important given the physical restrictions of the COVID-19 pandemic. When new communication technology was introduced into nursing education, the university/college teachers reported that it was challenging to teach and support the students, particularly when they were unfamiliar with aspects of the platforms and software [ 5 ].

Through clinical placements, nursing students acquire practical skills and theoretical knowledge while developing their professionalism in actual clinical environments [ 6 ]. Integrating theory and practice is easier for students when strong cooperation between educational and clinical institutions exists [ 7 , 8 ]. During clinical placements, the students are usually assigned one clinical-field instructor. This instructor is a nurse who provides support, supervision, and evaluation of the student's learning process, but is less likely to have formal pedagogical education. During the clinical placement, the teachers monitor the evaluation meetings between the student and instructor. An Australian survey revealed that resource provision and universities' communication with clinical-field instructors are challenging [ 9 ]. Some clinical-field instructors experience a lack of interest and cooperation from the university/college teachers and are frustrated by the lack of interpersonal contact with the university/college [ 10 ]. In addition, some clinical-field instructors do not hold the qualifications they are supervising their students to obtain, and university involvement in preparing clinical-field instructors is scant [ 9 ]. These obstacles can make it difficult to communicate about the student's skill and knowledge development.

Even before the pandemic took hold, tensions existed between the clinical placement capacity expected by newly developed nursing programs and the high numbers of students from diverse educational and socioeconomic backgrounds entering these programs [ 11 ]. The COVID-19 pandemic put the health and care workforce under unprecedented pressure. Matching healthcare capacity to patient demand is a struggle that can be observed in all countries. Due to the anticipated reduction in the nursing workforce following the COVID-19 pandemic, universities and university colleges were instructed by Norway's authorities to continue delivering nursing education despite known problems with providing clinical placement sites and available clinical-field instructors. The Norwegian government passed a temporary instruction on the enactment of education during COVID-19 (Educational instruction covid-19, 2020, §1–3). This instruction gave universities and university colleges flexibility.

Consequently, digital resources for video conferences and lectures, such as Zoom©, Teams©, and Skype©, were introduced to monitor students in clinical placements. This transition from face-to-face meetings to digital resources was implemented over a short time span. Apart from the clinical placement, which required physical attendance, the nursing students' learning environment shifted to being mainly digital. Hence, the preparatory lectures and student–teacher meetings leading up to the clinical placement were provided digitally. Knowledge about how this process was experienced can contribute to the development of pedagogical quality in digitalized nursing education.

Aim and research questions

This study aimed to attain knowledge about university/college teachers’, students’, and clinical-field instructors’ experiences regarding the digital transition in nursing education. The following research questions were developed:

  • How did university/college teachers, students, and clinical-field instructors experience the transition from face-to-face to digital meetings in the follow-up of students in clinical practice?
  • What were the challenges and benefits of digital evaluation meetings during clinical placements?
  • How was the digital learning environment experienced by university/college teachers, clinical field instructors and students during the ad hoc digitalization?

This study used a qualitative exploratory design and applied individual interviews for data collection. Individual interviews are useful when investigating personal experiences and understandings [ 12 ]. The interviews were conducted using the digital platform Zoom©. Although face-to-face interviews have been the norm in qualitative interviewing, video-based interviews are proving to render high-quality data as well [ 13 ]. The interview guide was developed through collaborative discussions in the research team, based on earlier practice experiences and a literature review. The interview questions addressed expectations of and experiences with the digital platform, meetings during the clinical placement, benefits and concerns connected to clinical placements and digital meetings, and changes in the learning environment. Hence, the interviews were semi-structured. All of this study's authors conducted interviews. The interviewers (all women) were nurses and teachers at the selected universities/university college.

Participants

We aimed to include participants with a broad range of experiences; consequently, recruitment took place at three different educational institutions in Norway (two universities and one university college), representing three nursing education programs and three different graduate levels. The respective educational institutions were in both urban and rural environments, and clinical placements were from both primary and tertiary care. Participants comprised three groups: students, clinical-field instructors, and teachers. All participants had either finished their clinical placement (students), supervised students during clinical placement periods in primary care and hospitals (clinical-field instructors), or assessed them (teachers) in the spring of 2020, during the first period of the societal lockdown following the onset of the COVID-19 pandemic.

Recruitment

After the leaders approved the study, all groups of participants were invited by the authors via email, with information about the study and the consent form. Those who replied positively were invited to a digital interview. In total, 60 invitations were sent to potential participants by email.

The educational institutions comply with the European Credit Transfer and Accumulation System where 60 European credits are the equivalent of a full year of study or work [ 14 ]. The three university/university college sites were as follows:

  • A large, urban university. Participants were recruited from a master’s program in advanced geriatric nursing, a program comprising 120 European credits.
  • A university somewhat smaller than Site 1, also in an urban location. These study participants were connected to or attended a mixed master’s/continuing education program comprising 90 or 120 European credits (depending on whether they followed the master’s program or the continuing education track).
  • A university college significantly smaller than the other two sites, in a more distant/rural location. These study participants attended a bachelor’s program in nursing comprising 180 European credits.

All three sites held multiple individual evaluation meetings during the clinical placements. An evaluation meeting is a formal session in which the student, the university/college teacher, and the clinical-field instructor participate. These meetings included formative evaluations, which are evaluations for learning; they reveal what (and how) students are learning. The evaluation forms used were developed by each educational institution for each clinical placement, based on the learning outcomes for that particular placement period and adjusted to the specific graduate level. These evaluations aimed to give the students and clinical-field instructors an understanding of the students’ performance levels and enable the instructor to adjust accordingly to meet each nursing student’s additional learning needs. There is also an element of summative assessment, since the students need to pass the clinical placement.

The interviews lasted between 13 and 65 minutes and were transcribed verbatim, resulting in 103 written pages. The analysis was based on Braun and Clarke’s [ 15 ] approach to thematic analysis. Thematic analysis is a method used to identify and analyze themes within a dataset. It consists of six phases: (1) transcribing, reading, and re-reading the data to familiarize oneself with it; (2) generating codes and organizing data relevant to each potential theme; (3) searching for themes; (4) reviewing the themes derived from the data; (5) defining and naming themes; and (6) producing the report. All co-authors individually participated in the coding, reviewing, and naming of the preliminary themes. To ensure rigor, consensus meetings were held to establish inter-coder agreement on the number and naming of themes and sub-themes. The inductive analysis was driven by the research questions and resulted in two main themes and corresponding subthemes addressing the three research questions.

The final analysis is presented in Table 1 . The quotes are presented with the corresponding participant number (ID: 1–19), with the prefix ‘S’ for student, ‘I’ for instructor, and ‘T’ for teacher (i.e., “S-2” denotes participant number 2, who is a student). The terms university/college teachers, students, and clinical-field instructors refer to the bachelor’s, master’s, and continued education levels. The interviewees’ quotes in Norwegian were translated into English by the first author and cross-checked by the coauthors and a language consultant.

Table 1. https://doi.org/10.1371/journal.pone.0287438.t001

Approval to perform this study was granted by the educational leaders of each institution and the Norwegian Center for Research Data (Reference number 557602). All participants received information about the study and gave their written informed consent. There was no potential bias or conflict between researchers and participants, as none of the co-authors had regular close collaboration with or supervision of the participants during the project/data collection period.

Undergraduate- and graduate-level nursing students (n = 5), clinical-field instructors (n = 4), and teachers (n = 9) participated (n = 18). They were connected to three universities/university colleges representing three nursing education programs in Norway. The participants were all women, aged between 25 and 58 years; the students were the youngest and the teachers the oldest.

All participant groups had experienced digital evaluation meetings during the clinical placements. In addition, the teachers and the students had also held or participated in digitally hosted lectures or group reflections in preparation for the clinical placements. They shared stories about their experiences of both digital contexts. Digital meetings about students’ accomplishments and evaluations of their clinical practice performance took place with the student and clinical-field instructor sitting together, sharing one digital device at the clinical placement site, while the teachers participated digitally from their home/office. The digitally hosted reflection groups and lectures were arranged with the students and the teachers sitting alone in their homes/offices. The most used digital platforms were Zoom , Teams , and occasionally Skype , but telephone and FaceTime were also used. Web-based learning management and evaluation systems were also used during the students’ clinical placements to provide general information, curriculums, assignments, and evaluation forms.

Two main themes and seven sub-themes regarding the transition to a digitized learning environment were identified across all participant groups and are presented in Table 1 . The main themes were: (1) saving time and increasing efficiency but losing reflection and pedagogical quality; and (2) digital alienation and lack of real-life socialization.

Efficiency compromising pedagogical quality

Saving time and being on time.

University/college teachers perceived the digital meetings as timesaving and efficient because they did not have to meet at the hospitals/homecare facilities to host physical meetings with the students and clinical-field instructors.

“All the interrupters were gone, and the unnecessary talk disappeared. The meetings took shorter time. Some of this is positive” (T9).

Additionally, the teachers found that the clinical-field instructors felt more obligated to turn up at the scheduled meeting time, which saved time for everyone, as no one had to wait for an instructor to finish a task.

“The positive experience I had with this is that everybody has actually turned up at the right time” (T17).

The students did not mention the meetings’ timesaving element, but some clinical-field instructors found the digital meetings more stressful to combine with their clinical nursing work. They found it difficult to be on time for the scheduled meetings, as responding to patients’ needs took precedence, and it was harder to let the university/college teachers know when they were running behind. They did, however, also find the meetings more efficient, since the time for small talk was reduced and the meetings started right on time.

Students, like the teachers, found the digital reflection groups and digital preparation lectures both timesaving and flexible.

“You save a lot of time when you don’t have to travel back and forth. And also, you can be available at any time” (S2).

A few participants did question whether the digital platform was actually timesaving, since they experienced technical problems with it.

“You spend a lot of time and effort getting the video call started, and you are interrupted by technical challenges and the constant need to check if they can still hear me” (I6).

A challenging learning environment with fluctuating pedagogical quality.

All participants mentioned frustration and time wasted on technical problems: trouble logging in, switching the camera off and on, and problems with microphones.

“I know that many students muted or exited the Zoom-room due to technical difficulties” (S10).

Students reported missing lectures because of a bad internet connection or low digital competence. In addition, there were different ad hoc solutions for teaching:

“Some teachers were live on Zoom and had a PowerPoint; some just sent us the PowerPoint with no comments, and some sent us the PowerPoint with a recorded audio sound. It was not very good quality” (S2).

Some university/college teachers said it could even be more time-consuming to give lectures digitally, since they felt they could not manage the lectures alone. They felt that two teachers had to be present to manage both the technology and the communication with the students. This could make all types of digital meetings more time-consuming.

Most teachers and some clinical-field instructors found the new digital and technology platforms challenging with regard to the quality of the pedagogical methods and the communication. The premises for communication, and hence the pedagogical quality, seemed limited by how social interactions were shaped by the digital arena:

“What I don’t get (with the students) is the pedagogical dialogue in a way, being together and discussing” (T1), a teacher said.

For the clinical-field instructors, it was difficult to find time to meet with the teachers and the students. Clinical practice was busier than usual, and sometimes the reflections and informal dimensions of the evaluation meetings were omitted, while priority was given to the assessment dimension.

Some teachers were concerned that the variation in pedagogical methods was too narrow and found it difficult to use their familiar methods on the digital platform. Students, on the other hand, commented on the lack of a structured plan for digital meetings and lectures. The lack of an adjusted pedagogy seemed to hamper students’ learning.

Regarding the digital preparations for and reflections on clinical placements, one teacher held that it was easier for students to ask questions during digital lectures and group meetings. Another teacher, however, said:

“We lost some of the reflection in the groups that used to be quite open to each other. Because this is the students’ time to reflect over (their practice)” (T3).

The students, however, felt that sharing their reflections or asking questions had become more challenging. They increasingly doubted that their thoughts or reflections were important enough. The spotlight and attention put on them when speaking was challenging; they felt that they put themselves more “out there,” and it was hard to find the confidence to speak up and express oneself. The students also found it difficult to motivate and discipline themselves to pay attention during long digital meetings or lectures, especially meetings with little structure and few pauses. It was tiresome to have the camera switched on, but too easy to lose concentration when the camera was off.

Concerns about the digital follow-up of low-performing students.

The teachers and the clinical-field instructors shared an extra concern about following up low-performing students.

“If there are challenges concerning a student, if you are uncertain if the student is at risk of failing, it would have been much easier to start that conversation face to face” (I19).

They found it difficult to handle the process around students at risk of failing the clinical placement. For the clinical-field instructors, this pertained both to the process of informing the students and to not knowing whether the teachers shared their opinions. Previously, they would have raised such concerns informally before meeting with the student; with digital meetings only, the issue had to be discussed and addressed with the student without warning. With a fragile relationship between clinical-field instructors and teachers, this task was demanding for both to perform in a coordinated and empathic way.

“It is unfortunate if the teacher claims that there is no risk of the student failing, but he/she doesn’t perceive the atmosphere in the room” (I5).

Unclear responsibilities regarding digital meetings during clinical placements.

The educational and healthcare institutions did not have a common platform that both could use and master, resulting in switching between many platforms and hybrid solutions. Technical support from the institutions was therefore low, and the digital platforms used within hospitals, such as the telehealth platforms designed for digital communication with patients, were not used at all. Firewalls on hospital computers blocked access to the educational institutions’ digital platforms, such as Zoom and Teams , making communication through them impossible. Usually, the students were given (or took) the responsibility of using their own private devices to communicate with the teacher at the school and invited the teacher to the evaluation meeting. While most clinical-field instructors did not mind, one instructor felt that this gave her less influence in the meetings.

Digital alienation

Unnatural and awkward digital meetings.

The research participants compared the digital meetings with the familiar non-digital meetings and concluded that they missed important social information. One student said,

“When the camera is on, we kind of see each other’s body language. I see if you look interested in what I’m saying, for example, by leaning forward. If they are in a Zoom meeting with the camera switched off, with no audio, there is no nonverbal communication” (S8).

However, it seemed like something was missing when the camera was on:

“There is a difference between having the teacher physically visiting or just meeting on Skype or Teams” (S2).

Some teachers stated that digital interactions created distance between the students and the clinical placement sites. This could be related to missing eye contact, other nonverbal signs and gestures, and not knowing to whom in the meeting the nonverbal communication was directed. Many students felt less connected with their co-students and their teachers and were afraid that their messages would be misinterpreted and hence misunderstood. Likewise, teachers were afraid that the recordings of their digital lectures, stored on the students’ learning platforms, would contain errors. This was exhausting and sometimes resulted in shorter meetings, since everybody wished the meetings to be over quickly. However, the new situation also created a feeling of fellowship, as mentioned by both teachers and students. One teacher said,

“We’re in this together” (T9).

The feeling of a “COVID crisis” and of having no choice but to meet digitally made the groups more understanding and tolerant of technical problems. They also showed understanding when fellow students or co-workers used the telephone or Zoom with the camera off.

Favoring physical meetings over digital ones, especially to establish relationships.

Almost every participant favored in-person over digital meetings, especially when the objective was to establish a relationship or when they were meeting for the first time. Using the digital platform only, all the informants found it difficult to get to know one another and to express insecurities, difficulties, and worries. The teachers and the clinical-field instructors highlighted this:

“On Zoom, one loses the informal part of supervising together” (I6).

To compensate for this, many teachers stressed the importance of taking time for small talk during the first meeting and practicing verbalization of ideas that would otherwise be expressed non-verbally.

Feeling lonely and isolated.

All participant groups, to some extent, felt more alone and isolated when navigating the new COVID-19 digital context. One student put it like this:

“When most students switch off their cameras and microphone, everything is quiet, and I wonder if anyone is actually listening to what I’m saying. You’re kind of just speaking to yourself” (S3).

Teachers commented that this also affected their everyday working environment: they missed informal meetings in the hallway, for example to seek advice on how to set up a digital meeting or learn how to share their screen. They lost their collegial support, as described by this teacher:

“It affects the work environment just to be in separate offices, but when we also are in our separate houses, it is a whole different matter” (T1).

Students missed their co-students and the informal chats during lunchtime and in the reading room. Of all the groups, the clinical-field instructors felt the least lonely. However, they reported feeling alone in supervising the students when the teachers did not attend the meetings.

Educational institutions faced several challenges during spring 2020, when COVID-19 caused sudden changes and strict restrictions on clinical placement. This study identified two main themes describing nursing students’, clinical-field instructors’, and university/college teachers’ experiences with the ad hoc digitalization caused by COVID-19, with emphasis on digital communication and education: (1) saving time and increasing efficiency but losing reflection and pedagogical quality, and (2) digital alienation and lack of real-life socialization. The results show how the ad hoc digital transformation maintained and amplified pre-pandemic concerns about quality in nursing education, in particular how to meet the educational needs of students from all income levels [ 16 ]. Sudden changes in education seemed to affect the students’ achievement negatively, especially among low-income and low-achieving students [ 17 ]. The results from this study support earlier research identifying how students were dissatisfied with online learning in general during the pandemic [ 18 ] but add a more nuanced understanding of how it was experienced.

Difficulties in achieving and assessing learning outcomes

A concern raised by the teachers was that some nurses educated during the pandemic might not acquire the same level of skills as previously educated nurses. Similarly, Ulenaers, Grosemans [ 19 ] identified how some students experienced less advanced learning situations due to changes in clinical placement sites. Some hospital units stopped all planned attendances other than emergency visits, resulting in fewer patients and less diversity in learning opportunities for some students. Conversely, in busy COVID-19 units, students were not included in relevant learning situations because the learning environment was too busy. Nursing students’ clinical learning is influenced by the nurses’ workload and the job intensity at their clinical placement [ 20 ]. The lack of learning opportunities and unsatisfactory skill acquisition were also identified by the students during clinical placement in a study by Kaveh, Charati [ 21 ]. The lack of relevant learning situations for some students was worrisome, specifically because of the importance of clinical training in nursing education [ 22 ].

Key results from this study suggest a shared concern among teachers and clinical-field instructors about losing students’ reflections because of the digital transition, combined with bad internet connections and low digital competence. These challenges potentially also kept students away from lectures and discussions. Low digital competence was worrisome, especially since digital competence is a vital part of digital literacy. Educators’ competencies should also be enhanced [ 23 ], including strengthening the teachers’ abilities to plan and conduct ICT-supported teaching and to apply current research in supporting students’ learning [ 4 ]. Digital literacy is becoming an increasingly important component of clinical practice and nursing education, since it is vital in accessing evidence for use in clinical practice [ 23 ].

Challenges in assessing students

Evaluating the students’ learning process during clinical placement became more difficult when the teacher was not physically present, and the teachers and clinical-field instructors were especially concerned about low-performing students. Similar concerns have been identified in earlier studies. Black, Curzio [ 24 ] pointed to how having to fail an underperforming student can be morally distressing for clinical-field instructors, a process that can be even more difficult without support from the teacher at the school. A qualitative study by Hunt et al. [ 25 ] showed that clinical-field instructors needed support to feel confident enough to fail underperforming students. Furthermore, learning and evaluating practical skills in nursing education depends on triangulation between the student, instructor, and teacher [ 26 , 27 ]. When the assessment is digital and the instructor feels low social support, this can lead to higher thresholds for failing students and less attention being paid to the assessment of the students’ skills, especially if field instructors are unsure about their own qualifications, as Broadbent, Moxham [ 9 ] identified in their study. University involvement and close follow-up with clinical-field instructors seem important to obtain a thorough and formative evaluation of the students’ performance during the clinical placement.

Social alienation

All participants felt that the digital platforms were difficult to use and constituted an unnatural setting for reflection and learning. In addition, both the teachers and the students felt alone and isolated. This is in line with earlier studies identifying how loneliness among students was high during the coronavirus pandemic [ 28 ], and with Brennan [ 29 ], who identified how teachers in higher education felt isolated and disconnected due to students’ reluctance to use webcams and the techno-overload caused by the change in teaching format from face-to-face to online, often referred to as technostress [ 30 ]. In addition, Luchetti, Lee [ 31 ] pointed to the effects the pandemic’s stay-at-home orders had on general psychological well-being. Our results showed how the social alienation experienced because of the digitalization of clinical placements aggravated the situation. This can be especially challenging for nursing students, since emotional burdens influence their ability to learn during clinical placements [ 20 ] and are worsened by lack of structure and predictability [ 32 ]. This adds to the difficulties created by less adequate clinical placement sites and highlights how difficult the situation was.

The teachers’ and the clinical-field instructors’ difficulties in assessing the students, the social alienation experienced by both the teachers and the students, and the sometimes ill-prepared, ad hoc lectures and meetings resulted in fluctuating pedagogical quality during the start of the COVID-19 pandemic. We question whether the weakened quality assurance of students’ skill levels and evaluations during their clinical placements may have created a risk of lower-performing newly graduated nurses compared with previous graduates.

Study limitations

A strength of this study is the multiple perspectives provided by students, clinical-field instructors, and university/college teachers. They revealed that the ad hoc digital transition across different nursing disciplines caused changes and quality challenges during clinical placements. Another strength is the variation in sampling at the organizational and participant levels: the participants came from three educational sites of different sizes and three different degree programs in nursing. To enhance the reflexivity of the study, the analysis was conducted by the research team, whose different academic standings and home institutions provided a range of perspectives on the data interpretation.

Most teachers agreed to participate in this study, but a larger share of the students and clinical-field instructors did not respond or declined. Consequently, only a small number of students and clinical-field instructors participated, which is a limitation, and the results must therefore be interpreted with caution. The teachers and clinical-field instructors were included regardless of pedagogical training and/or pedagogical background.

All interviews were performed digitally; therefore, our digitally hosted conversations with the participants may have the same problems as those described by the participants in this study, namely, a potentially awkward interview situation.

The COVID-19 pandemic resulted in ad hoc solutions for clinical placements, assessments, and evaluations, and some students and teachers experienced a challenging learning environment. Because of this, the teachers and clinical-field instructors were concerned about consequences such as the potential for lower-performing graduates in this group compared to previously graduated nursing students. In this study, we explored one aspect of the digital transformation: the role of online assessments and evaluations in clinical placement. In line with our results, more knowledge is needed about the student-instructor relationship, pedagogy, the role of the instructor, and the role of the student. Whether the potential disadvantage experienced by these students during their clinical placements will need to be rectified later in their education needs to be further explored. The results from this study have the potential to inform nursing education about how to prepare for and manage the use of ICT in the future.

Supporting information

S1 Appendix. Interview guide.

https://doi.org/10.1371/journal.pone.0287438.s001

Acknowledgments

The authors thank all participants for their time and engagement.

  • 12. Polit DF, Beck CT. Nursing research: Generating and assessing evidence for nursing practice. Lippincott Williams & Wilkins; 2008.
  • 14. European Credit Transfer and Accumulation System.
  • 26. Schoener L. The relationships among observational learning through clinical teachers’ role modeling, self-directed learning, and competence as perceived by generic baccalaureate nursing students. Widener University School of Nursing; 2001.

On-line estimators for ad-hoc task execution: learning types and parameters of teammates for effective teamwork

  • Open access
  • Published: 13 August 2022
  • Volume 36, article number 45 (2022)


  • Elnaz Shafipour Yourdshahi 1 ,
  • Matheus Aparecido do Carmo Alves 2 ,
  • Amokh Varma 3 ,
  • Leandro Soriano Marcolino   ORCID: orcid.org/0000-0002-3337-8611 2 ,
  • Jó Ueyama 4 &
  • Plamen Angelov 2  


It is essential for agents to work together with others to accomplish common objectives, without pre-programmed coordination rules or previous knowledge of the current teammates, a challenge known as ad-hoc teamwork. In these systems, an agent estimates the algorithm of others in an on-line manner in order to decide its own actions for effective teamwork. A common approach is to assume a set of possible types and parameters for teammates, reducing the problem to estimating parameters and calculating distributions over types. Meanwhile, agents often must coordinate in a decentralised fashion to complete tasks that are distributed across an environment (e.g., in foraging, de-mining, rescue, or fire control), where each member autonomously chooses which task to perform. By harnessing this knowledge, better estimation techniques can be developed. Hence, we present On-line Estimators for Ad-hoc Task Execution (OEATE), a novel algorithm for estimating teammates’ types and parameters in decentralised task execution. We show theoretically that our algorithm can converge to perfect estimations, under some assumptions, as the number of tasks increases. Additionally, we run experiments for a diverse set of configurations in the level-based foraging domain, under full and partial observability, and in a “capture the prey” game. We obtain lower error in parameter and type estimation than previous approaches and better performance in the number of completed tasks in some cases. In fact, we evaluate a variety of scenarios with increasing numbers of agents, scenario sizes, numbers of items, and numbers of types, showing that we outperform previous works in most cases with respect to the estimation process, besides being robust to an increasing number of types and even to an erroneous set of potential types.


1 Introduction

Autonomous agents are usually designed to pursue a specific strategy and accomplish a single task or set of tasks. To improve their performance, these agents often follow specified coordination and communication protocols that enable the collection of valuable information from the environment or from other reliable agents. However, employing these methods is challenging due to environmental and technological constraints. There are circumstances where communication channels are unreliable, and agents cannot fully trust them to send or receive information. Moreover, particular situations require agents (e.g., robots or autonomous systems) from various parties to solve a problem urgently, but constructing and testing communication and coordination protocols for all the different agents can be unfeasible given the time constraints. For example, consider a natural disaster or a hazardous situation where institutions may urgently ship robots from different parts of the world to handle the problem. In these scenarios, avoiding delays and unnecessary funding usage would save lives and mitigate the damage caused.

One possible solution is to offer a centralised mechanism that allocates tasks to each agent in the environment in an efficient manner. However, we may face scenarios where no centralised mechanism is available to manage the agents’ actions, and in large-scale problems it is even easier to imagine situations where environmental or time constraints derail this solution. Hence, agents need to decide, autonomously, which task to pursue [ 11 ]—defining what we will denominate a decentralised execution scenario. Decentralised execution is quite natural in ad-hoc teamwork, as we cannot assume that other agents are programmed to follow a centralised controller. Therefore, allowing agents to reason about the surrounding environment and create partnerships with other agents can support the accomplishment of missions that are hard to complete individually, reducing the time needed to achieve all tasks and minimising the associated costs.

For many relevant domains, these decentralised execution problems can be modelled around the set of tasks that need to be accomplished in a distributed fashion (e.g., victims to be rescued from a hazard, letters to be quickly delivered to different locations, etc.). Note that this kind of design takes a task-based perspective on the problem, where agents must reason about their teammates’ targets to improve coordination and, hence, the team’s performance. In this way, the agents must approximate the teammates’ behaviours (or their main features) in order to deliver this improvement while solving the problem.

As our first goal, this paper addresses the problem where agents must complete several tasks cooperatively in an environment with no prior information, reliable communication channel or standard coordination protocol to support task completion. We will denominate this ad-hoc team situation a Task-based Ad-hoc Teamwork problem: a decentralised, distributed system where agents decide their tasks autonomously, without previous knowledge of each other, in an environment full of uncertainties.

Instead of developing algorithms that are able to learn any possible policy from scratch, a common approach in the ad-hoc teamwork literature is to consider a set of possible agent types and parameters, thereby reducing the problem to estimating those [ 2 , 3 , 10 ]. This approach is more applicable, as it does not require a large number of observations, thus allowing learning and acting to happen simultaneously in an on-line fashion, i.e., in a single execution. Types could be built based on previous experiences [ 7 , 8 ] or derived from the domain [ 1 ]. Moreover, the introduction of parameters for each type allows more fine-grained models [ 2 ]. However, the previous works that learn types and parameters in ad-hoc teamwork are not specifically designed for decentralised task execution, missing an opportunity to obtain better performance in this relevant scenario for multi-agent collaboration.

Other lines of work focus on neural network-based models and learn the policies of other agents after thousands (even millions) of observations [ 22 , 33 ]. These applications, however, would be costly, especially as domains get larger and more complicated. Similarly, I-POMDP based models [ 12 , 17 , 19 , 23 ] could be applied to reason about the models of other agents from scratch, but their application is non-trivial for larger problems.

On the other hand, some approaches in the literature have also tested task-based designs, inferring which tasks agents are pursuing in order to predict their behaviour [ 13 ]. Although we share some similarities, they have not yet handled learning types and parameters of agents in ad-hoc teamwork systems where multiple agents may need to cooperate to complete common tasks.

Therefore, as our main contribution, we present in this paper On-line Estimators for Ad-hoc Task Execution (OEATE), a novel algorithm for estimating teammates’ types and parameters in decentralised task execution. Our algorithm is lightweight, running estimations from scratch at every single run instead of employing pre-trained models or carrying knowledge between executions. Under some assumptions, we show theoretically that our algorithm converges to a perfect estimation as the number of tasks to be performed gets larger. Additionally, we run experiments in two collaborative domains: (i) a level-based foraging domain, where agents collaborate to collect “heavy” boxes together; and (ii) a capture the prey domain, where agents must collaborate to surround and capture prey. We also tested the performance of our method in fully and partially observable scenarios. We show that we can obtain a lower error in parameter and type estimation in comparison with the state of the art, leading to significantly better performance in task execution for some of the studied cases. We also run a range of different scenarios, considering situations where the number of agents, the scenario size, and the number of items get larger. Furthermore, we evaluate the impact of increasing the number of possible types. Finally, we run experiments where our ad-hoc agent does not have the true type of the other agents in its pool of possible agent types. In such challenging situations, our parameter estimation outperforms the competitors, and our type estimation and performance are similar to or better than the state of the art in several cases, considering the results’ confidence intervals.

2 Background

Ad-hoc Teamwork Model Ad-hoc teamwork defines domains where agents intend to cooperate with their teammates and coordinate their actions to reach common goals. Moreover, agents in these domains do not have any prior communication or coordination protocols to enable the exchange of information between them, so learning and reasoning about the current context are mandatory to improve the team’s performance as a unit. However, if agents are aware of some potential pre-existing standards for coordination and communication, they can try to learn about their teammates with limited information [ 8 ]. As a result of such intelligent coordination, ad-hoc teams can improve their decision-making process and, hence, accomplish shared goals more efficiently.

This fundamental model can be extended to fit many problems and scenarios. For our work, we will extend it to a task-based model, enabling a better representation of our world as presented in previous state-of-the-art works [ 5 , 6 , 41 ].

Task-based Ad-hoc Teamwork Model As an extension of the ad-hoc teamwork model, the task-based ad-hoc teamwork model represents a problem where one learning agent \(\phi\) acts in the same environment as a set of non-learning agents \(\omega \in \varvec{\Omega }\) , \(\phi \notin \varvec{\Omega }\) . In the ad-hoc team \(\phi \cup \varvec{\Omega }\) , the objective of \(\phi\) (as the learning agent) is to maximise the performance (e.g., the number of tasks accomplished or the time necessary to finish them all). However, all non-learning agents’ models are unknown to \(\phi\) , and there is no communication channel available. Hence, \(\phi\) must estimate and understand their models as time progresses, by observing the scenario. In other words, the learning agent must improve its decision-making process by approximating the teammates’ behaviour in an on-line manner, while facing a lack of information.

Besides, there is a set of tasks \(\mathbf {T}\) which all agents in the team endeavour to accomplish autonomously. A task \(\tau \in \mathbf {T}\) may require multiple agents to perform it successfully and multiple time steps to be completed. For instance, in a foraging problem, a heavy item may require two or more robots to be collected, and the robots would need to move towards the task location to accomplish it, taking multiple time steps to move from their initial position.

The learning agent \(\phi\) must minimise the time to accomplish all tasks. Hence, playing this role requires the support of a method that integrates the estimation and the decision-making process while performing and improving the planning.

Model of Non-Learning Agents All non-learning agents aim to finish the tasks in the environment autonomously. However, choosing and completing a task \(\tau\) by any \(\omega\) is dependent on its internal algorithm and its capabilities. Nonetheless, \(\omega\) ’s algorithm can be one of the potential algorithms defined in the system, which might be learned from previous interactions with other agents [ 7 ].

Therefore, following the model of Non-Learning agents defined in previous works [ 2 , 41 ], there is a set of potential algorithms in the system, which compose a set of possible types \(\varvec{\Theta }\) for all \(\omega \in \varvec{\Omega }\) . The assumption is that all these algorithms have some inputs, which are denominated parameters . Hence, the types are all parameterised , which affects agents’ behaviour and actions. Considering the existence of these types’ parameters allows \(\phi\) to use more fine-grained models when handling new unknown agents.

According to these assumptions, each \(\omega \in \varvec{\Omega }\) will be represented by a tuple ( \(\theta\) , \(\mathbf {p}\) ), where \(\theta \in \varvec{\Theta }\) is \(\omega\) ’s type and \(\mathbf {p}\) represents its parameters, which form a vector \(\mathbf {p} = <p_1,p_2,\ldots ,p_n>\) . Also, each element \(p_i\) in the vector \(\mathbf {p}\) is defined in a fixed range [ \(p_i^{min}\) , \(p_i^{max}\) ] [ 2 ]. So, the whole parameter space can be represented as \(\mathbf {P} \subset \mathbb {R}^{n}\) . These parameters can represent the abilities and skills of an agent. For instance, a robot can behave quite differently depending on its hardware: its vision radius, maximum battery level or maximum velocity. The parameters could also be hyper-parameters of the algorithm itself. Consequently, each \(\omega \in \varvec{\Omega }\) , based on its type \(\theta\) and parameters \(\mathbf {p}\) , will choose a target task. The process of choosing a new task can happen at any time and any state in the state space, depending on the agent’s parameters and type. We denominate these decision states Choose Target States \(\mathfrak {s} \in S\) .
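To make the agent model concrete, the tuple ( \(\theta\) , \(\mathbf {p}\) ) can be sketched as follows in Python; all names, types and ranges here are illustrative assumptions, not taken from the paper's implementation:

```python
import random
from dataclasses import dataclass

# Hypothetical parameter space: each p_i lies in a fixed range [p_i_min, p_i_max].
PARAM_RANGES = [(0.1, 1.0),   # e.g., vision radius
                (0.0, 1.0),   # e.g., maximum battery level
                (0.5, 2.0)]   # e.g., maximum velocity

@dataclass
class NonLearningAgent:
    agent_type: str          # theta, drawn from the set of potential types Theta
    params: list             # p = <p_1, ..., p_n>, each p_i in its fixed range

def sample_agent(types):
    """Sample an illustrative (theta, p) tuple uniformly from the type set
    and the parameter space P."""
    theta = random.choice(types)
    p = [random.uniform(lo, hi) for lo, hi in PARAM_RANGES]
    return NonLearningAgent(theta, p)

agent = sample_agent(["leader", "follower"])
```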

In the Task-based Ad-hoc Teamwork context, a precise estimation of tasks also depends on estimating the Choose Target State . Our method presents a solution to this problem from an information-based perspective, giving different weights to the information derived from observations made by agent \(\phi\) , instead of directly estimating the choose target state. More details will be presented in Sect. 6 .

Stochastic Bayesian Game Model A Stochastic Bayesian Game (SBG) provides a well-suited representation of ad-hoc teamwork problems, combining Bayesian games with the concept of stochastic games to deliver a descriptive model of the context [ 4 , 29 ]. In this section, we define an SBG-based model for our specific setting and refer the reader to [ 29 ] for a more generic formulation.

Our model consists of a discrete state space S , a set of players ( \(\phi \cup \varvec{\Omega }\) ), a state transition function \(\mathcal {T}\) and a type distribution \(\Delta\) . Each agent \(\omega \in \varvec{\Omega }\) has a type \(\theta _i \in \varvec{\Theta }\) and a parameter space \(\mathbf {P}\) . The parameters of each agent form a vector \(\mathbf {p} = <p_1,p_2,\ldots ,p_n>\) , with each \(p_i \in [p_i^{min},p_i^{max}]\) , for all agents. The set \([p_1^{min},p_1^{max}] \times \dots \times [p_n^{min},p_n^{max}] = \mathbf {P} \subset \mathbb {R}^n\) is the parameter space for each agent. Each type could have a different parameter space, but we define a single parameter space here for simplicity of notation. Furthermore, we assume that the types of the agents are fixed throughout the process (a pure and static type distribution). Moreover, each player is associated with a set of actions, an individual payoff function and a strategy. Considering that agents \(\omega _i \in \varvec{\Omega }\) are fixed tuples \((\theta _i, \mathbf {p}_i)\) at each time step, where \(\theta _i \in \varvec{\Theta }\) and \(\mathbf {p}_i \in \mathbf {P}\) , we extend the SBG model in order to describe the following problem:

Problem Consider a set of players \(\phi \cup \varvec{\Omega }\) that share the same environment. Each player acts according to its type \(\theta _i\) , set of parameters \(\mathbf {p}_i\) and own strategy \(\pi _i\) . They do not know the others’ types or parameters. At each time step t , given the state \(s^t\) and a joint action \(a^t = (a^t_\phi ,a^t_1,a^t_2,\ldots ,a^t_{|\varvec{\Omega }|})\) , the game transitions according to the transition probability \(\mathcal {T}\) and each player receives an individual payoff \(r_i\) until the game reaches a terminal state.
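The problem statement above can be sketched as a single game step; the policies, transition and payoff functions below are illustrative stand-ins, not the paper's implementation:

```python
def sbg_step(state, policies, transition, payoff):
    """One SBG time step: each player i picks a_i according to its strategy
    pi_i(state), the game transitions via T, and each player receives an
    individual payoff r_i."""
    joint_action = tuple(pi(state) for pi in policies)   # a^t = (a_phi, a_1, ..., a_|Omega|)
    next_state = transition(state, joint_action)         # drawn from T
    rewards = [payoff(i, state, joint_action) for i in range(len(policies))]
    return next_state, joint_action, rewards

# Toy usage with deterministic stand-ins for the strategies and transition:
policies = [lambda s: "move", lambda s: "wait"]
next_s, a, r = sbg_step(0, policies,
                        transition=lambda s, act: s + 1,
                        payoff=lambda i, s, act: 1.0)
```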

Therefore, by using the SBG model, we can represent our problem and its necessary components. However, we consider in this work a fully cooperative problem from the point of view of agent \(\phi\) . Hence, within the task-based ad-hoc teamwork context, we want to model the problem employing a single-player abstraction under \(\phi\) ’s point of view. Using a Markov Decision Process (MDP) model, we can abstract all the environment components as part of the state (including teammates in \(\varvec{\Omega }\) ). This approach enables the aggregation of individual rewards from the SBG model into a single global reward and allows us to use single-player Monte Carlo Tree Search techniques, as previous works did [ 5 , 32 , 41 ].

Markov Decision Process Model The Markov Decision Process (MDP) is a mathematical framework for modelling stochastic processes in discrete time. As mentioned, although there are multiple agents and perspectives in the team, we will define the model from the point of view of agent \(\phi\) and apply a single agent MDP model , as in previous works [ 5 , 32 , 41 ] that represent other agents as part of the environment.

Therefore, we consider a set of states \(s \in \mathcal {S}\) , a set of actions \(a \in \mathcal {A}_\phi\) , a reward function \(\mathcal {R}(s,a)\) , and a transition function \(\mathcal {T}\) , where the actions in the model are only \(\phi\) ’s actions. In other words, \(\phi\) can only decide its own actions and has no control over other environment components (e.g., actions of agents in the set \(\varvec{\Omega }\) ). All \(\omega\) in \(\varvec{\Omega }\) are modelled as the environment , as their actions indirectly affect the next state and the obtained reward. Therefore, they are abstracted in the transition function. That is, in the actual problem, the next state depends on the actions of all agents; however, \(\phi\) is unsure about the non-learning agents’ next actions. For this reason, we consider that, given a state s , an agent \(\omega \in \varvec{\Omega }\) has an (unknown) probability distribution (pdf) across a set of actions \(\mathcal {A}_{\omega }\) , which is given by \(\omega\) ’s internal algorithm ( \(\theta\) , \(\mathbf {p}\) ). This pdf affects the probability of the next state. Therefore, we can say that the uncertainty in the MDP model comes from the randomness of the actions of the \(\omega\) agents, besides any additional stochasticity of the environment.

This model allows us to employ single-agent on-line planning techniques, like UCT Monte Carlo Tree Search [ 26 ]. In the tree search process, the pdf of each agent defines the transition function. At each node transition, \(\phi\) samples \(\omega\) agents’ actions from their (estimated) pdfs, and that will determine the next state \(s'\) for the next node. However, in traditional UCT Monte Carlo Tree Search, the search tree increases exponentially with the number of agents. Hence, we use a history-based version of UCT Monte Carlo Tree Search called UCT-H , which employs a more compact representation than the original algorithm, and helps to trace the tree in larger teams in a simpler and faster fashion [ 41 ].
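The node-transition step described above can be sketched as follows; the function names and the toy deterministic pdf are illustrative assumptions:

```python
import random

def sample_next_state(state, phi_action, teammate_pdfs, transition):
    """Sample each omega agent's action from its (estimated) pdf over A_omega,
    then apply the environment transition to obtain the next node's state."""
    omega_actions = []
    for pdf in teammate_pdfs:                        # one pdf per omega agent,
        actions, probs = zip(*pdf.items())           # induced by its (theta, p)
        omega_actions.append(random.choices(actions, weights=probs)[0])
    joint_action = (phi_action, *omega_actions)
    return transition(state, joint_action)

# With a degenerate pdf the sampled joint action is fixed:
result = sample_next_state(0, "stay", [{"north": 1.0}], lambda s, a: (s, a))
```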

As mentioned earlier, in this task-based ad-hoc team, \(\phi\) attempts to help the team to get the highest possible reward. For this reason, \(\phi\) needs to find the optimal value function, which maximises the expected sum of discounted rewards \(E[\sum _{j=0}^{\infty }\gamma ^jr_{t+j}]\) , where t is the current time, \(r_{t+j}\) is the reward \(\phi\) receives j steps in the future, and \(\gamma \in (0, 1]\) is a discount factor. Also, we consider that the rewards are obtained by solving the tasks \(\tau \in \mathbf {T}\) . That is, we define \(\phi\) ’s reward as \(\sum r(\tau )\) , where \(r(\tau )\) is the reward obtained after the completion of task \(\tau\) . Note that the sum of rewards is not only across the tasks completed by \(\phi\) , but across all tasks completed by any set of agents in a given state. Furthermore, there might be some tasks in the system that cannot be completed without cooperation between the agents, so the number of agents required to finish a task \(\tau\) depends on each specific task and the set of agents jointly trying to complete it.
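As a minimal illustration of this objective, the discounted return over an observed reward sequence can be computed as below (the reward values are hypothetical):

```python
def discounted_return(rewards, gamma=0.9):
    """Sum_j gamma^j * r_{t+j} for an observed reward sequence -- a sketch of
    the quantity phi maximises in expectation; rewards come from completed tasks."""
    return sum((gamma ** j) * r for j, r in enumerate(rewards))

# Two tasks completed at steps 0 and 2, each worth reward 1:
ret = discounted_return([1, 0, 1], gamma=0.9)   # 1 + 0 + 0.81
```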

Note that the agents’ types and parameters are actually not observable, but this is also not directly represented in our MDP model. Estimated types and parameters are used during on-line planning, creating an estimated transition function. The actual decisions made by the non-learning agents are observable in the real-world transitions, without any direct information about types and parameters. More details are available in the next section.

3 Related works

The literature introduces ad-hoc teamwork as a remarkable approach to handling multi-agent systems [ 5 , 38 ]. This approach makes it possible to achieve the objectives of multiple agents collaboratively without requiring a communication channel for exchanging information between the agents, an application for prior coordination, or the collection of previous data to train the agents to improve their decision-making within the environment. Furthermore, these models enable the creation of algorithms capable of acting in an on-line fashion, dynamically adapting their behaviour according to the environment and the current teammates.

In this section, we carry out a comprehensive discussion of the state-of-the-art contributions and how these different approaches have inspired our work. To facilitate understanding and readability, we organise the section into topics and group related contributions. Each subsection categorises the major idea of each group and summarises its main strategy.

3.1 Type-based parameter estimation

Considering type-based reasoning and parameter learning , we can solve the problem using fine-grained models, which evaluate the observations and estimate each agent’s type and parameters in an on-line manner [ 1 , 3 , 7 , 8 , 10 ]. These lines of work propose approximating agents’ behaviour using a set of potential types to improve the ad-hoc agents’ decision-making capabilities, allowing a quick on-line estimation of agents’ algorithms without requiring an expensive training process for learning their policies from scratch. However, if a set of potential types and the parameter space cannot be defined through domain knowledge, then they would have to be learned from previous interactions [ 8 ].

Albrecht and Stone [ 2 ], in particular, introduced the AGA and ABU algorithms for on-line type-based reasoning about teammates’ parameters, which are the main inspirations for this work. Both methods sample sets of parameters (from a defined parameter space) to perform estimations via gradient ascent and Bayesian updates, respectively. However, by focusing on decentralised task execution in ad-hoc teams, our novel method surpasses their parameter and type estimations as the number of teammates gets larger or more tasks are accomplished, consequently leading to better team performance. We also extend their work by adding partial observability for all team members.

On the other hand, Hayashi et al. [ 22 ] propose an enhanced particle reinvigoration process that leverages prior experiences encoded in a recurrent neural network (RNN), acting in a partially observable scenario in their ad-hoc team. However, they need thousands of previous experiences for training the RNN, while still requiring knowledge of the potential types. Our approach can start from scratch at every single run, with no pre-training.

Concerning problems with partial observability, POMCP is usually employed for on-line planning [ 37 ]. However, it was originally designed for a discrete state space, making it harder to apply POMCP to (continuous) parameter estimation. Instead, we apply POMCP in combination with our algorithm OEATE, which enables decision-making in partially observable scenarios and improves the POMCP search space, given OEATE’s estimation of the agents’ parameters. We also evaluate experimentally the performance of POMCP on our problem without embedding parameter estimation algorithms.

3.2 Complex models

Guez et al. [ 20 ] proposed a Bayesian MCTS that tries to directly learn a transition function by sampling different potential MDP models and evaluating them while planning under uncertainty. Our planning approach (inspired by [ 2 , 7 ]) is similar, as we sample different agent models from our estimations. However, instead of directly working on the complex transition function space, we learn agents’ types and parameters, which then translate to a certain transition probability for the current state or belief state.

Rabinowitz et al. [ 33 ] introduce a “Machine Theory of Mind”—or simply the Theory of Mind (ToM) approach—where neural networks are trained on general populations to learn agent types, and the current agent’s behaviour is then estimated in an on-line manner. Similarly to learning policies from scratch, however, their general models require thousands (even millions) of observations to be trained. Besides, they used a small \(11 \times 11\) grid in their experiments, while we scale all the way up to \(45 \times 45\) to estimate the behaviour of several unknown and distinct teammates. On the other hand, if a set of potential types is not given by domain knowledge, their work serves as another example that types could be learned.

A different approach that enables learning teammates’ models and reasoning about their behaviour during planning is given by I-POMDP based models [ 12 , 17 , 19 , 23 ]. However, they are computationally expensive, as they assume all agents are learning about the others recursively and consider agents that receive individual rewards (processing estimations individually).

Eck et al. [ 18 ] addressed this problem and recently proposed a scalable approach using the I-POMDP-Lite framework in order to consider large open agent systems. In their approach, an agent considers a large population by modelling a representative set of neighbours. They focus on estimating how many agents perform a particular action, hence their approach is not applicable to the task-based problems that we consider in this work. Additionally, although they present a scalable approach in terms of team size, they still consider only small \(3 \times 3\) scenarios. In this work, we show scalability regarding the team size, the dimensions of the map and the number of simultaneous tasks in the scenario.

Rahman et al. [ 34 ] also handle open agent problems and propose the application of a Graph Neural Network (GNN) for estimating agents’ behaviours. Similarly to other neural network-based models, it needs a large amount of training, and their results are limited to a \(10 \times 10\) grid world with 5 agents. Their agent parametrisation is also more limited, with only 3 possible levels in the level-based foraging domain, which is directly given as input for each agent (instead of learned).

Therefore, we propose lighter MDP/POMDP models, focused on decentralised task execution with a single team reward, which allow us to tackle problems with a larger number of agents and tasks in bigger, partially observable scenarios. Also, we build a model for every single member of the team. On the other hand, open agent systems are not in the scope of our work, and we consider fixed team sizes.

3.3 Task-oriented and task-allocation approaches

As mentioned, our key idea is to focus on decentralised task execution problems in ad-hoc teamwork. Chen et al. [ 13 ] present a related approach, where they focus on estimating the tasks of teammates instead of learning their models. While related, they focus on task inference in a model-free approach, considering that each task must be performed by one agent, and the ad-hoc agent’s goal changes to identifying tasks that are not yet allocated. Our work, on the other hand, combines task-based inference with model-based approaches and allows tasks to require an arbitrary number of agents. Additionally, their experiments are on small \(10 \times 10\) grids, with a lower number of agents than ours.

There are also other works that attempt to identify the task being executed by a team from a set of potential tasks [ 29 ], or an agent’s strategy for solving a repetitive task, enabling the learner to perform collaborative actions [ 39 ]. Our work, however, is fundamentally different, since we focus on a set of (known) tasks which must be completed by the team.

Another approach suggested in the literature for optimising task-based problems is the Multi-Agent Markov Decision Process (MMDP) model [ 14 , 15 ]. These models allow agents to decide their target task autonomously and are focused on estimating teammates’ policies directly at specific times in the problem execution. Given knowledge of the MMDP model, those approaches compute the best response policy (at the current time) for the other agents and use those models while planning. However, they do not consider learning a probability distribution over potential types and estimating agents’ parameters as in our approach. OEATE  is capable of using a set of potential types and a space of parameters to learn the probabilities of each type-parameter set-up for each teammate in an on-line fashion.

Multi-Robot Task Allocation (MRTA) models also represent an alternative approach to solving problems in the ad-hoc teamwork context [ 27 , 40 ]. Aiming to maximise the collective completion of tasks, these models employ decentralised task execution strategies that work in an on-line manner without a central learning agent. Each agent develops its own strategy based on the received observations. Similarly to our proposal, MRTA models adopt a task-based perspective to deliver solutions where agents know and seek tasks distributed in an environment. However, MRTA models assume knowledge about the teammates’ types and the tasks that they are pursuing. This assumption holds because they consider this information to be available in the environment, where agents can obtain it through observation (e.g., agents choosing tasks of different colours) or through reliable communication channels for information exchange between the agents. As we mentioned earlier, there are circumstances where communication channels are unreliable, and agents cannot fully trust them to send or receive information. OEATE  predicts teammates’ targets while learning their types and parameters, and handles problems where these assumptions are not guaranteed.

Concerning task allocation, MDP-based models are commonly applied [ 30 , 31 ] in the ad-hoc teamwork context. For instance, it can be framed as a multi-agent team decision problem [ 35 ], where a global planner calculates local policies for each agent. Auction-based approaches are also common, assigning tasks based on bids received from each agent [ 28 ]. These approaches, however, require pre-programmed coordination strategies, while we employ on-line learning and planning for ad-hoc teamwork in decentralised task execution, enabling agents to choose their tasks without relying on previous knowledge of the other team members, and without requiring an allocation by centralised planners/controllers.

3.4 Genetic algorithms

OEATE  is inspired by Genetic Algorithms (GA) [ 24 ], since our main idea is to keep a set of estimators , generating new ones either randomly or using information from previously selected estimators . However, GAs evaluate all individuals simultaneously at each generation, and individuals are usually selected to stay in the new population or for elimination according to their fitness function. Our estimators , on the other hand, are evaluated per agent at every task completion, and survive according to their success rate. The proportion of surviving estimators is then used for type estimation, and new ones are generated using an approach similar to the usual GA mutation/crossover. Moreover, we chose to apply GA concepts in this work considering our empirical and theoretical results. Empirically, the GA approach showed better results than Bayesian updates (considering the performance of AGA and ABU against OEATE). Theoretically, our solution does not depend on finite-dimensional representations of parameter-action relationships and provides a more robust way to explore the whole parameter space, through the use of multiple estimators, which mutate to form even better estimators.
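A rough sketch of this GA-inspired pool maintenance, under illustrative names and a simplified survival test (this is not the OEATE pseudocode; no crossover is shown):

```python
import random

PARAM_RANGES = [(0.0, 1.0), (0.0, 1.0)]   # illustrative parameter space

def refresh_pool(pool, success_rate, keep_if, mutate_scale=0.1):
    """Estimators passing the survival test are kept; the rest of the pool is
    refilled either with random new estimators or with mutations of survivors,
    loosely mirroring GA mutation."""
    survivors = [e for e in pool if keep_if(success_rate(e))]
    new_pool = list(survivors)
    while len(new_pool) < len(pool):
        if survivors and random.random() < 0.5:          # mutate a survivor
            parent = random.choice(survivors)
            child = [min(hi, max(lo, p + random.gauss(0.0, mutate_scale)))
                     for p, (lo, hi) in zip(parent, PARAM_RANGES)]
        else:                                            # or draw a fresh estimator
            child = [random.uniform(lo, hi) for lo, hi in PARAM_RANGES]
        new_pool.append(child)
    return new_pool

pool = [[0.9, 0.5], [0.1, 0.5]]
new_pool = refresh_pool(pool, success_rate=lambda e: e[0], keep_if=lambda s: s > 0.5)
```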

3.5 Prior contributions

As one of our major prior contributions, we recently proposed an on-line learning and planning approach for an agent to make decisions in environments containing previously unknown swarms (Pelcner et al. [ 32 ]). Defined in a “capture the flag” domain, the agent must perform its learning procedure at every run (from scratch) to approximate a single model for a whole defensive swarm, while trying to invade its formation to capture the flag. Differently from Pelcner et al. [ 32 ], in this proposal we aim to learn a model for each agent in the environment, via the estimation of types and parameters.

Another important work related to this contribution is the UCT-H proposal in Shafipour et al. [ 41 ]. Previous works that employ Monte Carlo Tree Search approaches are limited to a small search tree, since the cost of this procedure increases exponentially with the number of agents and the scenario size. To expand its applicability, we proposed a history-based version of UCT Monte Carlo Tree Search (UCT-H), using a more compact representation than the original algorithm. We performed several experiments with a varying number of agents in the level-based foraging domain. As OEATE is a Monte Carlo-based model, the study of Monte Carlo Tree Search approaches and their capabilities was essential to the development of our novel algorithm. In this work, to perform a fair comparison, we used the UCT-H version of Monte Carlo tree search to run every defined baseline.

4 Estimation problem

Considering the problem described by the MDP model in Sect. 2 , in this section we describe the general workflow of an estimation process and discuss how we integrate planning and estimation in this work.

Estimation process Initially, since agent \(\phi\) does not have information about each agent \(\omega\) ’s true type \(\theta ^*\) and true parameters \(\mathbf {p}^*\) , it does not know how they may behave at each state and, hence, must reason about all possibilities for types and parameters from the distribution \(\Delta\) . So, \(\phi\) must consider, for each \(\omega \in \varvec{\Omega }\) , a uniform distribution for initialising the probability of having each type \(\theta \in \varvec{\Theta }\) , as well as randomly initialising each parameter in the parameter vector \(\mathbf {{p}}\) based on their corresponding value ranges. However, given some domain knowledge, both types and parameters could be sampled from a different distribution.
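This initialisation can be sketched as follows; the type set, parameter ranges and agent names are illustrative assumptions:

```python
import random

TYPES = ["t1", "t2", "t3"]                 # illustrative type set Theta
PARAM_RANGES = [(0.0, 1.0), (0.5, 2.0)]    # illustrative ranges [p_i_min, p_i_max]

def init_estimate():
    """Uniform prior over the types, plus a randomly initialised parameter
    vector for each type, as described above."""
    return {theta: {"prob": 1.0 / len(TYPES),
                    "params": [random.uniform(lo, hi) for lo, hi in PARAM_RANGES]}
            for theta in TYPES}

# One independent estimate per non-learning agent omega:
estimates = {omega: init_estimate() for omega in ["omega_1", "omega_2"]}
```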

After each estimation iteration, we expect that agent \(\phi\) will have a better estimation for type \(\theta\) and parameter \(\mathbf {p}\) of each non-learning agent in order to improve its decision-making and the team’s performance. Hence, \(\phi\) must learn a probability for each type, and for each type, it must present a corresponding estimated parameter vector.

In further steps, as agent \(\phi\) observes the behaviour of all \(\omega \in \varvec{\Omega }\) and notices their actions and the tasks that they accomplish, it keeps updating all the estimated parameter vectors \(\mathbf {p}\) , and the probability of each type \(\mathsf {P}(\theta )_\omega\) , based on the current state. The way these estimations are updated depends on which on-line learning algorithm is employed.
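A generic sketch of such an update, for the type probabilities only, might look like the following Bayesian-style re-weighting (how the likelihoods are obtained depends on the chosen algorithm, e.g., AGA, ABU or OEATE; names are illustrative):

```python
def update_type_probs(type_probs, likelihoods):
    """Re-weight each type's probability by the likelihood of the observed
    behaviour under that type, then renormalise."""
    posterior = {t: type_probs[t] * likelihoods[t] for t in type_probs}
    z = sum(posterior.values())
    return {t: (p / z if z > 0 else 1.0 / len(posterior))
            for t, p in posterior.items()}

# The observed action is four times more likely under type t1 than t2:
probs = update_type_probs({"t1": 0.5, "t2": 0.5}, {"t1": 0.8, "t2": 0.2})
```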

This described process aims to improve the quality of \(\phi\) ’s decision-making based on the quality of the result delivered by the estimation method. Therefore, we will perform experiments using three different methods from the literature for type and parameter estimation: Approximate Gradient Ascent (AGA), Approximate Bayesian Update (ABU) [ 2 ] and POMCP [ 37 ], which will be explained in more detail in Sect. 5 . Moreover, these methods will represent our baselines for comparison against our novel algorithm, denominated On-line Estimators for Ad-hoc Task Execution (OEATE), for parameter and type estimation in decentralised task execution, which will be described in detail in Sect. 6 .

Planning and Estimations The current estimated models of the non-learning agents are used for on-line planning, allowing agent \(\phi\) to estimate its best actions. In this work, we employ UCT-H for agent \(\phi\)'s decision-making. UCT-H is similar to UCT, but uses a history-based compact representation, a modification shown to perform better in ad-hoc teamwork problems [ 41 ]. As in previous works [ 2 , 41 ], we sample a type \(\theta \in \varvec{\Theta }\) for each non-learning agent from the estimated type probabilities each time we re-visit the root node during the tree search. We use the newly estimated parameters \(\mathbf {p}\) for the corresponding agent and sampled type, which affects the estimated transition function, as described in our MDP model. Consequently, the higher the quality of the type and parameter estimates, the better the result of the tree search process. As a result, agent \(\phi\) makes a decision concerning which action to take.

Note that the actual \(\omega\) agents may be using algorithms other than the ones in our set of types \(\varvec{\Theta }\). Nonetheless, agent \(\phi\) would still be able to estimate the best type \(\theta\) and parameters \(\mathbf {p}\) to approximate agent \(\omega\)'s behaviour. Additionally, \(\omega\) agents may or may not run algorithms that explicitly model the problem as decentralised task execution or from a task-based perspective. Since we use a single-agent MDP, only agent \(\phi\) needs to model the problem as such.

5 Previous estimation methods and baselines

In this work, we compare our novel method against several state-of-the-art methods. We chose three algorithms from the literature as our baselines: AGA , ABU and POMCP , which we review in this section.

AGA and ABU Overall The Approximate Gradient Ascent (AGA), and the Approximate Bayesian Update (ABU) estimation methods are introduced in Albrecht and Stone [ 2 ]. In that work, the probability of taking the action \(a^t_\omega\) at time step t , for agent \(\omega\) , is defined as \(\mathsf {P}(a^t_\omega |H_\omega ^t, \theta _i, \mathbf {p})\) , where \(H_\omega ^t = (s^0_i , \ldots , s^t_i )\) is the \(\omega\) agent’s history of observations at time step t , \(\theta _i\) is a type in \(\Theta\) , and \(\mathbf {p}\) is the parameter vector which is estimated for type \(\theta _i\) . For the estimation methods, a function f is defined as \(f(\mathbf {p}) = \mathsf {P}(a^{t-1}_\omega |H_\omega ^{t-1} , \theta _i , \mathbf {p})\) where \(f(\mathbf {p})\) represents the probability of the agents’ previous action \(a^{t-1}_\omega\) , given the history of observations of agent \(\omega\) in previous time step, \(H_\omega ^{t-1}\) , type \(\theta _i\) , and its corresponding parameter vector \(\mathbf {p}\) . After estimating the parameter \(\mathbf {p}\) for agent \(\omega\) for the selected type \(\theta _i\) , the probability of having type \(\theta _i\) is updated following:

Iteratively, they showed that both methods are capable of approximating the type and parameters and of improving performance in the ad-hoc teamwork context.

AGA The main idea of this method is to update the estimated parameters of an agent \(\omega\) by following the gradient of a type’s action probabilities based on its parameter values. Algorithm 1 provides a summary of this method.

figure a

First, the method collects samples \((\mathbf {p}^{(l)}, f(\mathbf {p} ^ {(l)}))\) and stores them in a set \(\mathbf {D}\) (Line 2). The samples could be collected, for example, on a uniform grid over the parameter space that includes the boundary points. After collecting the samples, the algorithm fits a polynomial \(\hat{f}\) of some specified degree d to them (Line 3). From the fitted \(\hat{f}\), the gradient \(\nabla \hat{f}\) is computed and scaled by a suitably chosen step size \(\lambda ^t\) (Line 4). Finally, in Line 5, the estimated parameter is updated as presented in Equation 2 .

These steps define the AGA algorithm to estimate the agent’s parameters and type iteratively. For further details, we recommend reading Albrecht and Stone [ 2 ].
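For a single scalar parameter, one AGA iteration can be sketched as follows. Here `f` stands for the action-probability function \(f(\mathbf {p})\) defined above; the function and parameter names are ours, and the grid size, degree and step size are illustrative defaults, not values from the original work.

```python
import numpy as np

def aga_update(f, p, lo, hi, degree=2, n_samples=10, step=0.1):
    """One AGA iteration for a scalar parameter: sample f on a uniform
    grid over [lo, hi] (boundaries included), fit a polynomial f_hat of
    the given degree, and take a gradient-ascent step on f_hat."""
    grid = np.linspace(lo, hi, n_samples)
    f_hat = np.polyfit(grid, [f(x) for x in grid], degree)
    grad = np.polyval(np.polyder(f_hat), p)   # gradient of the fitted polynomial at p
    return float(np.clip(p + step * grad, lo, hi))

# Repeated updates climb towards the maximiser of f (here 0.7)
f = lambda x: 1.0 - (x - 0.7) ** 2
p = 0.2
for _ in range(100):
    p = aga_update(f, p, 0.0, 1.0)
```

In the full method, this update is applied per element of the parameter vector, with \(f\) evaluated from the type's action probabilities.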

ABU In this method, rather than using \(\hat{f}\) to perform gradient-based updates, Albrecht and Stone use \(\hat{f}\) to perform Bayesian updates that retain information from past updates. Hence, in addition to the belief \(\mathsf {P}(\theta _i |H^t_\omega )\) , agent \(\phi\) now also has a belief \(\mathsf {P}(\mathbf {p}|H^t_\omega , \theta _i )\) to quantify the relative likelihood of parameter values \(\mathbf {p}\) , for agent \(\omega\) , when considering type \(\theta _i\) . This new belief is represented as a polynomial of the same degree d as \(\hat{f}\) . Algorithm 2 provides a summary of the Approximate Bayesian Update method.

figure b

After fitting \(\hat{f}\) (Line 2), the polynomial convolution of \(\mathsf {P}(\mathbf {p}|H_\omega ^{t-1} , \theta _i )\) and \(\hat{f}\) results in a polynomial \(\hat{g}\) of degree greater than d (Line 3). Afterwards, in Line 4, a set of sample points is collected from \(\hat{g}\) in the same way as in Approximate Gradient Ascent, and a new polynomial \(\hat{h}\) of degree d is fitted to this set in Line 5. Finally, \(\hat{h}\) is integrated over the parameter space and divided by this integral, to obtain the new belief \(\mathsf {P} (\mathbf {p}|H_\omega ^t, \theta _i )\). This new belief can then be used to obtain a parameter estimate, e.g., by finding the maximum of the polynomial or by sampling from it. For further details, we recommend reading Albrecht and Stone's work [ 2 ].
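One ABU iteration for a scalar parameter can be sketched as below, under our assumption that polynomials are represented as NumPy coefficient arrays (so the polynomial convolution is the coefficient convolution computed by `np.polymul`); the names and default values are ours.

```python
import numpy as np

def abu_update(prior, f_hat, lo, hi, degree=2, n_samples=10):
    """One ABU iteration: multiply the polynomial prior P(p|H,theta) by
    the fitted likelihood f_hat, refit the product to degree d, and
    normalise by its integral over [lo, hi]."""
    g = np.polymul(prior, f_hat)                  # g has degree > d
    grid = np.linspace(lo, hi, n_samples)         # resample g, as in AGA
    h = np.polyfit(grid, np.polyval(g, grid), degree)
    H = np.polyint(h)                             # antiderivative of h
    z = np.polyval(H, hi) - np.polyval(H, lo)     # integral over the range
    return h / z                                  # new belief, integrates to 1

# Uniform prior on [0, 1] times a linear likelihood f_hat(p) = p
posterior = abu_update(np.array([1.0]), np.array([1.0, 0.0]), 0.0, 1.0)
```

In this toy case the posterior density is \(2p\), so the refit-and-normalise step recovers the exact Bayesian update.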

POMCP Although in the MDP model agent \(\phi\) fully observes the environment, it cannot observe the types and parameters of its teammates. Therefore, we can employ POMCP [ 37 ], a state-of-the-art on-line planning algorithm for POMDPs (Partially Observable Markov Decision Processes) [ 25 ]. POMCP stores a particle filter at each node of a Monte Carlo Search Tree. In our case, since everything except the types and parameters of the other agents is fully observable, the particles are defined as combinations of the types and parameters of all agents in \(\varvec{\Omega }\), i.e., [( \(\theta _4, \mathbf {p}_1\) ), ( \(\theta _2, \mathbf {p}_2\) ), ..., ( \(\theta _1, \mathbf {p}_n\) )], where each ( \(\theta , \mathbf {p}\) ) corresponds to one non-learning agent.

When the particles are first created at the initial root, we randomly assign types and parameters to each agent in each particle. At every iteration, we sample a particle from the root's particle filter, thereby changing the estimated types and parameters of the agents. As in the POMCP algorithm, the root is updated once a real action is taken and a real observation is received. Hence, to obtain a type probability \(\mathsf {P}(\theta )_\omega\) for a certain agent \(\omega\), we calculate the frequency with which type \(\theta\) is assigned to \(\omega\) in the current root's particle filter. For the parameter estimate, we take the average across the particle filter (for each type and agent combination). For further explanation of the POMCP algorithm, we recommend reading Silver and Veness [ 37 ].
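The type frequencies and parameter averages described above can be read off a root particle filter as sketched below; the particle layout (a list of (type, parameter-vector) pairs, one per non-learning agent) and all names are ours.

```python
from collections import Counter

def estimates_from_particles(particles, agent_idx, types):
    """P(theta)_omega is the frequency of theta among the particles for
    agent agent_idx; parameters are averaged per type."""
    entries = [particle[agent_idx] for particle in particles]
    counts = Counter(theta for theta, _ in entries)
    probs = {t: counts[t] / len(entries) for t in types}
    means = {}
    for t in types:
        vecs = [params for theta, params in entries if theta == t]
        if vecs:
            means[t] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return probs, means

particles = [[("t1", [0.2, 0.4])], [("t1", [0.4, 0.6])],
             [("t2", [0.5, 0.5])], [("t1", [0.6, 0.8])]]
probs, means = estimates_from_particles(particles, 0, ["t1", "t2"])
```

With three of four particles assigning type `t1` to the agent, its type probability is 0.75, and its parameter estimate under `t1` is the element-wise mean of the three associated vectors.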

6 On-line estimators for ad-hoc task execution

In this section, we introduce our novel algorithm, On-line Estimators for Ad-hoc Task Execution (OEATE), which helps the ad-hoc agent \(\phi\) to learn the parameters and types of non-learning teammates autonomously. The main idea of the algorithm is to observe each non-learning agent ( \(\omega \in \varvec{\Omega }\) ) and record all tasks ( \(\tau \in \mathbf {T}\) ) that any one of the agents accomplishes, in order to compare them with the predictions of sets of estimators . In OEATE, there are some fundamental concepts applied during the process of estimating parameters and types. Therefore, we introduce the concepts first and, then, explain the algorithm in detail.

6.1 OEATE fundamentals

Sets of Estimators In OEATE, there are sets of estimators \(\mathbf {E}^{\theta }_{\omega }\) for each type \(\theta\) and each agent \(\omega\) that agent \(\phi\) reasons about (Fig. 1 ). Moreover, each set \(\mathbf {E}^{\theta }_{\omega }\) has a fixed number N of estimators \(e \in \mathbf {E}^{\theta }_{\omega }\) . Therefore, the total number of sets of estimators over all agents is \(|\varvec{\Omega }| \times |\varvec{\Theta }|\) . Figure 1 presents this idea, relating agents, types and estimators .

figure 1

For each \(\omega\) agent there is a set of estimators \(\mathbf {E}^{\theta }_{\omega }\) for each type

An estimator e of \(\mathbf {E}^{\theta }_{\omega }\) is a tuple: \(\{\mathbf {p}_e, {c}_e, {f}_e, {\tau }_e\}\) , where:

\(\mathbf {p}_e\) is the vector of estimated parameters for \(\omega\) , and each element of the parameter vector is defined in the corresponding element range.

\({c}_e\) holds the success score of each estimator e in predicting tasks.

\({f}_e\) holds the failures score of each estimator e in predicting tasks.

\(\varvec{\tau }_e\) is the task that \(\omega\) would try to complete, assuming type \(\theta\) and parameters \(\mathbf {p}_e\) . By having estimated parameters \(\mathbf {p}_e\) and type \(\theta\) , we assume it is easy to predict \(\omega\) ’s target task at any state.

The success and failure scores ( \(c_e\) and \(f_e\) , respectively) will be explained further in the Evaluation step of the OEATE  presentation.
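The estimator tuple above can be represented directly in code; a sketch, with field names of our choosing mirroring \(\{\mathbf {p}_e, {c}_e, {f}_e, {\tau }_e\}\):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Estimator:
    """One estimator e in a set E^theta_omega."""
    params: List[float]          # p_e: estimated parameter vector
    success: float = 0.0         # c_e: success score in predicting tasks
    failure: float = 0.0         # f_e: failure score in predicting tasks
    task: Optional[str] = None   # tau_e: predicted target task (None if invalid)

    def success_rate(self) -> float:
        """Fraction of score credited to successes, used against xi later."""
        total = self.success + self.failure
        return self.success / total if total > 0 else 0.0

e = Estimator(params=[0.2, 0.5])
e.success, e.failure = 3.0, 1.0
```

The `success_rate` helper anticipates the \(\frac{c_e}{c_e+f_e} \ge \xi\) survival test used in the Evaluation step.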

All estimators are initialised at the beginning of the process and evaluated whenever a task is done (by an \(\omega\) agent alone or cooperatively). Estimators that fail to make good predictions after some trials are removed and replaced by estimators created from successful ones, or created purely at random, in a fashion inspired by GA [ 24 ].

Bags of successful parameters Given the vector of parameters \(\mathbf {p}_e = <p_1, p_2, \ldots , p_n>\), whenever an estimator e succeeds in task prediction, we keep the elements of its parameter vector \(\mathbf {p}_e\) in bags of successful parameters, to be used later when creating new parameter vectors. Accordingly, there is a bag of parameters \(\mathbf {B}_\omega ^{\theta }\) for each type \(\theta \in \varvec{\Theta }\), just as there is an estimator set \(\mathbf {E}^{\theta }_\omega\) for each type. These bags are not erased between iterations, so their size may grow at each iteration; there is no size limit for the bags. We provide more details in Sect. 6.2 . Figure 2 presents this idea, relating agents, types and estimators to the addition of estimators to the bags.

figure 2

For each \(\omega\) agent and each possible type \(\theta \in \Theta\) , there is a bag of successful estimators. Successful estimators are copied to the bag of their respective type, in order to later generate new combinations of their elements. The check mark indicates success in predicting the task and the cross mark indicates failure

Choose Target State In the presented task-based ad-hoc teamwork context, besides estimating the type and parameters of each non-learning agent ( \(\omega \in \varvec{\Omega }\) ), the learning agent \(\phi\) must be able to estimate the Choose Target State ( \(\mathfrak {s}_e\) ) of each \(\omega\). The Choose Target State of an \(\omega\) agent can be any \(s \in S\); in other words, a non-learning agent \(\omega\) can choose a new task \(\tau \in T\) to pursue at any time t or state s . This can happen in many situations; for example, when agent \(\omega\) notices that its target no longer exists (because it was completed by other agents), it will choose a new target, and the Choose Target State will not be the same state as when the last task was done by agent \(\omega\). Hence, a task-based estimation algorithm must be able to identify these moments where a possible task decision happened, in order to correctly predict the target.

Example For a better understanding of our method’s fundamentals, we will present a simple example. Let us consider a foraging domain [ 2 , 41 ], in which there is a set of agents in a grid-world environment as well as some items. Agents in this domain are supposed to collect items located in the environment.

We show a simple scenario in Fig. 3 , in which there are two non-learning agents \(\omega _1\) , \(\omega _2\) , one learning agent \(\phi\) , and four items of two sizes. As in all foraging problems, each task is defined as collecting a particular item, so in this scenario there are four tasks \(\tau ^i\) . In addition, we have two types \(\theta _1\) and \(\theta _2\) , and two parameters ( \(p_1, p_2\) ), where \(p_1, p_2 \in [0, 1]\) .

figure 3

Example of \(\phi\) thinking about \(\omega\) agents’ behaviour, when performing foraging

To keep the example simple, we consider that only \(p_1\) affects \(\omega _1\) ’s decision-making at each state, and its behaviour follows the rules:

If the type is \(\theta _1\) , and \(p_1 \ge 0.5\) , then \(\omega _1\) goes towards small and furthest item ( \(\tau ^3\) ).

If the type is \(\theta _1\) , and \(p_1 < 0.5\) , then \(\omega _1\) goes towards small and closest item ( \(\tau ^1\) ).

If the type is \(\theta _2\) , \(\forall p_1 \in [0, 1]\) , \(\omega _1\) goes towards big and closest item ( \(\tau ^2\) ).

Therefore, in the example scenario, there are four sets of estimators , two for each \(\omega\) agent : \(\mathbf {E}^{\theta _1} _{\omega _1}\) , \(\mathbf {E}^{\theta _2}_{\omega _1}\) , \(\mathbf {E}^{\theta _1} _{\omega _2}\) , \(\mathbf {E}^{\theta _2}_{\omega _2}\) . We assume that the total number of estimators in each set is 5 ( \(N=5\) ). Furthermore, we maintain 4 bags of estimators : \(\mathbf {B_{\omega _1}^{\theta _1}}\) , \(\mathbf {B}^{\theta _2}_{\omega _1}\) , \(\mathbf {B}^{\theta _1} _{\omega _2}\) , \(\mathbf {B}^{\theta _2}_{\omega _2}\) .

We assume that the true type of agent \(\omega _1\) is \(\theta _1\) , and the true parameter vector is (0.2, 0.5). At this point, we will focus on the set of estimators for agent \(\omega _1\) . Moreover, we will continue to use this example to explain further details of the OEATE  implementation.

6.2 Process of estimation

Having presented the fundamental elements of OEATE, we now explain the process of estimating the parameters and type of each non-learning agent. Simultaneously, we will demonstrate how OEATE  evolves through the various steps, using our example above. The algorithm is divided into five steps, which are executed for all agents in \(\varvec{\Omega }\) at every iteration:

Initialisation : responsible for initialising the estimator set and the bags of successful estimators for each agent \(\omega \in \varvec{\Omega }\) .

Evaluation : step where OEATE  will increase the failure or the success score of each estimator, for all initialised estimator sets, based on the correct prediction of the \(\omega\) ’s target task. If the estimator successfully predicts the task, it will be added to its respective bag. Otherwise, it will be up for elimination.

Generation : step where our method replaces the estimators removed in the evaluation process for new ones.

Estimation : process of calculating the type probabilities and expected parameter values for each existing estimator set. The calculation is based on the success rate of each set.

Update : responsible for analysing the integrity of each estimator e and its respective chosen target \(\tau _e\) given the current world state. If it finds some inconsistency, a new prediction is made considering \(\omega\) ’s perspective.

These steps are explained in detail below:

Initialisation In the very first step, for each identified teammate in the environment, we initialise its estimator set and the bag for each possible type. Agent \(\phi\) thus creates N estimators for each type \(\theta \in \varvec{\Theta }\) and each \(\omega \in \varvec{\Omega }\). Lacking prior information, the parameter vector \(\mathbf {p}_e\) of each estimator can be initialised with random values from the uniform distribution \(\mathcal {U}\) over each parameter's range. Since each estimator has a certain type \(\theta\) and a certain parameter vector \(\mathbf {p}_e\), it allows agent \(\phi\) to estimate agent \(\omega\)'s task-choosing process. A task is estimated and assigned to \(\tau _e\) when, in a given state \(s \in S\) at time t , the prediction returns a valid task. When there is no valid task to return at state s and time t , \(\tau _e\) receives “None” and is updated in later iterations (a process carried out by the Update step). Finally, both \({c}_e\) and \({f}_e\) are initialised to zero.

Algorithm 3 presents the initialisation process.

figure c
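A sketch of this initialisation, assuming a caller-supplied `predict_task(omega, theta, params)` that returns the predicted target task or `None`; all names, and the toy predictor, are ours.

```python
import random

def init_estimator_sets(agents, types, param_ranges, n, predict_task):
    """Create N estimators per (agent, type): parameters drawn uniformly
    from their ranges, scores c_e and f_e zeroed, and an initial task
    prediction (possibly None, to be fixed later by the Update step)."""
    sets = {}
    for omega in agents:
        for theta in types:
            ests = []
            for _ in range(n):
                params = [random.uniform(lo, hi) for lo, hi in param_ranges]
                ests.append({"params": params, "c": 0.0, "f": 0.0,
                             "task": predict_task(omega, theta, params)})
            sets[(omega, theta)] = ests
    return sets

# Toy task predictor, for illustration only
predict = lambda omega, theta, p: "task1" if p[0] >= 0.5 else "task2"
sets = init_estimator_sets(["w1"], ["th1", "th2"], [(0, 1), (0, 1)], 5, predict)
```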

Initialisation Example Returning to our example, in the Initialisation step we start by creating random estimators , as shown in Table 1 . To keep the example simple, we define the state as only the position of agent \(\omega _1\) . Therefore, we set each \(\mathfrak {s}_e\) (Choose Target State) to the initial position of \(\omega _1\) , which is (3, 4), and then create the parameter vectors \(\mathbf {p}_e\) by randomly sampling from the uniform distribution, separately for \(p_1\) and \(p_2\) . Agent \(\phi\) simulates \(\omega _1\) 's task decision-making process for each estimator in the sets \(\mathbf {E}^{\theta _1}_{\omega _1}\) and \(\mathbf {E}^{\theta _2}_{\omega _1}\) , and obtains the corresponding target task \(\tau _e\) based on the type and parameters of each estimator . In addition, all \(f_e\) and \(c_e\) are initialised to zero. All initial estimators for both sets are shown in Table 1 .

Evaluation The evaluation of all sets of estimators \(\mathbf {E}^{\theta }_\omega\) for a certain agent \(\omega\) starts when it completes a task \(\tau _\omega\) . The objective of this step is to find the estimators that correctly estimated \(\omega\) 's just-completed real task \(\tau _\omega\) . We present Algorithm 4 to facilitate the understanding and explanation of the evaluation process.

figure d

As there are sets of estimators for each type \(\theta \in {\varvec{\Theta }}\), for every e in \(\mathbf {E}^{\theta }_\omega\) we check whether \(\tau _{e}\) (the task estimated by assuming \(\mathbf {p}_e\) to be \(\omega\)'s parameters under type \(\theta\)) is equal to \(\tau _{\omega }\) (the real completed task). If they are equal, we consider the parameters successful and save the \(\mathbf {p}_e\) vector in the respective bag \(\mathbf {B}_\omega ^{\theta }\) (Line 5). The union between bag and parameters applied in the equation means that new parameters are added to the bag with repetition: if a parameter vector succeeds many times, it appears in the bag once per success, so the chance of selecting it later is correspondingly higher.

If the estimated task \(\tau _e\) is equal to the real task \(\tau _{\omega }\), we increase \(c_e\) following \(c_e \leftarrow c_e + score(e)\). The score ( e ) value denotes the information-level score for the prediction made by estimator e . The information-level score represents the weighting given to certain task completions over others. For example, if a task prediction occurs many steps before the task completion, it was more likely made by a correct estimator than by random chance. This function can be tweaked in a domain-specific way.

If the estimated task \(\tau _e\) is not equal to the real task \(\tau _{\omega }\), we increase the \(f_e\) score following \(f_e \leftarrow f_e + score(e)\). Note from the algorithm that we only remove an estimator e if its success rate is lower than \(\xi\) (Line 10). We define \(\xi\) as a success threshold aiming to improve our estimator set by removing the estimators that do not make good predictions and keeping the ones that do (more detail in the Generation explanation).

Note that, with this approach, any generated estimator e has a chance of being eliminated at its first estimation iteration. Hence, some estimators that may approximate the actual parameters well can be removed after performing their first estimation wrongly, \(\forall \xi \in [0,1]\). However, even if such estimators fail at the beginning of the estimation, other estimators will likely also fail in subsequent iterations of OEATE, enabling the regeneration of a removed, potentially correct estimator through the bags or by sampling it again from the uniform distribution. As we show in Sect. 6.3 , OEATE estimates the correct parameters for all agents as the number of completed tasks grows, under some assumptions.

Finally, the Choose Target State ( \(\mathfrak {s}_e\) ) of the successful estimators is updated and a new task ( \(\tau _{e}\) ) is predicted using the type and parameters of the estimator. The evaluation process then ends, and the removed estimators will be replaced by new ones in the Generation process.
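The scoring, bag union with repetition, and \(\xi\)-based culling above can be sketched as follows; estimators are kept as plain dicts with keys `params`, `task`, `c` and `f` (names ours, mirroring \(\mathbf {p}_e, \tau _e, c_e, f_e\)).

```python
def evaluate(estimators, bag, real_task, score, xi):
    """Credit estimators that predicted the completed task, copy their
    parameters into the bag (union with repetition), and keep only the
    estimators whose success rate c/(c+f) is at least xi."""
    survivors = []
    for e in estimators:
        if e["task"] == real_task:
            e["c"] += score
            bag.append(list(e["params"]))   # added once per success
        else:
            e["f"] += score
        if e["c"] / (e["c"] + e["f"]) >= xi:
            survivors.append(e)
    return survivors

bag = []
ests = [{"params": [0.4, 0.6], "task": "t1", "c": 0.0, "f": 0.0},
        {"params": [0.9, 0.1], "task": "t3", "c": 0.0, "f": 0.0}]
kept = evaluate(ests, bag, real_task="t1", score=4.0, xi=0.5)
```

With \(\xi = 0.5\), the estimator that predicted `t1` survives and its parameters enter the bag, while the other is culled, matching the worked example below.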

Evaluation Example Continuing the previous example, after the initialisation the agents move towards their respective targets. Based on the true type and parameters of agent \(\omega _1\), after some iterations the agent \(\omega _1\) gets the item that corresponds to task \(\tau ^1\). For this example, and throughout our experimentation, we use the number of steps between predicting the task and completing it as the score ( information-level ) of the estimator for that prediction. Let us assume that the number of steps required by agent \(\omega _1\) is 4 (3 for moving and 1 for completing). From Fig. 4 , agent \(\omega _1\)'s new position will be (6,4). We use this step count as the score for the estimators. Note that here, since all estimators chose the task at the same time, they all get the same score.

Whenever a task is completed by an agent, the evaluation process starts. Now, we carry out the next step of our process. In Evaluation , all estimators of the two sets \(\mathbf {E}^{\theta _1} _{\omega _1}\) , \(\mathbf {E}^{\theta _2}_{\omega _1}\) are evaluated. If the task \(\tau\) of an estimator e equals \(\tau ^1\) , then its success counter \(c_e\) increases by score ( e ); otherwise it remains the same, and in failing cases the failure counter \(f_e\) increases by score ( e ). The updated values of the estimators are shown in Table 2 .

If we suppose that the threshold for removing estimators is 0.5 ( \(\xi = 0.5\) ), then we have two surviving estimators ( \(\frac{c_e}{c_e+f_e} \ge \xi\) ) in \(\mathbf {E}^{\theta _1}_{\omega _1}\) and none in \(\mathbf {E}^{\theta _2}_{\omega _1}\). Hence, the bag for \(\theta _1\) is \(\mathbf {B}_{\omega _1}^{ {\theta _1}} = \{(0.4,0.6) , (0.2,0.5)\}\) and the bag for \(\theta _2\) is empty. Further, the new Choose Target State will be (6,4), and using it we can find the new task ( \(\tau _e\) ) for each of the surviving estimators. The new estimator sets are represented in Table 3 , and the new Choose Target State is illustrated by Fig. 4 .

figure 4

New Choose Target State after \(\omega _1\) completes \(\tau ^1\) . At this step, \(\omega _1\) will try to find a new task to pursue

Generation The generation process of new estimators occurs after every evaluation process and only over the removed estimators. In this step, the objective is to generate new estimators , in order to maintain the size of the \(\mathbf {E}^{\theta }_\omega\) sets equal to N .

Unlike the Initialisation step, we do not create only random parameters for new estimators , but generate a proportion of them using previously successful parameters from the bags \(\mathbf {B}_\omega ^{\theta }\). This lets us use new combinations of parameters from estimators whose predictions succeeded at least once in previous steps. Moreover, as the number of copies of a parameter \(\mathbf {p}\) in the bag \(\mathbf {B}_\omega ^{\theta }\) equals the number of successes of that parameter in previous steps, the chance of sampling very successful parameters increases with their success rate.

The idea of using successful estimators to generate part of the new estimators is related to Genetic Algorithm (GA) principles. The process described so far shares several similarities with the GA idea, such as the generation of a sample population for further evaluation and feature improvement. Since we want to boost our estimation process (based on estimator sampling and evaluation), we need a reasonable way to generate new estimators that can improve our estimation quality. Therefore, inspired by GA's mutation and cross-over operators, we implement a GA-inspired process that supports our generation method.

Therefore, after eliminating the estimators whose probability of making a correct prediction is lower than the threshold \(\xi\), we generate new estimators for our population following a mutation rate m : part of the population is generated randomly following a uniform distribution \(\mathcal {U}\), and the rest follows a process inspired by cross-over, using our bags of successful parameters. With domain knowledge, different distributions could be used. Figure 5 illustrates how the estimator set changes during this process and indicates the portion of estimators generated using the bags or randomly. Algorithm 5 summarises this generation procedure.

figure 5

Estimator set modifications from the evaluation to the end of the generation process. a , b present the modifications after the evaluation and after the generation, respectively. c presents the entire modification process

figure e

The generation process using the bags can be seen in Algorithm 5 , Lines 10–13. There, a new estimator is created by sampling n parameter vectors (with repetition) from the target bag and taking the \(i\text {-th}\) element from each. Essentially, if the parameter vector of the new estimator ( \(e^{new}\) ) is \(\mathbf {p}_{e^{new}} = <p_1, p_2, \ldots , p_n>\), then \(p_i\) is chosen by sampling \(\mathbf {p}_{sampled} \sim \mathbf {B}_{\omega }^{\theta }\) and taking the i -th element from it ( \(p_{i,{sampled}}\) ).

After performing all the generations with the bag, we fill the remainder of the estimator set with uniformly generated parameters. Once the estimator set is full (i.e., \(|\mathbf {E}_{\omega }^{\theta }| = N\) ), the current state is assigned as the Choose Target State ( \(\mathfrak {s}_{e^{new}}\) ) of every new estimator. Afterwards, a task ( \(\tau _{e^{new}}\) ) is predicted for each new estimator, and the generation process finishes.
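The generation step can be sketched as follows, with estimators as dicts and names of our choosing; the bag holds previously successful parameter vectors, and `m` is the mutation rate.

```python
import random

def generate(survivors, bag, n_total, m, param_ranges):
    """Refill the estimator set to N: a fraction m of the replacements is
    drawn uniformly at random (mutation); the rest are built element-wise
    from vectors sampled, with repetition, from the bag (cross-over)."""
    missing = n_total - len(survivors)
    n_random = round(m * missing)
    new = list(survivors)
    for _ in range(missing - n_random):
        if not bag:                          # empty bag: fall back to random
            break
        # i-th element taken from an independently sampled bag vector
        params = [random.choice(bag)[i] for i in range(len(param_ranges))]
        new.append({"params": params, "c": 0.0, "f": 0.0, "task": None})
    while len(new) < n_total:                # uniform-mutation remainder
        params = [random.uniform(lo, hi) for lo, hi in param_ranges]
        new.append({"params": params, "c": 0.0, "f": 0.0, "task": None})
    return new

bag = [[0.4, 0.6], [0.2, 0.5]]
survivors = [{"params": [0.4, 0.6], "c": 4.0, "f": 0.0, "task": None},
             {"params": [0.2, 0.5], "c": 4.0, "f": 0.0, "task": None}]
pop = generate(survivors, bag, n_total=5, m=1/3, param_ranges=[(0, 1), (0, 1)])
```

With two survivors, \(N=5\) and \(m=\frac{1}{3}\), two replacements come from the bag and one is fully random, as in the worked example below.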

Generation Example Supposing a mutation rate \(m = \frac{1}{3}\), then \((1 - \frac{1}{3}) \times (5-2) = 2\) new estimators are generated by randomly sampling from the bags, while \(\frac{1}{3} \times (5-2) = 1\) estimator is generated randomly from the uniform distribution. Therefore, we might create new estimators with the following parameters: (0.4, 0.5); (0.2, 0.6); (0.8, 0.7), where the last vector is fully random. For \(\mathbf {E}^{\theta _2}_{\omega _1}\) , as all estimators were removed and the corresponding bags are empty, the whole set \(\mathbf {E}^{\theta _2}_{\omega _1}\) is generated using the uniform distribution, as in the initialisation process. After this, the current state (6,4) is assigned as the Choose Target State for each new estimator and a task is predicted. All new estimators and updated values are shown in Table 4 .

Estimation At each iteration, after evaluation and generation , we estimate parameters and a type for each \(\omega \in \varvec{\Omega }\) to improve decision-making. First, based on the current sets of estimators , we calculate the probability distribution over the possible types. To calculate the probability of agent \(\omega\) having type \(\theta\), \(\mathsf {P}(\theta )_\omega\), we use the success scores \(c_e\) of all estimators of the corresponding type \(\theta\). For each \(\omega \in \varvec{\Omega }\), we add up the success scores \(c_e\) of all estimators in \(\mathbf {E}^\theta _\omega\) of each type \(\theta\), that is:

This means that we want to find out which set of estimators is the most successful in correctly estimating the tasks that the corresponding non-learning agent completed. In the next step, we normalise the calculated \(k^\theta _\omega\) to convert it into a probability estimate, following:

During the simulations, OEATE  samples estimates from the current estimator sets. In detail, for each agent \(\omega\), we sample a type \(\theta\) based on \(P(\theta )_\omega\) and sample an estimator from \(\omega\)'s estimator set of that type ( \(\mathbf {E}_\omega ^\theta\) ), using the weights given by the estimators' \(c_e\). In this way, once a type ( \(\theta\) ) is selected, the probability of selecting each estimator \(e \in \mathbf {E}_\omega ^{\theta }\) is equal to \(c_e/k_\omega ^\theta\). If \(k_\omega ^{\theta } = 0\), we sample the estimator uniformly from \(\mathbf {E}_{\omega }^{\theta }\); otherwise, we perform the weighted sampling.
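The two stages above, computing \(\mathsf {P}(\theta )_\omega\) from the summed success scores and then the \(c_e\)-weighted sampling, can be sketched as follows (names ours):

```python
import random

def type_distribution(estimator_sets):
    """k[theta] sums the success scores c_e over the set for each type;
    normalising k yields P(theta)_omega (uniform if all scores are 0)."""
    k = {theta: sum(e["c"] for e in ests)
         for theta, ests in estimator_sets.items()}
    total = sum(k.values())
    if total == 0:
        return {theta: 1.0 / len(k) for theta in k}, k
    return {theta: k[theta] / total for theta in k}, k

def sample_estimator(estimator_sets, probs, k):
    """Sample a type from P(theta), then an estimator weighted by c_e
    (uniformly if k[theta] is zero)."""
    theta = random.choices(list(probs), weights=list(probs.values()))[0]
    ests = estimator_sets[theta]
    if k[theta] == 0:
        return theta, random.choice(ests)
    return theta, random.choices(ests, weights=[e["c"] for e in ests])[0]

sets = {"th1": [{"params": [0.4, 0.6], "c": 4.0},
                {"params": [0.2, 0.5], "c": 4.0}],
        "th2": [{"params": [0.9, 0.9], "c": 0.0}]}
probs, k = type_distribution(sets)
```

Each root re-visit during planning draws a fresh (type, estimator) pair this way, which is what diversifies the simulations.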

Using this strategy, OEATE  can improve the reasoning horizon and diversify the simulations. Unlike AGA and ABU, which present only a single estimate per iteration, we present a set of the (currently) best-found estimators for planning and decision-making.

Estimation Example Now we perform the Estimation step in our example, to obtain a probability distribution over types and one parameter vector per type of \(\omega _1\) . At this step, in order to find the probability of the type being either \(\theta _1\) or \(\theta _2\) , we apply the equation for \(k^\theta _\omega\) above. Considering the \(c_e\) of all estimators , we have that:

Hence, to calculate the probability of each type, we use the normalisation equation above. Accordingly, the probabilities are:

which means that the probability of type \(\theta _1\) is higher.

Now, for the sampling process, we sample a type using the previously calculated distribution. Let us say that we sample \(\theta _1\). From this type, we also sample an estimator, using the ratio \(c_e/k^{\theta _1}\) as the probability of each estimator in \(E_{\omega _1}^{\theta _1}\). Concretely, we get:

while the other estimators have probability 0. We use these probabilities to sample an estimator, say (0.4,0.6). Therefore, type \(\theta _1\) and the parameters (0.4, 0.6) will be our estimated type and parameters for the current estimation step.

During the planning phase at the root of the MCTS (from the learning agent \(\phi\)'s perspective), OEATE  samples the simulated type and parameters respecting the probabilities calculated above. Moreover, to calculate the estimation error of our method, we use the mean squared error (MSE) between the true parameters and the expected parameters of the true type ( \(\theta ^*\) ). The expected parameter of a type ( \(\theta\) ) and agent \(\omega\) is calculated as:
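One way to realise this error measure is sketched below. Reading the expected parameter vector as the \(c_e\)-weighted mean of the vectors \(\mathbf {p}_e\) is our assumption here (with a plain mean as fallback when all scores are zero); the names are ours.

```python
def expected_parameter(estimators):
    """Expected parameter vector of one (type, agent) estimator set,
    taken here as the c_e-weighted mean of the vectors p_e (an assumed
    reading; unweighted mean if all c_e are zero)."""
    weights = [e["c"] for e in estimators]
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(estimators), float(len(estimators))
    dim = len(estimators[0]["params"])
    return [sum(w * e["params"][i] for w, e in zip(weights, estimators)) / total
            for i in range(dim)]

def mse(estimate, true_params):
    """Mean squared error between estimated and true parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(estimate, true_params)) / len(true_params)

ests = [{"params": [0.4, 0.6], "c": 4.0},
        {"params": [0.2, 0.5], "c": 4.0},
        {"params": [0.8, 0.1], "c": 0.0}]
```

For the set above, the weighted expectation is (0.3, 0.55), and its MSE against a true vector of (0.2, 0.5) is 0.00625.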

Update As mentioned earlier, there are issues that might arise in our estimation process. They occur:

when a certain task \(\tau\) is accomplished by any of the team members (including agent \(\phi\) ) while some other non-learning agent was targeting it, or;

when a certain non-learning agent is not able to choose a task to target (e.g., it cannot see or find any available (or valid) task within its vision area, considering possible parameter limitations such as vision radius and angle).

If some non-learning agent \(\omega\) faces one of these problems, it will keep trying to find a task to pursue. Hence, from the perspective of the learning agent \(\phi\), OEATE must handle this problem by updating its teammates' targets. Otherwise, it might incorrectly evaluate the available estimators given the outdated prediction.

figure f

Therefore, OEATE's Update process exists to guarantee the integrity of the estimator set for future evaluation. At each iteration, the update step analyses the integrity of each estimator e and its respective chosen target \(\tau _e\) given the current world state. If it finds some inconsistency, it simulates the estimator's task selection for the next states, from \(\omega\)'s perspective. The process is carried out in each successive state until it returns a new valid target for the indecisive estimator. Algorithm 6 presents the described update routine.
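A minimal sketch of this update check, assuming a toy representation where targets are task indices and `choose_target` stands in for the type's task-selection rule (illustrative names, not the authors' code):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Estimator:
    params: Tuple[float, float]
    target: Optional[int] = None   # index of the task this estimator predicts

def update_estimator(e, available_tasks, choose_target):
    # If the estimator's target was completed (or never set), re-simulate the
    # teammate's task selection until a valid task is found.
    while e.target is None or e.target not in available_tasks:
        e.target = choose_target(e, available_tasks)
        if e.target is None:
            break   # no visible task; the full algorithm advances the state
    return e

# Toy selection rule: pick the lowest-index available task.
lowest = lambda e, tasks: min(tasks, default=None)
e = update_estimator(Estimator((0.4, 0.6), target=7), {2, 5}, lowest)
```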

Update Example In the update step, we look at our estimators from Table 4 and check whether the conditions for update (from Algorithm 6) are met. Evidently, in our case, we see that every estimator has a valid task assigned to it; therefore, nothing happens in the update step.

6.3 Analysis

We show that as the number of tasks goes to infinity, under full observability, OEATE perfectly identifies the type and parameters of all agents \(\omega\), given some assumptions. Since each of our updates is related to completing tasks, this analysis assumes that the agents are able to finish the tasks. First, we consider that parameters have a finite number of decimal places. This is a light assumption, as any real number x can be closely approximated by a number \(x'\) with finite precision, without much impact on a real application (e.g., any computer has finite precision). Hence, as each element \(p_i\) in the parameter vector is in a fixed range, there is a finite number of possible values for it. To simplify the exposition, we consider \(\psi\) possible values per element (in general, elements can have different numbers of possible values). Let n be the dimension of the parameter space.

Additionally, let \(\mathbf {p^*}\) be the correct parameter, and \(\theta ^*\) be the correct type, of a certain agent \(\omega\) . We define \(\theta ^{-} \not = \theta ^*\) , and \(\mathbf {p}^{-} \not = \mathbf {p^*}\) , representing wrong types and parameters, respectively. We will also use tuples \((\mathbf {p}, \theta )\) to represent a pair of parameter and type.

Assumption 1 Any \((\mathbf {p}, \theta ^{-})\) and any \((\mathbf {p}^{-}, \theta ^*)\) have a lower probability of making a correct task estimation than \((\mathbf {p^*}, \theta ^*)\). Moreover, we assume that the correct parameter-type pair \((\mathbf {p}^*,\theta ^*)\) will also have the correct Choose Target State ( \(\mathfrak {s}_e\) ).

This assumption is very light because if a certain pair \((\mathbf {p}, \theta ^{-})\) or \((\mathbf {p}^{-}, \theta ^*)\) has a higher probability of making correct task predictions, then it should indeed be the one used for planning, and could be considered as the correct parameter and type pair.

Assumption 2 Any \((\mathbf {p}, \theta ^{-})\) and any \((\mathbf {p}^{-}, \theta ^*)\) will not succeed infinitely often. That is, as \(|\mathbf {T}| \rightarrow \infty\), there will be cases where they successfully predict the task, but the number of such cases is bounded by a finite constant c.

Assumption 3 This assumption is needed to distinguish our method from a random search. The assumption has 2 parts: (i) a correct value \(p_i^*\) in any position i may still predict the task wrongly (since other vector positions may be wrong), but it will eventually predict at least one task correctly in at most t trials, where t is a constant; (ii) a wrong value \(p_i^{-}\) in any position i may still predict the task correctly (since other vector positions may be correct), but that would happen at most \(\mathfrak {b}\) times for each bag, across all wrong values. Therefore, \(\mathfrak {b} \ll \psi\) .

That is, if one of the vector positions i is correct, \(\mathbf {p}\) will not fail infinitely often, even though other elements may be incorrect. That holds in many applications, as in some cases a single element is enough to make a correct prediction. For example, if a task is nearby, it will be predicted as the next one for almost any vision radius, provided the vision angle is correct. On the other hand, wrong values will not always succeed. That is also true in many applications: although, by the argument above, wrong values may make correct predictions, these happen in a limited number of cases in the real world. Eventually, all tasks nearby will be completed, and a correct vision radius estimation becomes more important for making correct predictions. Usually, \(\psi\) will be large (e.g., the values may approximate real numbers), so we will have \(\mathfrak {b} \ll \psi\). Additionally, we consider the case with no prior knowledge, so parameters and types are initially sampled from the uniform distribution. As before, we denote by \(\mathsf {P}(\theta )\) the estimated probability of a certain agent having type \(\theta\), but we drop the subscript \(\omega\) for clarity.

Theorem 1 OEATE estimates the correct parameters for all agents as \(|\mathbf {T}| \rightarrow \infty\). Hence, \(\mathsf {P}(\theta ^{*}) \rightarrow 1\).

Proof Since wrong parameter-type pairs will not succeed infinitely often, we will always generate new estimators with a random \(\mathbf {p}_e\). As we sample from the uniform distribution, \(\mathbf {p}^*\) will be sampled with probability \(1/\psi ^n > 0\). Hence, it will eventually be generated as \(|\mathbf {T}| \rightarrow \infty\). As each generation defines a Bernoulli experiment, from the geometric distribution, we expect \(\psi ^n\) trials.
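The expected number of trials follows from the geometric distribution; a one-line derivation:

```latex
% Each new estimator samples \mathbf{p}^* with probability q = 1/\psi^n,
% since all n elements are drawn independently and uniformly over \psi values.
% The number of trials until the first success is geometric, so
\mathbb{E}[\text{trials}] \,=\, \sum_{t=1}^{\infty} t\,(1-q)^{t-1}\,q \,=\, \frac{1}{q} \,=\, \psi^{n}.
```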

Therefore, eventually, there will be an estimator with the correct parameter vector \(\mathbf {p^*}\). Furthermore, since \((\mathbf {p^*}, \theta ^*)\) has the highest probability of making correct predictions (Assumption 1), it has the lowest probability of reaching the failure threshold \(\xi\). Hence, as \(|\mathbf {T}| \rightarrow \infty\), there will be more estimators \((\mathbf {p^*}, \theta ^*)\) than any other estimator. Further, any \((\mathbf {p^{-}}, \theta ^*)\) will eventually reach the failure threshold and be discarded, since it succeeds at most c times by Assumption 2. Therefore, by our method of sampling an estimator from the estimator sets, we will correctly estimate \(\mathbf {p^*}\) when assuming type \(\theta ^*\). Hence, when \(|\mathbf {T}| \rightarrow \infty\), the sampled estimator from \(\mathbf {E}^{\theta ^*}_\omega\) will be \(\mathbf {p^*}\).

Further, when we consider Assumption 2, the probability of the correct type \(\mathsf {P}(\theta ^*) \rightarrow 1\). That is, we have \(c_e \rightarrow \infty\) in the set \(\mathbf {E}^{\theta ^*}_\omega\). Hence, \(k^{\theta ^*}_\omega \rightarrow \infty\), while \(c_e < c\) for \(\theta ^{-}\) (by assumption). Therefore:

while \(\mathsf {P}(\theta ^{-}) \rightarrow 0\) , as \(|\mathbf {T}| \rightarrow \infty\) . \(\square\)

This ensures that, as \(|\mathbf {T}| \rightarrow \infty\), the sampled type is \(\theta ^*\).

We saw in Theorem 1 that a random search from the mutation proportion takes \(\psi ^n\) trials in expectation. OEATE, however, finds \(\mathbf {p}^*\) much more quickly, since a proportion of estimators is sampled from the corresponding bags \(\mathbf {B}_\omega ^{\theta , i}\). In the following proposition, we prove that OEATE will indeed find \(\mathbf {p}^*\); under Assumption 1, \(\mathbf {p}^*\) has the highest probability of not being removed from the estimator set and will continue to add its own parameters back to the bag, thereby further increasing the probability of sampling those parameters at each mutation.

Proposition 1

OEATE  finds \(\mathbf {p}^*\) in \(O(n \times \psi \times (\mathfrak {b}+1)^n)\) .

Proof Considering Assumption 3, we know that at some point we must encounter the correct parameter value \(p_i^*\). Sampling the correct value for element \(p_i\) takes \(\psi\) trials in expectation. Once a correct value is sampled, it will be added to \(\mathbf {B}_{\omega }^{\theta ^*}\) if it makes at least one correct task prediction. It may still make incorrect predictions because of wrong values in other elements, and it will be removed (from the estimator set) if it reaches the failure threshold \(\xi\). However, within a constant number of trials \(t \times \psi\), it will be added to \(\mathbf {B}_{\omega }^{\theta ^*}\). Similarly, sampling the correct value for all n dimensions at least once takes \(n \times \psi\) trials in expectation, and in at most \(t \times n \times \psi\) trials \(\mathbf {B}_{\omega }^{\theta ^*}\) will have at least one estimator with the correct value in each position i. The bags store repeated values, but in the worst case there is only one correct example in each \(\mathbf {B}_{\omega }^{\theta ^{*}}\), leading to at least \(1/(\mathfrak {b}+1)\) probability of sampling the correct value from the bag. Hence, given the bag sampling operation, we find \(\mathbf {p}^*\) within at most \(t \times n \times \psi \times (\mathfrak {b}+1)^n\) trials in expectation.

Hence, the complexity is close to \(O(\psi )\), instead of \(O(\psi ^n)\) as in the random search (since \(\mathfrak {b} \ll \psi\)).

Considerations In Assumption 1, the choose target state \((\mathfrak {s}_e)\) of an estimator depends only on the previously predicted tasks and the main agent's observation. Therefore, in a fully observable case, the true parameters have the highest probability of yielding the correct choose target state. We leave the proof for partially observable cases as future work.

Time Complexity It is worth noting that the actual time taken by the algorithm depends on \((\mathfrak {b}+1)^n\). As an example, if \(\mathfrak {b}=10 \ll \psi =100\) and \(n=3\), then \((\mathfrak {b}+1)^n = 1331 \gg \psi = 100\). However, when we write the time complexity, we are focusing on how the algorithm scales with a larger search space (i.e., higher \(\psi\)). Further, since \(\psi\) is the precision of the parameters, it is likely to be a large value. For instance, if there are 3 elements in the parameter vector ( \(\mathbf {p}\) ), the range of each element ( \(p_i\) ) is [0, 1], and we want our answer to be accurate up to only 3 decimal places, then \(\psi = 10^3\).
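To make the scaling argument concrete, a tiny calculation with the illustrative numbers above (the constant t from Proposition 1 is dropped):

```python
# Compare expected search costs: blind random sampling (Theorem 1) versus the
# bag-based bound of Proposition 1, using the example values from the text.
b, psi, n = 10, 100, 3

random_search = psi ** n                  # psi^n expected trials
oeate_bound = n * psi * (b + 1) ** n      # O(n * psi * (b+1)^n)

# The gap widens as the search space grows, since only the psi factor scales:
psi_large = 1000
ratio = (psi_large ** n) / (n * psi_large * (b + 1) ** n)
```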

6.4 OEATE with partial observability

Assuming full visibility for the learning agent is a strong presupposition that rarely holds in real applications (due to data or technology limitations). Thus, towards a more realistic application, we consider scenarios where agent \(\phi\) works with limited visibility of the environment. Therefore, we formalise our problem as a Partially Observable Markov Decision Process and, as before, define a single-agent POMDP model, which allows us to adapt POMCP [ 37 ] with our On-line Estimators for Ad-hoc Task Execution.

In this section, we outline the main changes compared to our previous MDP model (Sect. 2 ) and how we designed our POMCP-based solution for distributed task execution problems in an ad-hoc teamwork context.

6.4.1 POMDP model

Our POMDP model considers one agent \(\phi\) acting in the same environment as a set of non-learning agents ( \(\omega \in \varvec{\Omega }\) ), and \(\phi\) tries to maximise the team performance without any initial knowledge about the \(\omega\) agents' types and parameters. We consider the same set of states \(\mathcal {S}\), actions \(\mathcal {A}\), transition function \(\mathcal {T}\) and reward function \(\mathcal {R}\) defined previously. Additionally, agent \(\phi\)'s objective is still to maximise the expected sum of discounted rewards. However, now agent \(\phi\) has a set of observations \(\mathcal {O}\) that defines its current state. Every action a produces an observation \(o \in \mathcal {O}\), which is the visible environment from agent \(\phi\)'s point of view (all of the environment within the visibility region, in the state \(s'\) reached after taking action a). We assume agent \(\phi\) can perfectly observe the environment within the visibility region, but cannot observe anything outside it. Hence, our POMDP model works with an observation function that is deterministic instead of stochastic: all values denote empty square, agent or task. As before, the agents' true types and parameters are not observable.

The current state cannot be observed directly by agent \(\phi\), so it builds a history \(\mathcal {H}\) instead. \(\mathcal {H}\) consists of the collected information \(h_t\) from the initial timestamp \(t=0\) until the current time. Each \(h_t\) is an action-observation pair ao, representing the action a taken at time t and the corresponding observation o that was received. The current agent history defines its belief state, which is a probability distribution across all possible states. Therefore, agent \(\phi\) must find the optimal action for each belief state.

This formalisation enables the extension of our planning model from a fully observable context using MCTS to a partially observable context for POMCP application. This transition to POMCP is straightforward; however, we make further modifications to guarantee the on-line estimation and planning features that OEATE presents.

6.4.2 POMCP modification

POMCP [ 37 ] is an extension of UCT for problems with partial observability. The algorithm uses an unweighted particle filter to approximate the belief state at each node in the UCT tree and requires a simulator , which is able to sample a state \(s'\) , reward r and observation o , given a state and action pair. Each time we traverse the tree, a state is sampled from the particle filter of the root. Given an action a , the simulator samples the next state \(s'\) and the observation o . The pair ao defines the next node n in the search tree, and for the current iteration, the state of the node will be assumed to be \(s'\) . This sampled state \(s'\) is added to node n ’s particle filter, and the process repeats recursively down the tree. We refer the reader to Silver and Veness [ 37 ] for a detailed explanation.

However, as in the UCT case, we do not know the true transition and reward functions, since they depend on the pdfs of the non-learning agents ( \(\omega \in \varvec{\Omega }\) ). Therefore, we employ the same strategy as previously: each time we traverse the search tree, we sample a type for each agent from the estimated type probabilities and use the parameters that correspond to the sampled type. These remain fixed for the whole traversal, until we re-visit the root node for the next iteration. Note that these sampled types and parameters are also used in the POMCP simulator, when we sample a next state, reward and observation after choosing an action in a certain node.
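The modification can be sketched as a toy POMCP-style traversal (a simplified sketch, not the authors' implementation: random action selection stands in for UCB1), where the sampled types stay fixed for the whole traversal:

```python
import random

class Node:
    def __init__(self):
        self.particles = []        # unweighted particle filter (belief state)
        self.children = {}         # (action, observation) -> child Node

def traverse(root, actions, simulator, sample_types, depth, rng):
    # One traversal: teammates' types/parameters are sampled once and stay
    # fixed until we revisit the root, as described above.
    types = sample_types(rng)
    state = rng.choice(root.particles)     # sample a state from the root belief
    node, total = root, 0.0
    for _ in range(depth):
        a = rng.choice(actions)            # stand-in for UCB1 action selection
        state, reward, obs = simulator(state, a, types, rng)
        node = node.children.setdefault((a, obs), Node())
        node.particles.append(state)       # grow the child's particle filter
        total += reward
    return total

# Toy simulator: the state is an integer that advances by the sampled "type".
sim = lambda s, a, t, rng: (s + t, float(s + t), "obs")
root = Node()
root.particles.append(0)
ret = traverse(root, ["fwd"], sim, lambda rng: 1, depth=3, rng=random.Random(1))
```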

As mentioned previously, POMCP has been modified before to sample transition functions [ 20 ]. Here, however, we are employing a technique that is commonly used in UCT (for MDPs) in ad-hoc teamwork [ 2 , 7 ], but now in a partially observable scenario, which allows us to work on the type/parameter space instead of directly on the complex transition function space. In this way, we can then employ OEATE  for the type and parameter estimation.

The same OEATE algorithm described in Sect. 6 can handle cases where any agent \(\omega \in \varvec{\Omega }\) is outside agent \(\phi\)'s visibility region. In order to do so, it samples a particle from the POMCP root, which corresponds to sampling a state from the belief state. That allows us to have complete (estimated) states when predicting tasks for \(\omega\) agents. States considered more likely will be sampled with higher probability for the OEATE algorithm, following the POMCP belief state filtering probabilities. However, we assume in our implementation (and in all algorithms we compare against) that agent \(\phi\) knows when an agent \(\omega\) has completed a task \(\tau\), even if it is outside our visibility region. That is, agent \(\phi\) knows exactly which task was completed by a certain agent. In a real application, that would require some global signal of task completion (e.g., boxes with radio transmitters).

7.1 Level-based foraging domain

The level-based foraging domain is a common problem for evaluating ad-hoc teamwork [ 2 , 4 , 41 ]. In this domain, a set of agents collaborates to collect items scattered across a rectangular grid-world environment in a minimum amount of time (Fig. 6 ). In this foraging domain, items have a certain weight, and agents have a certain skill level, which defines how much weight they can carry. Hence, agents may need to collaborate to pick up a particularly heavy item. Further, we assume that tasks spawn in the environment during the execution.

Differently from [ 2 , 41 ], this approach enables a continuous level of information in the scenario, which \(\phi\) must analyse and reason about to improve the team's performance. Performance here refers to the number of completed tasks in the environment instead of the time necessary to complete all tasks. Concretely, we define the number of tasks that can be in the environment simultaneously. If some agent (or group of agents) accomplishes a task, we spawn a new one for each completion at that execution time. In this way, we maintain a fixed number of tasks in the environment, hence the same problem level from the beginning to the end.

Finally, we define this problem under both full and partial observability; Fig. 6 illustrates possible scenario configurations.

figure 6

Possible problem scenarios in the defined level-based foraging domain

Agent's Parameters Each agent has a visibility region and can only choose items as targets if they are inside its visibility cone. Therefore, to know which items are in the visibility area of each agent, we need the View Angle and the maximum View Radius of the agents. Additionally, each agent has a Skill Level, which defines its ability to collect items. Each item has a certain weight, so an agent can collect items whose weight is at most its Skill Level. Based on the above, each agent can be defined by three parameters:

l , which specifies the Skill Level, with \(l \in [0.5,1]\) ;

a , which refers to the View Angle . The actual angle of the visibility cone is given by \(a * 2 \pi\) , and we assume \(a \in [0.5, 1]\) ;

r , which refers to the View Radius of the agent. The actual View Radius is given by \(r \sqrt{w^2+h^2}\) , where w and h are the width and height of the grid, and \(r \in [0.5, 1]\) .
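Our reading of these parameters suggests a visibility test along the following lines (a sketch under the stated assumptions, not the authors' code): an item is visible if it lies within the view radius and inside a cone of angle \(a \cdot 2\pi\) centred on the agent's heading.

```python
import math

def is_visible(agent_pos, heading, a, r, item_pos, w, h):
    # Radius check: the actual view radius is r * sqrt(w^2 + h^2).
    dx, dy = item_pos[0] - agent_pos[0], item_pos[1] - agent_pos[1]
    if math.hypot(dx, dy) > r * math.hypot(w, h):
        return False
    # Angle check: offset from the heading, wrapped into [-pi, pi]; the cone
    # spans a * 2*pi, i.e. a half-width of a * pi on each side.
    off = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(off) <= a * math.pi
```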

All of these parameters are applicable to all \(\omega \in \varvec{\Omega }\) . Agent \(\phi\) has the parameter Skill Level when it has either full or partial observability, but the View Angle and View Radius parameters are only applicable when it has partial observability.

Agent's Types Concerning the types of non-learning agents, we took inspiration from Albrecht and Stone's [ 2 ] type definitions in the foraging domain. They considered four possible types for the agents in \(\varvec{\Omega }\) : two "leader" types, which choose items in the environment to move towards, and two "follower" types, which attempt to go towards the same items as other agents, in order to help them load items. However, "follower" agents may also choose other agents as targets, while in our work we handle agents that choose tasks as targets. Therefore, we only consider "leader" agents. Hence, based on agent \(\omega\) 's type and parameter values, a target item is selected, and the agent's internal state (memory) is set to the position of that target. Afterwards, the agent moves towards the target using the \(A^*\) algorithm [ 21 ]. The different types choose their targets as follows:

L 1: if there are items visible, return the furthest item; else, return \(\varnothing\) .

L 2: if there are items visible, return the item with the highest sum of coordinates; else, return \(\varnothing\) .

L 3: if there are items visible, return the closest item; else, return \(\varnothing\) .

L 4: if there are items visible, return the item with the lowest sum of coordinates; else, return \(\varnothing\) .

L 5: if there are items visible, return the first item found (scanning west to east, north to south); else, return \(\varnothing\) .

L 6: if there are items visible, return a random item; else, return \(\varnothing\) .
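These six rules can be condensed into a small dispatch function (an illustrative sketch, not the authors' code; items are assumed to be (x, y) tuples and `visible` is assumed to be pre-filtered by the vision cone):

```python
import math
import random

def choose_target(type_name, visible, agent_pos, rng=random):
    # Return the target item for a given leader type, or None when nothing
    # is visible (the empty-target case in the list above).
    if not visible:
        return None
    dist = lambda item: math.dist(agent_pos, item)
    rules = {
        "L1": lambda: max(visible, key=dist),                       # furthest
        "L2": lambda: max(visible, key=sum),                        # highest x+y
        "L3": lambda: min(visible, key=dist),                       # closest
        "L4": lambda: min(visible, key=sum),                        # lowest x+y
        "L5": lambda: min(visible, key=lambda it: (it[1], it[0])),  # scan order
        "L6": lambda: rng.choice(list(visible)),                    # random
    }
    return rules[type_name]()
```

The L 5 scan order here assumes y grows southward, so rows are scanned north to south and, within a row, west to east.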

Actions Each agent has five possible actions in the grid: North , South , East , West , Load .

The first four actions move the agent in the selected direction if the destination cell is empty and inside the grid.

The fifth action, Load , lets the agent load its target item. An agent can only collect an item when the item is next to the agent and the agent is facing it. Also, to load the item, the agent's Skill Level should be equal to or higher than the item's weight. If the agent does not have a high enough Skill Level to collect the item, a group of agents can do the job if the sum of the Skill Levels of the agents surrounding the target is greater than or equal to the item's weight. Therefore, an item can be "loaded" by a set of agents or just one agent. When the agent does not have enough ability to collect the target item, it stands still in the same place when issuing the Load action. When an item is collected, the team of agents receives a reward and the item is removed from the grid.
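The group-loading condition reduces to a one-line check (a sketch; `surrounding_skill_levels` is assumed to hold the Skill Levels of the agents adjacent to and facing the item):

```python
def can_load(item_weight, surrounding_skill_levels):
    # The item is collected when the combined Skill Level of the agents
    # surrounding it meets or exceeds the item's weight.
    return sum(surrounding_skill_levels) >= item_weight
```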

Foraging Process First, we describe the process of foraging and choosing a target for agents \(\omega\) in Algorithm 7, to facilitate understanding.

In the very first step, as agent \(\omega\) has not chosen any target, Mem , which holds the target item, is initialised to \(\emptyset\) . In Line 10, the VisibleItems routine is called; it takes agent \(\omega\) 's parameters, View Angle and View Radius , and returns the set of visible items. In Line 11, the ChooseTarget routine takes as input the Skill Level and Type of agent \(\omega\) , and the list of visible items returned by the VisibleItems routine. The output of this routine is the target item that agent \(\omega\) should move towards.

As shown in Line 17, there may be cases where agent \(\omega\) is not able to find any target task. In these cases, all actions get equal probability, and consequently the agent performs actions uniformly at random until it is able to choose a task.

We should mention that this is an algorithm template that we assume non-learning agents follow. We use the same template in our simulations, but in practice, agents \(\omega\) could follow different algorithms. Hence, in the results section, we also evaluate the case where the agents do not follow the same algorithm as our template.

7.2 Capture the prey domain

To evaluate the range of applicability of our proposal over different domains, we further perform experiments in the Capture-the-Prey domain.

This domain is a discrete, rectangular grid-world as in Sect. 7.1 . It is a variant of the Pursuit Domain described in [ 9 , 10 ]. There are several "preys" in the environment, which represent the objectives that the ad-hoc team must pursue, similar to the "tasks" in our Level-based Foraging environment. However, the preys are also non-learning agents, running a reactive algorithm and trying to escape being captured; they define decentralised tasks that move in the scenario. Each prey is also identified by a numeric index. The ad-hoc team is composed of non-learning agents \(\omega \in \varvec{\Omega }\) and a learning agent \(\phi\) . They must surround a prey to capture it, which means blocking the prey's movement on all four discrete sides: North, South, East and West. This can be done by agents alone, or with the support of walls and/or other preys. Note that surrounding is mandatory, hence the agents must collaborate in the most efficient way to improve their performance. Tasks re-spawn in this environment as well. Figure 7 illustrates the problem.

figure g

Possible problem scenario in the defined capture the prey domain. The agent with red details is the learning agent \(\phi\) . The agents with blue details are the non-learning agents \(\omega \in \Omega\) . The grey agents are the prey (Color figure online)

Agent's Parameters The parameters of each agent are the same as before, except that there is no longer a Skill Level parameter, since the completion condition is not related to task parameters. The \(\omega\) agents still have a visibility region, defined by the same View Angle and View Radius parameters, to see and choose targets.

Agent's Types Concerning the types of non-learning agents, we created 2 main types for the experiments:

C 1: the spatial type, which chooses the furthest visible prey to pursue, if there are visible preys in its vision area; else, it returns \(\varnothing\) .

C 2: the index-based type, which chooses preys with an even index, if there are visible preys in its vision area; else, it returns \(\varnothing\) .

Actions Each agent has 5 actions: North , South , East , West , Block . The first four actions move the agent in the selected direction if the destination cell is empty and inside the grid. Block is the action that actually captures the prey: the agent stands at its position, blocking the prey's passage. Notice that there is no Load action, as the completion of a task (or "capturing the prey") is achieved by surrounding. So the agent must block one passage of the prey, trying to create a capture situation.

Unlike in level-based foraging, the tasks are no longer stationary. At each step, each task moves randomly to one of the empty squares next to it. If no such free space exists and at least one agent is surrounding the task, it gets captured. So, each task can be completed by as few as one agent, depending on the location of the task and the whole state configuration (such as other agents' positions, other preys' positions and the map borders).

Capturing Process The process of choosing actions and targets remains similar to the process defined for foraging in Algorithm 7.

7.3 OEATE results

Baselines We compare our novel algorithm (OEATE) against two state-of-the-art parameter estimation approaches in ad-hoc teamwork: AGA and ABU [ 2 ] (both presented in Sect. 5 ). As mentioned before, they sample sets of parameters (for a gradient ascent step or a Bayesian estimation), which is similar to the estimator sets in OEATE used for estimating parameters and types. Therefore, for a fair comparison, we use the same set size as the estimator sets ( N ). Note that Albrecht and Stone [ 2 ] also introduced an approach called Exact Global Optimisation (EGO). We do not include it in our experiments since it is significantly slower than ABU/AGA, without outperforming them in terms of prediction.

Additionally, we compare our approach against the proposed POMCP-based method (also presented in Sect. 5 ) for type and parameter estimation. As described, in estimation with POMCP, we assume that agent \(\phi\) can see the whole environment; however, the teammates' types and parameters are not observable. Hence, agent \(\phi\) applies POMCP's particle filter for estimation. We use \(N \times |\varvec{\Omega }| \times |\varvec{\Theta }|\) particles, matching the total number of estimators in our approach (since we have N per agent, for each type).

Experiments configuration We executed random scenarios in the Level-based Foraging and Capture the Prey domains (Sects. 7.1 and 7.2 , respectively) for different numbers of distributed tasks, agents and environment sizes, for all aforementioned estimation methods. Each experiment finishes upon reaching 200 iterations. Every run was repeated 20 times, and we plot the average results and the confidence interval ( \(\rho = 0.05\) ). Therefore, when we say that a result is significant, we mean statistically significant considering \(\varvec{\rho \le 0.05}\) , according to the result of a Kruskal-Wallis test . In detail, we first applied the Kruskal-Wallis test to determine whether a statistically significant difference exists between all the algorithms considered; afterwards, we evaluated each pair of algorithms using a Wilcoxon Rank Sum Test (with Bonferroni correction) to determine which ones were different from the others. Following these steps, we could accurately calculate the confidence interval of the results obtained by each approach, thus finding which one is significantly better than the others. To improve readability, we avoid presenting every p-value; we focus on the p-values that are meaningful for our analysis and avoid reporting p-values where there is clearly no significance (i.e., \(\varvec{\rho } \ge 0.05\) ). Note that error bars and coloured regions indicate the confidence interval at a \(95\%\) confidence level, not the standard deviation.
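The two-step testing procedure can be reproduced with SciPy roughly as follows (synthetic data with made-up group means; only the structure mirrors the text):

```python
import numpy as np
from scipy.stats import kruskal, ranksums

# Synthetic data: one metric value per repeated run (20 runs per algorithm);
# the group means here are made up purely for illustration.
rng = np.random.default_rng(0)
groups = {name: rng.normal(mu, 1.0, size=20)
          for name, mu in [("OEATE", 0.2), ("AGA", 2.0), ("ABU", 2.1), ("POMCP", 2.0)]}

# Step 1: omnibus Kruskal-Wallis test across all algorithms.
h_stat, p_omnibus = kruskal(*groups.values())

# Step 2 (only when the omnibus test rejects): pairwise Wilcoxon rank-sum
# tests at a Bonferroni-corrected threshold.
different = {}
if p_omnibus <= 0.05:
    names = list(groups)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    alpha = 0.05 / len(pairs)              # Bonferroni correction
    for x, y in pairs:
        _, p_pair = ranksums(groups[x], groups[y])
        different[(x, y)] = p_pair <= alpha
```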

For each scenario, we take one of the four estimation methods (ABU, AGA, POMCP and OEATE) as agent \(\phi\) 's estimation method. We keep a history of estimated parameters and types for all iterations of each run and calculate the errors using the true parameters and true types. Then, we evaluate the mean absolute error (as in Equation 6.2 ) for the parameters, and \(1 -\mathsf {P}(\theta ^*)\) for the type; what we show in the plots is the average error across all parameters. Additionally, since we aggregate several results, we calculate and plot the average error across all iterations.

In this way, we first fix the number of possible types to 2 ( L 1, L 2 and C 1, C 2 for the Level-based Foraging and Capture the Prey domains, respectively), and later we show the impact of increasing the number of types. The types and parameters of agents in \(\varvec{\Omega }\) are chosen uniformly at random. In the Level-based Foraging environment, the skill level of agent \(\phi\) is also randomly selected. Every parameter \(p_i \in \mathbf {p}\) is a value within the minimum-maximum interval \([p_i^{min}, p_i^{max}] = [0.5,1.0]\) .

Every task is created in a random position, but we exclude the scenario's borders and keep the adjacent tiles free. That allows agents to set up their positions to perform the load action from any direction (i.e., North, South, East, West), making it always possible for 4 agents to simultaneously load an item, which guarantees that all tasks are solvable. For the Capture the Prey environment, this guarantee does not hold, since the tasks are moving.

Estimation methods configuration In our experiments, we used the following configuration for parameters values of OEATE:

the number of estimators N equal to 100;

the threshold for removing estimators \(\xi\) equal to 0.5; and

the mutation rate m equal to 0.2.

The "information-level" score ( score ( e )) is taken as the number of steps between assigning the Choose Target state and completing the task.

We apply the same configuration to all baselines (AGA, ABU and POMCP) and throughout every experiment performed. For UCT-H [ 41 ], we run 100 iterations per time step, and the maximum depth is kept at 100.

Level-based Foraging Results Before showing the aggregated results, we first show an example of the parameter and type estimation errors across successive iterations. Consider the experiment with \(|\varvec{\Omega }| = 7\) , a scenario with dimensions \(30 \times 30\) and 30 tasks distributed in the environment. Figure 8 shows this result.

As we can see in Fig. 8 a, our parameter estimation error is consistently and significantly lower than that of the other algorithms from the second iteration onwards, and it decreases (almost) monotonically as the number of iterations increases. AGA, ABU, and POMCP, on the other hand, show no sign of converging to a low error as the number of iterations increases. We can also see that our type estimation quickly surpasses the other algorithms in the mean, becoming significantly better after some iterations, as more and more tasks are completed.

figure 8

Parameter and type estimation errors for \(|\varvec{\Omega }| = 7\) , dimension \(30 \times 30\) and \(|\mathbf {T}| = 30\)

- Multiple numbers of items: We now show the results for different numbers of items. We fix the scenario size at \(30 \times 30\) and the number of agents at 7 ( \(|\varvec{\Omega }| = 7\) ). We then run experiments for a varying number of items in the environment (20, 40, 60, 80). Figure 9 shows the resulting plots.

As we can see in the figure, OEATE  has consistently lower parameter estimation error than the other algorithms. For type estimation, OEATE  presents significantly better results for 20, 40 and 80 tasks. We also see that the number of accomplished tasks is very similar across methods, meaning there is no significant difference in performance.

It is interesting to see that our parameter error drops for a very large number of items (80), as OEATE  gets a larger number of observations. The algorithm thus scales well with the number of items, and its mean performance actually improves beyond 20 items. This happens because OEATE  obtains observations more often when there are more items in the environment.

- Multiple numbers of agents: After comparing multiple numbers of items, we run experiments for different numbers of agents. Here, we fix the number of items at 30 and the scenario size at \(30 \times 30\) . We then run experiments for different numbers of agents (5, 7, 10, 12, 15); the plots are shown in Fig. 10 .

Again, the figure shows that, for different numbers of agents, OEATE  presents a lower or similar error compared with the other algorithms, in both parameter and type estimation. Moreover, the performance of a team containing a learning agent \(\phi\) (which runs OEATE) also remains better as the number of teammates increases. Regarding parameter and type errors, OEATE  is significantly better than AGA, ABU and POMCP in almost all cases, the exception being the type error with 15 agents, where OEATE  is very similar to AGA. Interestingly, even while slightly worse than AGA in that case, OEATE  still improves coordination and completes a higher number of tasks than the baselines. Additionally, the experiment with 15 agents presents the largest difference between the estimation methods' performance, where OEATE  is clearly the best one.

- Multiple scenario sizes: After comparing multiple numbers of items and agents, we run experiments for different scenario sizes to study scalability to harder problems. We fix the number of items at 30 and the number of \(\omega\) agents at 7 ( \(|\varvec{\Omega }| = 7\) ). We then run experiments for varying scenario sizes ( \(20 \times 20\) , \(25 \times 25\) , \(30 \times 30\) , \(35 \times 35\) , \(45 \times 45\) ); the plots are shown in Fig. 11 .

As we can see, OEATE  has consistently lower error than the other algorithms, in terms of both parameter and type estimation. In fact, OEATE  is significantly better than AGA, ABU and POMCP in type and parameter error for all scenario sizes, with \(\rho < 0.001\) . Additionally, in Fig. 11 (c) we see no significant difference in task completion between the methods. Overall, OEATE  maintains good estimation even as the scenario dimension increases.

Partial observability experiment Here, agent \(\phi\) has partial observability of the environment and employs the POMCP modification for handling it, as described in Sect. 6.4.2 . In these experiments, the number of \(\omega\) agents is 7 and the environment size is \(30 \times 30\) , while the number of items varies over 20, 40, 60, 80. The radius of \(\phi\) ’s view is 15 and the angle is 180°.
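The view model (radius 15, 180° cone) can be sketched as a simple geometric test; this is our own illustration of such a check, not the paper's code, and the heading convention (degrees, measured from the positive x-axis) is an assumption:

```python
import math

def visible(agent_pos, agent_dir_deg, target_pos, radius=15.0, fov_deg=180.0):
    """Return True if target lies within the agent's view cone:
    inside the radius and within +/- fov/2 of the facing direction."""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    dist = math.hypot(dx, dy)
    if dist > radius:
        return False
    if dist == 0:
        return True                        # the agent's own cell is visible
    angle = math.degrees(math.atan2(dy, dx)) - agent_dir_deg
    angle = (angle + 180) % 360 - 180      # wrap to (-180, 180]
    return abs(angle) <= fov_deg / 2
```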

Note that AGA/ABU results for partial observability are not shown in Albrecht and Stone [ 2 ], and are thus presented here for the first time. Hence, in the cases presented here, by OEATE, AGA and ABU we mean the modified POMCP version, following the approach described in Sect. 6.4.2 ; and by POMCP we mean the POMCP-based estimation , as before, which does not embed the ad-hoc teamwork algorithms for type and parameter estimation.

We show our results for partially observable scenarios in Fig. 12 . Again, we obtain significantly lower parameter error than previous approaches (Fig. 12 a). In the case of type error (Fig. 12 b), OEATE  presents worse type estimation than the competitors, except POMCP; however, they are not significantly better than OEATE. For 20 items, AGA and ABU present \(\rho > 0.2\) ; for 40 and 60, \(\rho > 0.09\) ; and for 80 items, \(\rho > 0.35\) . In Fig. 12 c, we see that we obtain performance similar to the previous approaches for 40 and 60 items.

figure 9

Results for a varying number of tasks with full observability

figure 10

Results for a varying number of agents with full observability

figure 11

Results for various environment sizes in full observability

figure 12

Results for a varying number of items in problems with partial observability

OEATE  is a task-based solution that depends on predicting the tasks of unknown teammates for any possible state of the problem. The difficulty in estimating types under partial observability results from the lack of precision when reasoning about the part of the map that is not visible. Our proposed modification of POMCP enables the estimation of parameters and types under partial observability; however, since the problem presents a high level of uncertainty, the belief states need not approximate the actual states of the world, so OEATE  could not properly evaluate its estimators and improve its predictions. Therefore, refining the POMCP belief state approximation could adapt OEATE  to handle this additional layer of uncertainty and improve the results, as we found for full observability.

- Experiments with larger numbers of types: Besides estimating two types ( L 1 and L 2), we also want to increase the uncertainty level of the problem by running experiments with a larger number of potential types ( \(|\varvec{\Theta }|\) ). We thus run experiments with four types: L 1,  L 2,  L 3 and L 4. Figure 13 shows the results.

Figure 13 a shows the parameter error, where we are significantly better than all other methods for all numbers of items, with \(\rho \le 0.011\) . From Fig. 13 b, OEATE  outperforms AGA and ABU only with 20 and 60 items in the environment; AGA and ABU are better than OEATE  for 40 and 80 items, respectively. In terms of performance, as we can see in Fig. 13 c, there is no significant difference between the methods.

After studying the case of four different types for the \(\omega\) agents, we experiment with six potential types ( L 1,  L 2,  L 3,  L 4,  L 5,  L 6). The results are shown in Fig. 14 .

Considering parameter error, OEATE  is significantly better than the other approaches, with \(\rho \le 0.0005\) . Taking type error into account, we are better for all numbers of items with \(\rho \le 0.06\) , except for 40 items, where we are significantly better than POMCP but worse than AGA and ABU, with \(\rho \le 0.92\) and \(\rho \le 0.34\) , respectively. In terms of performance, OEATE  decreases monotonically as the number of tasks increases.

Overall, OEATE  performs estimation better with fewer types. Its parameter estimation is significantly better in all studied cases. However, when facing a larger number of possible type templates, its type estimation quality decreases, while its performance remains similar to that of the competitors.

- Wrong types: We also study our method’s behaviour when agent \(\phi\) does not have full knowledge of the possible types of its teammates. That is, we run experiments where all agents in \(\varvec{\Omega }\) have a type that is not in \(\varvec{\Theta }\) . In these experiments, we assume that agent \(\phi\) is only aware of types L 1 and L 2, but we assign L 3 and L 4 to the \(\omega\) agents as their types (sampled uniformly at random). We ran experiments with 7 agents and a fixed scenario size of \(30 \times 30\) , with various numbers of items (20, 40, 60, 80). The performance of the team is shown in Fig. 15 .

As the figure illustrates, without knowing the possible types of its teammates, OEATE  only outperforms the competitors (except POMCP) with 80 items. Surprisingly, POMCP shows the best performance in the group. We believe that, without knowledge of the possible types, and considering the difficulty of the problem, acting greedily can yield better results in such cases.

figure 13

Results for a varying number of items, with randomly selected types among 4 types

figure 14

Results for a varying number of items, with randomly selected types among 6 types

Capture the Prey Results As mentioned before, we also run experiments in the Capture the Prey domain. Using the same settings defined for Level-based Foraging, we take the experiment with \(|\varvec{\Omega }| = 7\) , a scenario of dimension \(30 \times 30\) and varying numbers of tasks distributed in the environment (20, 40, 60, 80) as the main result of this set of experiments. Figure 16 shows these plots.

As we can see, OEATE  still presents a significantly lower parameter error than the competitors. Despite showing worse type estimation than AGA and ABU, OEATE  is able to decrease its error as the number of tasks increases, while AGA and ABU seem to converge after 60 tasks (preys) in the scenario. Additionally, the performance of all methods is very similar in the capture environment.

The Capture the Prey domain defines a hard problem to tackle: improving the team’s performance depends on choosing actions that facilitate capturing the preys. We believe that OEATE  could present better results against AGA and ABU with an adaptation of POMCP for adversarial contexts, where OEATE  would be able to reason about the preys, and hence increase both the number of tasks accomplished and the quality of its type estimation.

Overall result To directly present the conclusions drawn from the complete set of experiments, and to support further analysis of this work, Table 5 compiles the results of this section for the Level-based Foraging and Capture the Prey environments.

Ablation Study We also carried out an ablation study, to show how our internal design choices impact the method’s outcome. We defined 4 different configurations of OEATE  according to their impact on the quality of the estimation:

OEATE : the full version of our proposal;

OEATE  (No Score) : the version that does not apply the score approach of our final proposal, removing the weighting of decisions made in different Choose Target states and under different levels of uncertainty;

OEATE  (Uniform Scored) : the version that does not generate new estimators from the bag; hence, we remove the bag from our proposal and adapt it to work only with uniform replacement of estimators, and;

OEATE  (Uniform) : the version that neither applies the score approach of our final proposal nor performs the bag generation process, characterising the simplest version of our proposal.

This experiment considers \(|\varvec{\Omega }| = 7\) , a scenario of dimension \(30 \times 30\) and 30 tasks distributed in a Level-based Foraging environment (2 types were used in this experiment). Figure 17 shows the result.

Regarding parameter estimation, as the figure shows, OEATE  performs similarly under all configurations; the main difference lies in the starting point of the estimation. Under each strategy, OEATE  corrects the parameter values after a few iterations of the estimation process. Unlike the simpler versions of our proposal, the full OEATE  proved capable of fixing its estimation in this ablation study. We attribute this improvement to the weighting of estimators during sampling, due to the scoring and bag approaches.

figure 15

Performance of the ad-hoc team for a varying number of items without having information of correct potential teammates types

figure 16

Parameter errors, type estimation errors, and performance for a varying number of items in the Capture the Prey domain

On the other hand, the improvement in type estimation is even higher. The full version of OEATE  presents significantly better results than the simpler versions. Interestingly, the second-best result in this ablation study comes from the simplest OEATE  configuration: both the No Score and Uniform Scored versions present a higher type error than the Uniform one. We attribute the improvement to the fact that the estimators’ scores help the sampling and the maintenance of good estimators in the estimation set. Without recovering estimators from the bag, scoring can only lead to the trivial game of randomly guessing the correct parameters (and hence the type). Therefore, OEATE  combines two individually unsuccessful mechanisms into a powerful estimation capability.

figure 17

Ablation Study results for parameter and type estimation errors considering \(|\varvec{\Omega }| = 7\) , dimension \(30 \times 30\) and \(|\mathbf {T}| = 30\) in the Level-based Foraging domain

8 Discussion

We showed in this work that by focusing on distributed task completion problems, where agents autonomously decide which task to perform, we can obtain better type and parameter estimations in ad-hoc teamwork than previous works in the literature. Although not all problems can be modelled as a set of tasks to be completed, it does encompass a great range of useful problems. For instance, apart from the obvious warehouse management and the proposed capture the prey game, we could think about situations such as rescuing victims after a natural disaster or even during some hazard and demining.

Note that teammates do not need to share the same representation of the problem, nor run algorithms that explicitly “choose” tasks. That is, they could have been programmed under different paradigms, without any explicit task representation. However, their external behaviour would still need to be interpretable as solving tasks distributed in an environment, from the point of view of our ad-hoc agent. Hence, we do need problems and teammates that fit the decentralised task allocation representation for our agent, but the teammates’ actual internal models could differ. Further, our use of the threshold \(\zeta\) gives us finer control over the evolutionary process of the estimators, by ensuring that only estimators with a minimum level of quality survive.
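The survival rule induced by the threshold can be sketched as a simple filter (an illustration under our own naming, not the authors' implementation):

```python
def prune_estimators(estimators, scores, threshold=0.5):
    """Keep only estimators whose score reaches the minimum quality
    threshold; the rest are discarded from the estimation set."""
    return [e for e, s in zip(estimators, scores) if s >= threshold]
```

Raising the threshold tightens the quality bar on surviving estimators; lowering it preserves more diversity in the set.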

Another interesting characteristic of our algorithm is that it learns from scratch at every run in an on-line manner, inspired by Albrecht and Stone [ 2 ]. Therefore, we can quickly adapt to different teams and different situations, without requiring significant pre-training. Neural network-based models, on the other hand, would require thousands (even millions) of observations, and although they may show some generalisability, re-training may eventually be required as the test situation becomes significantly different from the training cases.

It is true that our algorithm requires a set of potential types to be given. In the case where this set cannot be created from domain knowledge, some training may be required to initialise it. Afterwards, however, we would be able to learn on-line at every run, without carrying further knowledge between executions. Albrecht and Stone [ 2 ] also follow the same paradigm, and directly assume a set of potential parametrisable types, without showing exactly how they could be learned. There are several examples of learning types in ad-hoc teamwork, but they still ignore the possibility of parametrisation. For instance, PLASTIC-Model [ 8 ] employs a supervised learning approach, learning a probability distribution over actions given a state representation using C4.5 decision trees.

To better understand the impact of this assumption, we also run experiments where the set of types considered by the ad-hoc agent does not include the real types of the teammates. In these challenging situations, we find that our performance is comparable to that of the other works in the literature, depending on the case.

We have also shown that our algorithm scales well to a range of different variables, as we increase the number of items, number of agents, scenario sizes, and number of types. Usually, models based on neural networks (e.g., [ 22 , 34 ]) are not yet able to show such scalability and present only restricted cases. A similar issue happens with I-POMDP based models (e.g., [ 12 , 17 , 19 , 23 ]) which tend to show experiments in simplified scenarios due to the computational constraints. Therefore, by focusing on distributed task execution scenarios, we are able to propose a light-weight algorithm, which could be more easily applied across a range of different situations.

Concerning partial observability scenarios, our algorithm still requires knowledge of which agents completed a particular task, even if outside our controlled agent visibility region. Hence, in a real application, we would still require some hardware in addition to the agent sensors, such as radio transmitters connected to the boxes that must be collected. Removing this assumption in task-based ad-hoc teamwork under partial observability is one of the exciting potential avenues for future work.

Finally, an important implication, which highlights a limitation of our study, is that improving the ad-hoc agent’s knowledge of the non-learning teammates’ types did not always lead to an improvement in performance. This suggests that the classic benchmark problems might not be a good fit for evaluating methods that focus on accurate modelling of neighbour types. Given this, new benchmarks might be proposed, or the current ones refined, to further evaluate the community’s algorithms.

9 Conclusion

In this work we have presented On-line Estimators for Ad-hoc Task Execution (OEATE), a new algorithm for estimating types and parameters of teammates, specifically designed for problems where there is a set of tasks to be completed in a scenario. By focusing on decentralised task execution, we are able to obtain lower error in parameter and type estimation than previous works, which leads to better overall performance.

We also study our algorithm theoretically, showing that it converges to zero error as the number of tasks increases (under some assumptions), and we experimentally verify that the error does decrease with the number of iterations. Our theoretical analysis also shows the importance of having parameter bags in our method, as they significantly decrease the computational complexity. We experimentally evaluated our algorithm in the Level-based Foraging and Capture the Prey domains, considering a range of situations: increasing numbers of items, numbers of agents, scenario sizes, and numbers of types. Additionally, we evaluated the impact of an erroneous set of potential types, of partial observability of the scenarios, and of each component within OEATE  through an ablation study. We show that we outperform previous works with statistical significance in some of these cases. Furthermore, we find that our method scales better to an increasing number of agents in the environment, and shows robustness when tackling different scenarios or facing wrong type templates. This work opens the path to diverse studies on improving ad-hoc teams through a task-based perspective and an information-oriented approach.

For the interested readers who may want to explore and further extend this work, our source code, built on AdLeap-MAS simulator [ 16 ], is available at https://github.com/lsmcolab/adleap-mas/ .

Albrecht, S., Crandall, J., & Ramamoorthy, S. (2015). An empirical study on the practical impact of prior beliefs over policy types. In Proceedings of the 29th AAAI conference on artificial intelligence .

Albrecht, S., & Stone, P. (2017). Reasoning about hypothetical agent behaviours and their parameters. In Proceedings of the 16th international conference on autonomous agents and multiagent systems, AAMAS’17 , May 2017.

Albrecht, S. V., & Ramamoorthy, S. (2016). Exploiting causality for selective belief filtering in dynamic Bayesian networks. Journal of Artificial Intelligence Research, 55.

Albrecht, S. V., & Ramamoorthy, S. (2013). A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. Technical report, The University of Edinburgh, February 2013.

Albrecht, S. V., & Stone, P. (2018). Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258, 66–95.


Barrett, S., & Stone, P. (2015). Cooperating with unknown teammates in complex domains: A robot soccer case study of ad hoc teamwork. In Proceedings of the 29th AAAI conference on artificial intelligence .

Barrett, S., Stone, P., Kraus, S., & Rosenfeld, A. (2013). Teamwork with limited knowledge of teammates. In Proceedings of the 27th AAAI conference on artificial intelligence .

Barrett, S., Rosenfeld, A., Kraus, S., & Stone, P. (2017). Making friends on the fly: Cooperating with new teammates. Artificial Intelligence, 242, 132–171.

Barrett, S., & Stone, P. (2012). An analysis framework for ad hoc teamwork tasks. In Proceedings of the 11th international conference on autonomous agents and multiagent systems , Vol. 1, AAMAS ’12 (pp. 357–364), Richland, SC, 2012. International Foundation for Autonomous Agents and Multiagent Systems.

Barrett, S., Stone, P., & Kraus, S. (2011). Empirical evaluation of ad hoc teamwork in the pursuit domain. In Proceedings of the 11th International conference on autonomous agents and multiagent systems .

Berman, S., Halasz, A., Hsieh, M. A., & Kumar, V. (2009). Optimized stochastic policies for task allocation in swarms of robots. IEEE Transactions on Robotics , 25 (4).

Chandrasekaran, M., Doshi, P., Zeng, Y., & Chen, Y. (2014). Team behavior in interactive dynamic influence diagrams with applications to ad hoc teams. arXiv preprint arXiv:1409.0302.

Chen, S., Andrejczuk, E., Irissappane, A. A., & Zhang. J. (2019). Atsis: Achieving the ad hoc teamwork by sub-task inference and selection. In Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19 (pp. 172–179). International Joint Conferences on Artificial Intelligence Organization.

Claes, D., Robbel, P., Oliehoek, F., Tuyls, K., Hennes, D., & Van der Hoek, W. (2015). Effective approximations for multi-robot coordination in spatially distributed tasks. In Proceedings of the 14th international conference on autonomous agents and multiagent systems (AAMAS 2015) (pp. 881–890). International Foundation for Autonomous Agents and Multiagent Systems.

Czechowski, A., & Oliehoek, F. A. (2020). Decentralized mcts via learned teammate models. arXiv preprint arXiv:2003.08727.

do Carmo Alves, M. A., Varma, A., Elkhatib, Y., & Soriano Marcolino, L. (2022). AdLeap-MAS: An open-source multi-agent simulator for ad-hoc reasoning. In International conference on autonomous agents and multiagent systems (AAMAS)—Demo track .

Doshi, P., Zeng, Y., & Chen, Q. (2009). Graphical models for interactive POMDPs: Representations and solutions. JAAMAS, 18 (3), 376–416.


Eck, A., Shah, M., Doshi, P., & Soh, L.-K. (2019). Scalable decision-theoretic planning in open and typed multiagent systems. In Proceedings of the thirty-fourth AAAI conference on artificial intelligence AAAI .

Gmytrasiewicz, P., & Doshi, P. (2005). A framework for sequential planning in multiagent settings. JAIR, 24, 49–79.


Guez, A., Silver, D., & Dayan, P. (2013). Scalable and efficient bayes-adaptive reinforcement learning based on monte-carlo tree search. Journal of Artificial Intelligence Research (JAIR) , 48 .

Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4 (2), 100–107.


Hayashi, A., Ruiken, D., Hasegawa, T., & Goerick, C. (2020). Reasoning about uncertain parameters and agent behaviors through encoded experiences and belief planning. Artificial Intelligence, 280, 103228.

Hoang, T. N., & Low, K. H. (2013). Interactive POMDP lite: Towards practical planning to predict and exploit intentions for interacting with self-interested agents. In Proceedings of the twenty-third international joint conference on artificial intelligence, IJCAI .

Holland, J. H. (1992). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence . Cambridge, MA: MIT Press.


Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101 (1–2), 99–134.

Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proceedings of the 17th European conference on machine learning .

Lerman, K., Jones, C., Galstyan, A., & Matarić, M. J. (2006). Analysis of dynamic task allocation in multi-robot systems. The International Journal of Robotics Research, 25 (3), 225–241.

Matarić, M. J., Sukhatme, G. S., & Østergaard, E. H. (2003). Multi-robot task allocation in uncertain environments. Autonomous Robots, 14 (2–3), 255–263.

Melo, F. S., & Sardinha, A. (2016). Ad hoc teamwork by learning teammates’ task. Autonomous Agents and Multi-Agent Systems, 30 (2).

Nair, R., & Tambe, M. (2005). Hybrid BDI-POMDP framework for multiagent teaming. JAIR, 23, 367–413.

Nair, R., Varakantham, P., Yokoo, M., & Tambe, M. (2005). Networked distributed POMDPs: A synergy of distributed constraint optimization and POMDPs. In Proceedings of the nineteenth international joint conference on artificial intelligence, IJCAI .

Pelcner, L., Li, S., Do Carmo Alves, M., Marcolino, L. S., & Collins, A. (2020). Real-time learning and planning in environments with swarms: A hierarchical and a parameter-based simulation approach. In Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS .

Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. M. A., & Botvinick, M. (2018). Machine theory of mind. In Jennifer, D., & Krause, A. (eds.,) Proceedings of the 35th international conference on machine learning , volume 80 of ICML (pp. 4218–4227).

Rahman, A., Hopner, N., Christianos, F., & Albrecht, S. V. (2020). Open ad hoc teamwork using graph-based policy learning. arXiv preprint arXiv:2006.10412.

Scerri, P., Pynadath, D., & Tambe, M. (2002). Towards adjustable autonomy for the real-world. JAIR, 17, 171–228.

Shafipour Yourdshahi, E., Do Carmo Alves, M., Marcolino, L. S., & Angelov, P. (2020). On-line estimators for ad-hoc task allocation: Extended abstract. In Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS .

Silver, D., & Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Proceedings of the twenty-fourth annual conference on neural information processing systems .

Stone, P., Kaminka, G. A., Kraus, S., & Rosenschein, J. S. et al. (2010). Ad hoc autonomous agent teams: Collaboration without pre-coordination. In AAAI .

Trivedi, M., & Doshi, P. (2018). Inverse learning of robot behavior for collaborative planning. In Proceedings of the 2018 IEEE/RSJ international conference on intelligent robots and systems, IROS .

Wei, C., Hindriks, K. V., & Jonker, C. M. (2016). Dynamic task allocation for multi-robot search and retrieval tasks. Applied Intelligence , 45 (2), 383–401.

Yourdshahi, E. S., Pinder, T., Dhawan, G., Marcolino, L. S., & Angelov, P. (2018). Towards large scale ad-hoc teamwork. In 2018 IEEE international conference on agents, ICA .


Acknowledgements

This research was supported by Lancaster University, which provided financial support through the FST Studentship program and access to its High-End Computing (HEC) Cluster. We especially thank Mike Pacey for his assistance in using the cluster and setting up the experiments on it. We thank our colleagues Dr. Yehia Elkathib and Yuri Tavares Dos Passos, who provided comments that greatly improved this paper, and who kindly shared their insights and expertise, which supported our research and improved our methodology. We gratefully acknowledge the AUSPIN visit grant and the financial support provided by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) under the Scientific Initiation program, project 2019/14791-6, which was essential to the development of this research and the collection of the preliminary results. Jó Ueyama and Leandro Marcolino would like to thank FAPESP (Grant ID 2013/07375-0) for funding part of their research project. Finally, we would also like to express our gratitude to the University of São Paulo and its staff for supporting the first steps of this research, creating the necessary connections and providing the support needed to achieve the current result.

Author information

Authors and affiliations.

Lancaster University, Lancaster, UK

Elnaz Shafipour Yourdshahi

Indian Institute of Technology Delhi, New Delhi, India

Matheus Aparecido do Carmo Alves, Leandro Soriano Marcolino & Plamen Angelov

University of São Paulo, São Carlos, SP, Brazil

Amokh Varma

University of Southampton, Southampton, UK


Corresponding author

Correspondence to Leandro Soriano Marcolino .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Elnaz Shafipour Yourdshahi and Matheus Aparecido do Carmo Alves are first authors.

This paper is an extended version of an AAMAS short paper (extended abstract) [ 36 ].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Shafipour Yourdshahi, E., do Carmo Alves, M.A., Varma, A. et al. On-line estimators for ad-hoc task execution: learning types and parameters of teammates for effective teamwork. Auton Agent Multi-Agent Syst 36, 45 (2022). https://doi.org/10.1007/s10458-022-09571-9

Accepted: 21 June 2022

Published: 13 August 2022

DOI: https://doi.org/10.1007/s10458-022-09571-9

  • Ad-hoc teamwork
  • Decentralised task execution
