papers_without_code package

Subpackages

Submodules

papers_without_code.custom_types module

class papers_without_code.custom_types.AuthorDetails(name_parts: list[str], email: str | None, affiliation: str | None)

Bases: object

affiliation: str | None
email: str | None
name_parts: list[str]

class papers_without_code.custom_types.MinimalPaperDetails(title: str, authors: list[papers_without_code.custom_types.AuthorDetails], abstract: str, url: str | None = None, keywords: list[str] | None = None, other: dict[str, Any] | None = None)

Bases: object

abstract: str
authors: list[AuthorDetails]
keywords: list[str] | None = None
other: dict[str, Any] | None = None
title: str
url: str | None = None

papers_without_code.grobid module

papers_without_code.processing module

papers_without_code.processing.parse_grobid_data(grobid_data: dict[str, Any]) → MinimalPaperDetails

Parse raw GROBID output into a MinimalPaperDetails object.

Parameters:
grobid_data: dict[str, Any]

The data returned from GROBID after processing a PDF.

Returns:
MinimalPaperDetails

The parsed GROBID data.

papers_without_code.search module

class papers_without_code.search.LLMKeywordResults(*, keywords: list[str])

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

__init__ uses __pydantic_self__ instead of the more common self for the first arg to allow self as a field name.

keywords: list[str]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'keywords': FieldInfo(annotation=list[str], required=True, description='Extracted keyword sequences found in the text.')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.
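
LLMKeywordResults declares a single required field, keywords. With Pydantic v2 installed, LLMKeywordResults.model_validate_json(raw) would parse a JSON string and raise ValidationError on bad input. The stdlib-only sketch below approximates that check for illustration; KeywordResults and parse_keyword_json are hypothetical names, and the stand-in raises ValueError rather than ValidationError.

```python
import json
from dataclasses import dataclass


@dataclass
class KeywordResults:
    # Hypothetical stand-in for LLMKeywordResults: one required field.
    keywords: list[str]


def parse_keyword_json(raw: str) -> KeywordResults:
    # Decode the JSON text, then enforce the shape the Pydantic model
    # declares: a required "keywords" field holding a list of strings.
    data = json.loads(raw)
    if not isinstance(data, dict) or "keywords" not in data:
        raise ValueError("missing required field 'keywords'")
    keywords = data["keywords"]
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        raise ValueError("'keywords' must be a list of strings")
    return KeywordResults(keywords=keywords)


result = parse_keyword_json('{"keywords": ["sentence embeddings", "code search"]}')
print(result.keywords)  # ['sentence embeddings', 'code search']
```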

class papers_without_code.search.RepoDetails(name: str, link: str, search_query: str, similarity: float, stars: int, forks: int, watchers: int, description: str)

Bases: DataClassJsonMixin

description: str
forks: int
link: str
name: str
search_query: str
similarity: float
stars: int
watchers: int
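
RepoDetails inherits from DataClassJsonMixin (from the dataclasses-json library), which adds methods such as to_json() and from_json(). The sketch below is a stdlib-only approximation of that round trip using dataclasses.asdict and json; the stand-in class copies the fields from the documented signature, and the example values are made up.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RepoDetails:
    # Stand-in with the fields from the documented RepoDetails signature.
    name: str
    link: str
    search_query: str
    similarity: float
    stars: int
    forks: int
    watchers: int
    description: str


def to_json(repo: RepoDetails) -> str:
    # Serialize every field into one JSON object.
    return json.dumps(asdict(repo))


def from_json(raw: str) -> RepoDetails:
    # Rebuild the dataclass from a JSON object with matching keys.
    return RepoDetails(**json.loads(raw))


repo = RepoDetails(
    name="example-org/example-repo",
    link="https://github.com/example-org/example-repo",
    search_query="example query",
    similarity=0.82,
    stars=10,
    forks=2,
    watchers=10,
    description="An example repository.",
)
assert from_json(to_json(repo)) == repo
```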

class papers_without_code.search.RepoReadmeResponse(repo_name: str, search_query: str, readme_text: str, stars: int, forks: int, watchers: int, description: str)

Bases: object

description: str
forks: int
readme_text: str
repo_name: str
search_query: str
stars: int
watchers: int

class papers_without_code.search.SearchQueryDataTracker(query_str: str, strict: bool = False)

Bases: object

query_str: str
strict: bool = False

class papers_without_code.search.SearchQueryResponse(query_str: str, repo_name: str, stars: int, forks: int, watchers: int, description: str)

Bases: object

description: str
forks: int
query_str: str
repo_name: str
stars: int
watchers: int

papers_without_code.search.get_paper(query: str) → MinimalPaperDetails

Get a paper's details from the Semantic Scholar API.

Provide a DOI, SemanticScholarID, CorpusID, ArXivID, ACL, or URL from semanticscholar.org, arxiv.org, aclweb.org, acm.org, or biorxiv.org. DOIs can be provided as is. All other IDs should be given with their type, for example: doi:10.18653/v1/2020.acl-main.447, CorpusID:202558505, or url:https://arxiv.org/abs/2004.07180.

Parameters:
query: str

The paper identifier to query for, formatted as described above.

Returns:
MinimalPaperDetails

The paper details.

Raises:
ValueError

No paper was found.

papers_without_code.search.get_repos(paper: MinimalPaperDetails, loaded_sent_transformer: SentenceTransformer | None = None) → list[RepoDetails]

Try to find GitHub repositories matching a provided paper.

Parameters:
paper: MinimalPaperDetails

The paper to find similar repositories for.

loaded_sent_transformer: SentenceTransformer | None

An optional preloaded SentenceTransformer model to use instead of loading a new one. Default: None

Returns:
list[RepoDetails]

A list of repositories similar to the paper, sorted by the semantic similarity of each repository's README to the paper's abstract (or to its title if the paper details have no abstract).
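
The ranking described above can be sketched with plain cosine similarity. This is an assumption-laden illustration: the toy 2-D vectors stand in for SentenceTransformer embeddings (which would normally come from SentenceTransformer.encode), the most-similar-first order is assumed, and the helper names are hypothetical rather than this package's functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def pick_query_text(title: str, abstract: str) -> str:
    # Documented fallback: use the abstract, else the title.
    # ASSUMPTION: a whitespace-only abstract counts as missing.
    return abstract if abstract.strip() else title


def rank_by_similarity(
    paper_vec: list[float], readme_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    # Score each README embedding against the paper embedding,
    # then sort from most to least similar (ASSUMED order).
    scored = [(name, cosine_similarity(paper_vec, vec)) for name, vec in readme_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-D vectors standing in for real sentence embeddings.
ranking = rank_by_similarity(
    [1.0, 0.0],
    {"repo-a": [1.0, 0.0], "repo-b": [0.0, 1.0], "repo-c": [1.0, 1.0]},
)
print([name for name, _ in ranking])  # ['repo-a', 'repo-c', 'repo-b']
```

Passing a preloaded model through loaded_sent_transformer presumably avoids reloading the embedding model on each call, which matters when get_repos is called repeatedly.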

Module contents

Top-level package for papers_without_code.