Langchain

standenman · Jun-07-2023, 06:02 PM

I am trying to use LangChain to query a PDF document with ChatGPT. I don't really understand what is going wrong here.

import os
import openai
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY', 'YourAPIKey')

# Initialize OpenAI API
openai.api_key = OPENAI_API_KEY

# Load the PDF documents
loader = PyPDFLoader("ALJDecision.pdf")
data = loader.load()

print(f'You have {len(data)} document(s) in your data')

if len(data) >= 31:
    print(f'There are {len(data[30].page_content)} characters in your document')
else:
    print("Data does not have an element at index 30")

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print(f'Now you have {len(texts)} documents')

# Index the documents
index = {}
for i, t in enumerate(texts):
    response = openai.Completion.create(
        engine="davinci",
        prompt=t.page_content,
        max_tokens=64,
        temperature=0.5,
        top_p=1.0,
        n=1,
        stop=None
    )
    text = response.choices[0].get("text")  # Get the generated text
    index[str(i)] = text

# Query the index
query = "Why did the judge deny this claim for social security disability?"
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": query}],
    max_tokens=128,
    temperature=0.5,
    top_p=1.0,
    n=1,
    stop=None
)
query_text = response.choices[0].get("message").get("content")  # Get the generated text

# Perform the search
results = []
for doc_id, doc_content in index.items():
    if query_text in doc_content:
        results.append(doc_id)

# Print the results
for doc_id in results:
    print(doc_id)

Error:
You have 19 document(s) in your data
Data does not have an element at index 30
Now you have 31 documents

Larz60+ · Jun-07-2023, 10:10 PM

This looks like a question for the LangChain authors.
However, I don't see any contact info on their website.

You may still get an answer here from other LangChain users.

Bolt · Jun-08-2023, 09:30 PM

Based on the code provided, it seems there might be an issue with the indexing and splitting of the PDF document. Here are a few points to consider:

1. PDF Loading: Make sure the file "ALJDecision.pdf" exists in the same directory as your script and that it is accessible.

2. Document Loading: The loader.load() call returned 19 documents. PyPDFLoader produces one document per page, so this most likely means the PDF has 19 pages; the figure of 31 only appears later, after splitting. Verify that the PDF contains the expected content and that its text can actually be extracted.

3. Document Splitting: The RecursiveCharacterTextSplitter is used to split the loaded documents into smaller chunks. In the provided code, it is configured with a chunk_size of 2000 characters and chunk_overlap of 0. Ensure that these settings are appropriate for your specific document and use case.

4. Indexing: After splitting the documents, the code attempts to index them by sending each chunk to the OpenAI API for completion using the Davinci model. Make sure you have a valid API key set in the OPENAI_API_KEY variable.

5. Querying: The code uses the openai.ChatCompletion.create() method to send the question to the model. Confirm that the messages parameter is correctly formatted, with the system message preceding the user's query, and note that the request as posted contains only the question, not any text from the PDF.
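As a rough illustration of the querying step, the sketch below picks the chunks that share the most words with the question and places them in the user message. It is not LangChain's retrieval and not what the posted script does: the word-overlap ranking, the helper names (tokenize, top_chunks, build_messages), and the sample chunk texts are all assumptions made for the example. No API call is made; the result is only the messages list you would pass on.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_chunks(query, chunks, k=2):
    """Return the k chunks sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]

def build_messages(query, chunks, k=2):
    """Build a chat-style messages list that carries document text with the question."""
    context = "\n\n".join(top_chunks(query, chunks, k))
    return [
        {"role": "system", "content": "Answer using only the excerpts provided."},
        {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {query}"},
    ]

# Hypothetical chunk texts standing in for t.page_content from the splitter.
sample_chunks = [
    "The hearing was held by video on the date shown in the notice.",
    "The judge found the claimant could perform light work, so the claim was denied.",
    "Exhibits were admitted into the record without objection.",
]
messages = build_messages("Why did the judge deny this claim?", sample_chunks, k=1)
print(messages[1]["content"])
```

Word overlap is a crude ranking and will miss paraphrases; it is used here only because it needs no extra libraries.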

To debug the issue further, you can print out the loaded documents and their content to check if they are being processed correctly. Additionally, review the API responses to see if there are any error messages or unexpected results.
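A small helper along these lines may make that inspection easier. It is only a sketch: summarize is a hypothetical name, and the stand-in objects below mimic just the page_content attribute that the loaded documents expose. In the posted script you would pass data or texts instead.

```python
from types import SimpleNamespace

def summarize(docs, preview=40):
    """Return one summary line per document: index, length, and a short preview."""
    lines = []
    for i, doc in enumerate(docs):
        text = doc.page_content
        snippet = text[:preview].replace("\n", " ")
        lines.append(f"[{i}] {len(text)} chars: {snippet!r}")
    return lines

# Stand-in objects; in the posted script, pass `data` or `texts` instead.
fake_docs = [
    SimpleNamespace(page_content="DECISION\nThe claimant filed an application."),
    SimpleNamespace(page_content=""),
]
for line in summarize(fake_docs):
    print(line)
```

A document that reports 0 characters is worth a closer look, since it can indicate a page whose text could not be extracted.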

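It's also worth noting that the code only reads data[30] when there are at least 31 documents (len(data) >= 31), and it runs that check on the loaded pages rather than on the split chunks. With 19 pages the check fails and the else branch prints its message, which is the code working as written rather than an error. If you meant to inspect a chunk, index into texts instead, and adjust the index to suit your document.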
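To see why the page count and the chunk count differ, and how to guard an index lookup, here is a minimal sketch. It uses a plain fixed-size cut, which is simpler than what RecursiveCharacterTextSplitter does, and the page texts, split_fixed, and length_at are all made up for the illustration.

```python
def split_fixed(text, chunk_size):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def length_at(items, index):
    """Return len(items[index]), or None when the index is out of range."""
    if 0 <= index < len(items):
        return len(items[index])
    return None

# Hypothetical page texts: three pages of different lengths.
pages = ["a" * 4500, "b" * 1200, "c" * 2600]
chunks = [piece for page in pages for piece in split_fixed(page, 2000)]

print(len(pages), "pages ->", len(chunks), "chunks")
print(length_at(pages, 30))   # out of range for the page list
print(length_at(chunks, 5))   # valid index into the chunk list
```

Any page longer than the chunk size yields more than one chunk, so the list produced by the splitter can be longer than the list produced by the loader.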
