Python Forum
python/access/pandas
#1
Hi,
I was handed an .mdb file. Old-timers at the centre say it was started
at least 25 years ago. Over the years, various people have entered data
through a "form". The Access version it was built with is long gone; the only communication
with the outside world is a monthly report.
I can handle .accdb files with Python; .mdb files, however, give me problems.
Surprisingly, modern Excel will open the .mdb and show a nice
spreadsheet with rows and columns. It only looks nice, though.
I suspect users typed long texts into the entry boxes of the "form"
and pressed Enter (CRLF) where they should not have. When I save the spreadsheet as CSV,
the records are messed up. (It's an inventory of old parochial registers.)
Hence my question. After reading the Excel file with pandas and turning it into a DataFrame "bib", I loop over it like this:
for idx, row in bib.iterrows():
    title = row['title']
    # etc.
Can I eliminate any CRLF from a DataFrame field before saving it to a .txt file, or directly into SQL?
(I could convert the data to ASCII codes and check for special characters, but there must be a better solution.)
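A minimal sketch of one way to do this, assuming `bib` holds text columns such as `title` (the sample rows below are invented, as are the `parish` and `year` columns): `DataFrame.replace` with `regex=True` replaces line-break characters inside every string cell at once, so no `iterrows()` loop is needed, and `str.contains` finds the affected records without checking character codes by hand. The cleaned frame can then go to a text file with `to_csv` or into a database with `to_sql`; SQLite is used here only to keep the example self-contained.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Invented sample rows standing in for the "bib" DataFrame read from Excel.
bib = pd.DataFrame({
    "title": ["Baptisms 1750-1790\r\nvol. 2", "Marriages\n1801", "Deaths"],
    "parish": ["St. Peter", "St. Paul\r", "St. John"],
    "year": [1750, 1801, 1820],
})

# Which records contain a CR or LF anywhere? astype(str) lets every column be searched.
has_break = bib.astype(str).apply(lambda col: col.str.contains(r"[\r\n]")).any(axis=1)

# Replace each CRLF, lone CR or lone LF inside string cells with one space.
# With regex=True the pattern matches substrings; numeric cells are left untouched.
clean = bib.replace(r"\r\n|\r|\n", " ", regex=True)

# Option 1: a tab-separated text file (an in-memory buffer here).
buf = StringIO()
clean.to_csv(buf, sep="\t", index=False)

# Option 2: straight into an SQL table (in-memory SQLite for this sketch).
con = sqlite3.connect(":memory:")
clean.to_sql("registers", con, index=False)
rows = con.execute("SELECT title, parish, year FROM registers").fetchall()
con.close()
```

Replacing with a space rather than '' keeps the words on either side of a break from running together; use '' if you prefer.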
thx,
Paul
EDIT: I have discovered the culprit.
It seems that you can do: bib.replace('\n', '', regex=True).
But now, when I save to .txt, I find _X000D_ instead of '' (nothing).
What is _X000D_?
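A likely explanation, offered with some caution: `_x000D_` looks like the escape the .xlsx (Office Open XML) format uses for a carriage return, U+000D, and the reader behind `pandas.read_excel` may pass it through as plain text. A CRLF typed into the form would then arrive as the characters `_x000D_` followed by `\n`, so replacing `\n` alone leaves the token behind. The sketch below, on invented sample values, strips both in one pass and assumes the token may appear in either letter case.

```python
import pandas as pd

# Invented cell values as they might come back from pandas.read_excel.
bib = pd.DataFrame({
    "title": ["Baptisms_x000D_\n1750-1790", "Burials _X000D_\n1802", "Deaths"],
})

# (?i) makes the whole pattern case-insensitive, so _x000D_ and _X000D_ both match.
# (?:\r?\n)? also consumes the line feed that usually follows the token;
# the remaining alternatives catch any real CR/LF characters left in the text.
pattern = r"(?i)_x000D_(?:\r?\n)?|\r\n|\r|\n"
bib = bib.replace(pattern, " ", regex=True)

# Collapse any double spaces this leaves behind.
bib = bib.replace(r" {2,}", " ", regex=True)
print(bib["title"].tolist())  # ['Baptisms 1750-1790', 'Burials 1802', 'Deaths']
```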
Paul
It is more important to do the right thing than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
https://www.compart.com/en/unicode/U+000D
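U+000D is CARRIAGE RETURN, the same character Python writes as "\r". Since `_x000D_` is just that code point in hexadecimal, a small decoder can turn such tokens back into real characters, after which the usual line-break replacement works. `unescape_ooxml` is a hypothetical helper, and the pattern assumes exactly four hex digits between `_x` (or `_X`) and the closing `_`.

```python
import re

# Matches escapes such as _x000D_ or _X000D_: four hex digits between "_x" and "_".
_ESCAPE = re.compile(r"_[xX]([0-9A-Fa-f]{4})_")


def unescape_ooxml(text: str) -> str:
    """Hypothetical helper: replace each _xHHHH_ token with the character U+HHHH."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(chr(0x000D) == "\r")                                # True: U+000D is CR
print(repr(unescape_ooxml("Line one_x000D_\nLine two")))  # 'Line one\r\nLine two'
```

For a pandas column, `Series.str.replace` accepts a compiled pattern and a callable replacement when `regex=True`, e.g. `bib['title'].str.replace(_ESCAPE, lambda m: chr(int(m.group(1), 16)), regex=True)`.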
If you can't explain it to a six-year-old, you don't understand it yourself. (Albert Einstein)

#3
(Nov-11-2023, 12:44 PM) buran wrote: https://www.compart.com/en/unicode/U+000D
Yes, it is a CR.
I found that, after the replacement, "_X000D_" is left in the text where the line break was; it is not removed.
Removing "_X000D_" with Python after replacing '\n' seems a tall order.
The information on the internet is very confusing. Maybe Unicode helps.
There is always the KISS method: in this case, Ctrl-H (find and replace) on the final .txt file, replacing "_X000D_"
with nothing.
Not a very Pythonic solution, but it will do until I have tested with the Unicode equivalent of CR.
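The Ctrl-H step can also be done from Python on the exported file. A sketch, assuming the file is UTF-8 encoded; `strip_x000d` is a hypothetical helper name and the file contents in the demo are invented.

```python
import re
import tempfile
from pathlib import Path


def strip_x000d(path: Path) -> int:
    """Hypothetical helper: remove every _x000D_ / _X000D_ token from a text file in place.

    Returns the number of tokens removed.
    """
    text = path.read_text(encoding="utf-8")
    cleaned, n = re.subn(r"_[xX]000[dD]_", "", text)
    path.write_text(cleaned, encoding="utf-8")
    return n


# Demo on a throwaway file standing in for the exported .txt.
with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "bib.txt"
    txt.write_text("Baptisms_X000D_ 1750-1790\nBurials_x000D_ 1802\n", encoding="utf-8")
    removed = strip_x000d(txt)
    result = txt.read_text(encoding="utf-8")
```

Cleaning the DataFrame before export avoids this extra pass, but the file-level version is handy when the .txt already exists.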
Thanks,
Paul