Python Forum (https://python-forum.io) > Forum & Off Topic (https://python-forum.io/forum-23.html) > Bar (https://python-forum.io/forum-27.html) > Thread: python/access/pandas (/thread-41099.html)
python/access/pandas - DPaul - Nov-11-2023

Hi,
I was handed an .mdb file. Old-timers at the centre say it was started at least 25 years ago. Over the years various people have entered data through a "form". The Access version is long gone; the only communication with the outside world is a monthly report.
I can handle .accdb with Python, however .mdb = problems. Surprisingly, modern Excel will open the .mdb and show a nice spreadsheet with rows and columns. It only looks nice. I suspect users have entered long texts into the entry boxes of the "form" and pressed Enter (CRLF) when they should not have. When saving the spreadsheet "as CSV", the records are messed up. (It's an inventory of old parochial registers.)
Hence my question: after reading the Excel file with pandas and turning it into a DataFrame "bib":

    for idx, row in bib.iterrows():
        title = row['title']
        # etc.

can I eliminate any CRLF from a DataFrame field before saving it to a txt file, or directly into SQL? (I could turn the data into ASCII codes and check for special characters, but there must be a better solution.)
thx, Paul

EDIT: I have discovered the culprit. It seems that you can do:

    bib.replace('\n', '', regex=True)

But now when I save to txt, I find _X000D_ instead of '' (nothing). What is _X000D_?
Paul

RE: python/access/pandas - buran - Nov-11-2023

https://www.compart.com/en/unicode/U+000D

RE: python/access/pandas - DPaul - Nov-11-2023

(Nov-11-2023, 12:44 PM) buran wrote: https://www.compart.com/en/unicode/U+000D

Yes, it is a CR. I found that after replacing "\n", "_X000D_" is still in the text; it is not removed. Removing "_X000D_" with Python, after the replacement of '\n', seems a tall order. Very confusing info on the internet. Maybe Unicode helps.
There is always the KISS method: in this case, Ctrl-H on the final txt file and replace "_X000D_" with nothing. Not a very Pythonic solution, but it will do until I have tested with the CR Unicode equivalent.
Thanks, Paul
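A minimal pandas-only sketch, assuming the spreadsheet has already been read into a DataFrame like "bib". "_x000D_" is the escape the .xlsx (Office Open XML) format uses for a carriage return stored inside cell text. Depending on the reader and its version, it can reach pandas as those seven literal characters rather than as "\r". A CRLF typed into the form then most likely arrives as "_x000D_" followed by "\n", so replacing "\n" alone leaves "_x000D_" behind. The sketch removes both in one vectorized DataFrame.replace call, so no iterrows() loop is needed. The column names, sample rows and output file names (registers.txt, registers.db) are hypothetical.

```python
import sqlite3

import pandas as pd

# Case-insensitive regex: one or more of the literal escape "_x000D_"
# (a carriage return as stored in .xlsx cell text), a raw CR, or a raw LF.
BREAKS = r"(?i)(?:_x000D_|\r|\n)+"


def clean_breaks(df: pd.DataFrame, replacement: str = " ") -> pd.DataFrame:
    """Return a copy of df in which every run of line-break debris inside a
    text cell is replaced by `replacement`; non-text cells are untouched."""
    return df.replace(BREAKS, replacement, regex=True)


# Hypothetical sample rows standing in for the spreadsheet read into "bib".
bib = pd.DataFrame(
    {
        "title": ["Baptisms_x000D_\n1750-1790", "Marriages\r\n1801", "Deaths"],
        "year": [1750, 1801, 1820],
    }
)

clean = clean_breaks(bib)
print(clean["title"].tolist())
# → ['Baptisms 1750-1790', 'Marriages 1801', 'Deaths']

# One record per line in a tab-separated text file...
clean.to_csv("registers.txt", sep="\t", index=False)

# ...or straight into SQL (here an SQLite file).
with sqlite3.connect("registers.db") as conn:  # the with-block commits on success
    clean.to_sql("registers", conn, if_exists="replace", index=False)
conn.close()
```

Replacing each run of breaks with a space rather than '' keeps the words on either side of a line break from being glued together; pass replacement="" to delete them outright, as in the replace call from the thread.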