
How to remove line breaks/new lines in csv with python

My code so far:

import pandas as pd
df = pd.read_csv('problematic.csv', sep='|',quotechar='"')
pandas_df = df.replace({r'\\r': ''}, regex=True)
pandas_df = pandas_df.replace({r'\\n': ''}, regex=True)
print(pandas_df.head())

I have this input:

ID  |  NAME   | VILLAGE  |   PENSION    -----HEADER

001 |  XYZ    |  RAMG    |    1500   -----ROW1


002 |  DINAL                        
      SHAMSUDH
      DHON 
|  SHIWA   |    2090

EXPECTED OUTPUT

ID  |  NAME  |   VILLAGE  |   PENSION    

001 |  XYZ    |  RAMG     |   1500

002 |  DINAL SHAMSUDH  DHON  |  SHIWA   |    2090

I suggest removing the redundant newline characters before reading the csv with pandas. You can do this by opening the file and reading it with readlines(), which returns a list of lines. Then remove the newline character from every line that contains fewer than three | characters, since such a line is only a fragment of a record:

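from io import StringIO
import pandas as pd

with open('problematic.csv') as f:
    lines = f.readlines()

# Lines with fewer than three '|' are fragments of a broken record: drop their
# newline so they run into their neighbours. Complete lines keep their newline.
# Caveat: the last fragment of a record loses its newline too, so a broken
# record that is followed by another record would be merged with it.
text = ' '.join(line.replace('\n', '').strip() if line.count('|') < 3 else line for line in lines)

df = pd.read_csv(StringIO(text), sep='|', quotechar='"')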

Output:

   ID                 NAME  VILLAGE  PENSION
0   1                  XYZ     RAMG     1500
1   2  DINAL SHAMSUDH DHON    SHIWA     2090

Note that this example assumes that annotations such as -----HEADER and -----ROW1 are not part of the csv file. If they are, you can filter them out with replace() before joining the lines.
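If the file contains more than one broken record, or still carries the -----HEADER / -----ROW1 annotations, counting delimiters per record may be more robust than the one-line join. The following is only a sketch under stated assumptions: every record has exactly three | separators, a line break never falls inside the last field, and the annotations always look like five dashes followed by a word. The helper name merge_broken_records, the regular expression, and the extra row 003 in the sample are hypothetical and not part of the original question.

```python
import re
from io import StringIO

import pandas as pd

EXPECTED_DELIMITERS = 3  # assumption: four columns, so three '|' per record


def merge_broken_records(lines, sep="|", n_sep=EXPECTED_DELIMITERS):
    """Glue physical lines together until each record holds n_sep separators."""
    records, parts = [], []
    for line in lines:
        # Assumed annotation format: five dashes plus a word at the end of the line.
        line = re.sub(r"-{5}\w+\s*$", "", line).strip()
        if not line:
            continue  # skip blank lines
        parts.append(line)
        joined = " ".join(parts)
        if joined.count(sep) >= n_sep:
            records.append(joined)  # record is complete
            parts = []
    if parts:
        records.append(" ".join(parts))  # keep an incomplete trailing record as-is
    return "\n".join(records)


# Sample from the question, plus one hypothetical row (003) after the broken one.
sample = """ID  |  NAME   | VILLAGE  |   PENSION    -----HEADER

001 |  XYZ    |  RAMG    |    1500   -----ROW1


002 |  DINAL
      SHAMSUDH
      DHON
|  SHIWA   |    2090
003 |  ABC    |  RAMG    |    1700
"""

text = merge_broken_records(sample.splitlines())

# A regex separator strips the padding around '|'; it requires the python engine.
# dtype=str keeps the leading zeros in ID.
df = pd.read_csv(StringIO(text), sep=r"\s*\|\s*", engine="python", dtype=str)
print(df)
```

For a real file, pass the open file object instead of sample.splitlines(), for example merge_broken_records(open('problematic.csv')). If a line break can also occur inside the last field, counting delimiters is not enough and the records would need another end marker.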


 