Read CSV file using Pandas: complex separator

Question

I have a CSV file that I want to read with Python's pandas. The header and the data lines look like this:

 A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N

Clearly the separator is ^, but some fields are padded with odd runs of spaces. How can I read this file cleanly?

I am using the following command to read the csv file:

df = pd.read_csv('input.csv', sep='^')

Answer 1

Use the regex \s*\^, which matches zero or more whitespace characters followed by a literal ^. You have to specify the python engine here to avoid a warning about regex support:

In [152]:

import io
import pandas as pd

t = """A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"""
# Raw string so the backslashes reach the regex engine unchanged
df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')
df.columns
Out[152]:
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')
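
To check the same idea against data rows as well as the header, here is a minimal sketch. The sample rows and the helper name read_caret_csv are made up for illustration; the only change to the pattern is a trailing \s*, so that spaces after a caret are absorbed too.

```python
import io

import pandas as pd


def read_caret_csv(source):
    """Read caret-separated text, ignoring whitespace around each '^'.

    Hypothetical helper, not part of pandas.
    """
    # \s*\^\s* = optional whitespace, a literal caret, optional whitespace.
    # Regex separators need the python engine.
    return pd.read_csv(source, sep=r"\s*\^\s*", engine="python")


# Made-up sample with uneven padding on both sides of the carets.
sample = (
    "A     ^B  ^ C\n"
    "1   ^ 2^3\n"
    "4^5   ^   6\n"
)

df = read_caret_csv(io.StringIO(sample))
print(df.columns.tolist())
print(df)
```

The same helper should accept a file path such as 'input.csv' in place of the StringIO object.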

Answer 2

Can't you provide a regular expression as the separator?

sep = r'[\^\s]+'  # read_csv expects the pattern as a string, not a compiled regex

Answer 3

Your separator can be a regular expression, so try something like this:

df = pd.read_csv('input.csv', sep="[ ^]+", engine='python')

The regular expression treats any run of spaces or carets (^) as a single separator. Passing engine='python' avoids the warning about regex separators.
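
One thing to watch with a character class such as [ ^]+ is that two carets in a row, i.e. an empty field, are swallowed as one separator, so later values shift left. A small sketch on made-up data, comparing it with a pattern that consumes exactly one caret per separator:

```python
import io

import pandas as pd

# Made-up data: the second field of the only row is empty.
sample = "A^B^C\n1^^3\n"

# Any run of spaces/carets is one separator: '^^' collapses, so 3 lands under B.
loose = pd.read_csv(io.StringIO(sample), sep=r"[ ^]+", engine="python")

# Exactly one caret per separator, with optional padding: B stays empty.
strict = pd.read_csv(io.StringIO(sample), sep=r"\s*\^\s*", engine="python")

print(loose)
print(strict)
```

If the file never has empty fields and no field contains a space, the two patterns should give the same result.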

Answer 4

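Read the file as you have done, then strip the extra whitespace from the column names and from every column that holds strings:

df = (pd.read_csv('input.csv', sep="^")
      .rename(columns=str.strip)  # the header names carry the padding too
      # each x is a whole column (Series), so test its dtype, not isinstance(x, str)
      .apply(lambda x: x.str.strip() if x.dtype == object else x))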
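
As a rough, self-contained sketch of this read-then-strip approach, the snippet below runs it on made-up in-memory data. The helper name strip_strings is hypothetical, and the dtype check is an assumption meant to cover both object columns and pandas' dedicated string dtype.

```python
import io

import pandas as pd


def strip_strings(df):
    """Strip padding from column names and string cells (hypothetical helper)."""
    out = df.rename(columns=str.strip)
    for col in out.columns:
        dtype = out[col].dtype
        # Text columns are object dtype in older pandas, StringDtype in newer.
        if dtype == object or isinstance(dtype, pd.StringDtype):
            out[col] = out[col].str.strip()
    return out


# Made-up sample: padding before and after the values and the header names.
sample = "name   ^city\nAnn  ^ Oslo\nBob^Rome  \n"
df = strip_strings(pd.read_csv(io.StringIO(sample), sep="^"))
print(df)
```

Because the separator is a single character here, the default C engine can be used and no regex is involved.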

Answer 5

If the only whitespace in your file is the extra whitespace between columns (i.e. no column contains raw text with spaces), an easy fix is to remove all the spaces from the file. An example command to do that:

<input.csv tr -d '[:blank:]' > new_input.txt
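
If you would rather stay in Python, the same clean-up can be sketched as below. The helper name read_without_blanks and the sample text are made up, and the sketch assumes that "blank" means spaces and tabs only. As with the shell command, it is only safe when no field contains real spaces.

```python
import io

import pandas as pd


def read_without_blanks(text):
    """Drop every space and tab, then parse with a plain '^' separator.

    Hypothetical helper; it would also delete spaces inside field values.
    """
    cleaned = text.translate(str.maketrans("", "", " \t"))
    return pd.read_csv(io.StringIO(cleaned), sep="^")


# Made-up sample with spaces and a tab used as padding.
sample = "A   ^B\t^C\n1 ^2^  3\n"
df = read_without_blanks(sample)
print(df)
```

For a file on disk, read its contents into a string first and pass that string to the helper.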

5 answers

solution1
8 ACCEPTED 2015-05-14 22:09:28

solution2
4 2015-05-14 22:09:09

solution3
2 2015-05-14 22:08:02

solution4
0 2015-05-14 22:09:18

solution5
0 2015-05-14 22:09:52
