I have \x02\n as a line terminator in a CSV file I'm trying to parse. However, pandas only accepts a single character as the line terminator, not two. For example:
>>> data = pd.read_csv(file, sep="\x01", lineterminator="\x02")
>>> data.loc[100].tolist()
['\n1475226000146', '1464606', 'Juvenile', '1', 'http://itunes.apple.com/artist/juvenile/id1464606?uo=5', '1']
Or:
>>> data = pd.read_csv(file, sep="\x01", lineterminator="\n")
>>> data.loc[100].tolist()
['1475226000146', '1464606', 'Juvenile', '1', 'http://itunes.apple.com/artist/juvenile/id1464606?uo=5', '1\x02']
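In both cases part of the terminator is left in the data: a leading \n on the first field in the first example, and a trailing \x02 on the last field in the second. What would be the best way to read the CSV file in pandas with the above line terminator?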
As of v0.23, pandas does not support multi-character line-terminators. Your code currently returns:
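import io
import pandas as pd

s = "this\x01is\x01test\x02\nthis\x01is\x01test2\x02"
# io.StringIO stands in for pd.compat.StringIO, which newer pandas versions no longer provide
df = pd.read_csv(io.StringIO(s), sep="\x01", lineterminator="\x02", header=None)
df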
        0   1      2
0    this  is   test
1  \nthis  is  test2
One workaround (as of now) is to strip the leading whitespace, i.e. the stray \n, from the first column after reading. You can do this with str.lstrip.
df.iloc[:, 0] = df.iloc[:, 0].str.lstrip()
# Alternatively,
# df.iloc[:, 0] = [s.lstrip() for s in df.iloc[:, 0]]
df
      0   1      2
0  this  is   test
1  this  is  test2
If you need to strip other leftover terminator characters besides the newline, you can pass them to lstrip as a single string:
line_terminators = ['\n', ...]
df.iloc[:, 0] = df.iloc[:, 0].str.lstrip(''.join(line_terminators))
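If the file fits in memory, another way to approach this is to collapse the two-character terminator into a single character before pandas ever sees the data, so nothing needs stripping afterwards. The following is only a minimal sketch under that assumption; normalize_terminators is a hypothetical helper name, not part of pandas, and the sample string is the same one used above.

```python
import io


def normalize_terminators(text, terminator="\x02\n", replacement="\x02"):
    """Collapse a multi-character line terminator into a single character.

    Hypothetical helper (not a pandas API). It assumes the whole input
    fits in memory and that the terminator sequence never occurs inside a field.
    """
    return text.replace(terminator, replacement)


raw = "this\x01is\x01test\x02\nthis\x01is\x01test2\x02"
cleaned = normalize_terminators(raw)

# A file-like object that can be handed to a parser in place of the file.
buffer = io.StringIO(cleaned)

# With pandas installed, the cleaned buffer can then be read as before:
# import pandas as pd
# df = pd.read_csv(buffer, sep="\x01", lineterminator="\x02", header=None)
```

For a real file, the text would come from something like open(path, newline="").read() (newline="" keeps Python from translating line endings) instead of the raw string. The trade-off is an extra pass over the data and a second copy in memory, so for very large files the lstrip approach above may still be preferable.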