
[Pandas, Python] Retain empty columns in a space-separated data frame

I have space-separated data in which some cells are blank, like this:

A B C D
1 2 3 4
5   6 7
8     9

When I read the data above into pandas using

pd.read_csv(io.StringIO(raw_2), sep=r'\s+')

the blank cells are dropped and the remaining values shift left:

A B C   D
1 2 3   4
5 6 7   NaN
8 9 NaN NaN

Is there a way to retain the blank columns, so that the 9 ends up under column D instead of B?

You need a reader that reads fixed-width columns:

pd.read_fwf(io.StringIO(raw_2))
#   A    B    C  D
#0  1  2.0  3.0  4
#1  5  NaN  6.0  7
#2  8  NaN  NaN  9

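This is not guaranteed to work in general. By default read_fwf infers the column boundaries from the whitespace in the first rows of the data (100 rows, controlled by infer_nrows), so if the inference goes wrong you may have to specify the column positions or widths by hand with the colspecs or widths parameter.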
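
If the inferred boundaries come out wrong, the column positions can be passed explicitly. Below is a minimal sketch; it assumes raw_2 holds exactly the four-line sample from the question, with one-character columns separated by one space. The colspecs values are derived from that sample and are not part of the original answer.

```python
import io
import pandas as pd

# Assumed contents of raw_2: the sample table from the question.
raw_2 = "A B C D\n1 2 3 4\n5   6 7\n8     9\n"

# Half-open [start, end) character ranges, one per column.
# These positions are an assumption that fits this sample only.
colspecs = [(0, 1), (2, 3), (4, 5), (6, 7)]

df = pd.read_fwf(io.StringIO(raw_2), colspecs=colspecs)
print(df)
```

For real data the ranges would have to be adjusted to the actual layout. When the fields are contiguous, the widths parameter can be used instead of colspecs.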

You can use:

pd.read_csv(io.StringIO(raw_2), sep=r'\s{1,2}')

    A   B   C   D
0   1   2.0 3.0 4
1   5   NaN 6.0 7
2   8   NaN NaN 9

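This uses the regex \s{1,2} as the separator, which matches one or two whitespace characters. A single space between two values is one separator, while the three spaces around a blank cell are consumed as two separators with an empty field between them. It works here only because every value in the sample is one character wide and the columns are separated by a single space. Note that a regex separator other than \s+ makes pandas use its Python parsing engine; passing engine='python' makes that explicit.

\s matches any whitespace character (equivalent to [\r\n\t\f\v ])

{1,2} is a quantifier: it matches between 1 and 2 times, as many times as possible, giving back as needed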
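
To see why this separator preserves blank cells, the same pattern can be applied to the sample rows with the standard library's re.split. This is only an illustration of the regex behaviour, not of what pandas does internally; the rows list is copied from the question's sample.

```python
import re

# Same pattern as the sep argument above.
sep = re.compile(r"\s{1,2}")

# Data rows from the question's sample (header omitted).
rows = ["1 2 3 4", "5   6 7", "8     9"]

fields = [sep.split(row) for row in rows]
for row, parts in zip(rows, fields):
    print(repr(row), "->", parts)

# '1 2 3 4' -> ['1', '2', '3', '4']
# '5   6 7' -> ['5', '', '6', '7']
# '8     9' -> ['8', '', '', '9']
```

Each row splits into four fields, and the empty strings are what end up as NaN. A wider value or a different amount of padding would change the number of fields, which is why the fixed-width reader is the more robust choice.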
