
Space separated file to Pandas when values have spaces

I have a space-separated text file. The values in the first 3 columns contain spaces, but those columns have a fixed width (7 characters).

Example:

A123456 B123456 C123456 12 158 325 0 14
D123456 E123456 F123456 1 147 23 711 0
G1 3456 H123456 F 23456 158 11 7 574 12589
J1234 6 K   456 L123456 1458 2 0.45 1 78

Desired output:

         0        1        2     3    4     5    6      7
0  A123456  B123456  C123456    12  158   325    0     14
1  D123456  E123456  F123456     1  147    23  711      0
2  G1 3456  H123456  F 23456   158   11     7  574  12589
3  J1234 6  K   456  L123456  1458    2  0.45    1     78

Can I read this file with pandas?

We can use pd.read_fwf, which can "Read a table of fixed-width formatted lines into DataFrame":

import pandas as pd

df = pd.read_fwf('data.txt', colspecs='infer', header=None)

df :

         0        1        2                   3
0  A123456  B123456  C123456     12 158 325 0 14
1  D123456  E123456  F123456      1 147 23 711 0
2  G1 3456  H123456  F 23456  158 11 7 574 12589
3  J1234 6  K   456  L123456    1458 2 0.45 1 78
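If inference turns out to be unreliable on a larger file (as far as I know, 'infer' only examines the first rows, 100 by default), the boundaries can also be spelled out. A minimal sketch, assuming the layout from the question (three 7-character fields separated by single spaces); SAMPLE and df_explicit are illustrative names, and the in-memory string stands in for data.txt:

```python
import io

import pandas as pd

# In-memory copy of the sample from the question (stands in for data.txt).
SAMPLE = (
    "A123456 B123456 C123456 12 158 325 0 14\n"
    "D123456 E123456 F123456 1 147 23 711 0\n"
    "G1 3456 H123456 F 23456 158 11 7 574 12589\n"
    "J1234 6 K   456 L123456 1458 2 0.45 1 78\n"
)

# Half-open [start, end) character ranges: three 7-character fields,
# then the remainder of the line (an end of None means "to the end").
colspecs = [(0, 7), (8, 15), (16, 23), (24, None)]

df_explicit = pd.read_fwf(io.StringIO(SAMPLE), colspecs=colspecs, header=None)
print(df_explicit)
```

This should give the same four columns as the inferred version, with the remaining values still packed into the last column as one string.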

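Column 3 still holds the remaining space-separated values as a single string. It can be split on whitespace with str.split(expand=True) and joined back onto the frame:

import pandas as pd

df = pd.read_fwf('data.txt', colspecs='infer', header=None)
# Replace column 3 with the columns produced by splitting it on whitespace
df = df.drop(3, axis=1).join(df[3].str.split(expand=True), rsuffix='_x')
# Renumber the columns 0..n-1
df.columns = range(len(df.columns))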

df :

         0        1        2     3    4     5    6      7
0  A123456  B123456  C123456    12  158   325    0     14
1  D123456  E123456  F123456     1  147    23  711      0
2  G1 3456  H123456  F 23456   158   11     7  574  12589
3  J1234 6  K   456  L123456  1458    2  0.45    1     78
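One thing to keep in mind: str.split returns strings, so the new columns are text rather than numbers. A minimal sketch of a follow-up conversion with pd.to_numeric, assuming columns 3 to 7 are all meant to be numeric; SAMPLE, tail and out are illustrative names, and pd.concat is used here instead of join only to keep the sketch short:

```python
import io

import pandas as pd

# In-memory copy of the sample from the question (stands in for data.txt).
SAMPLE = (
    "A123456 B123456 C123456 12 158 325 0 14\n"
    "D123456 E123456 F123456 1 147 23 711 0\n"
    "G1 3456 H123456 F 23456 158 11 7 574 12589\n"
    "J1234 6 K   456 L123456 1458 2 0.45 1 78\n"
)

df = pd.read_fwf(io.StringIO(SAMPLE), colspecs='infer', header=None)

# Split the packed last column on whitespace and put the pieces
# next to the three fixed-width columns.
tail = df[3].str.split(expand=True)
out = pd.concat([df.drop(columns=3), tail], axis=1)
out.columns = range(out.shape[1])

# The split pieces are strings; convert them column by column.
for col in range(3, out.shape[1]):
    out[col] = pd.to_numeric(out[col])

print(out.dtypes)
```

Columns containing only whole numbers should come out as integers, while a column with a value such as 0.45 becomes floating point.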

data.txt :

A123456 B123456 C123456 12 158 325 0 14
D123456 E123456 F123456 1 147 23 711 0
G1 3456 H123456 F 23456 158 11 7 574 12589
J1234 6 K   456 L123456 1458 2 0.45 1 78

You can use either of these:

data = pd.read_csv('data.txt',
                   sep=";|:|,",
                   header=None,
                   engine='python')

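Or use read_fwf:

df = pd.read_fwf('data.txt', colspecs='infer', header=None)

read_fwf infers the column boundaries from the whitespace in the file. Note that the read_csv call splits only on ";", ":" or ",", so it applies when the file uses one of those delimiters; the sample in the question contains none of them. Hope this is helpful.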
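As a quick check, both calls can be run against the sample from the question. A minimal sketch, with an in-memory string standing in for data.txt; SAMPLE, csv_df and fwf_df are illustrative names:

```python
import io

import pandas as pd

# In-memory copy of the sample from the question (stands in for data.txt).
SAMPLE = (
    "A123456 B123456 C123456 12 158 325 0 14\n"
    "D123456 E123456 F123456 1 147 23 711 0\n"
    "G1 3456 H123456 F 23456 158 11 7 574 12589\n"
    "J1234 6 K   456 L123456 1458 2 0.45 1 78\n"
)

# Regex separator: splits on ';', ':' or ',' only.
csv_df = pd.read_csv(io.StringIO(SAMPLE), sep=";|:|,", header=None, engine='python')

# Fixed-width reader with inferred column boundaries.
fwf_df = pd.read_fwf(io.StringIO(SAMPLE), colspecs='infer', header=None)

print(csv_df.shape)
print(fwf_df.shape)
```

On this sample the read_csv variant finds none of its separators and keeps each line as a single value, so read_fwf is the call that actually separates the fixed-width columns.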
