
Separating Pandas DataFrame columns

I have a large DataFrame in which a single column holds all the values. I need to separate the data into more columns. After a lot of trial and error, I gave up and am asking for your help.

The head of the DataFrame looks like this (each row is a Series object, not separate values):

                                                        column1
    ---------------------------------------------------------------------
    MultiIndex1  | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00

My desired output should look like this:

                 column1|column2|column3|column4|column5|column6|column7
    ---------------------------------------------------------------------
    MultiIndex1  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00

I've tried: df.columns = ['col1','col2','col3','col4','col5'...]

I've tried turning it into a Series and back into a DataFrame.

I've tried applying .str.split functions.

I've tried lots of slicing and concat, but with no success.

Help would be much appreciated. Thanks!

Here are the first few lines of my dataset, as an example.

The date and AALR3 are the row MultiIndex:

    2019-01-02;AALR3 ;0000000020;000000000013.300000;000000000000000100;10:00:04.961;1;2019-01-02;000086597137782;000000000310091;2;2019-01-02;000086597142909;000000000310092;1;0;00000072;00000174
    2019-01-02;AALR3 ;0000000010;000000000013.310000;000000000000003000;10:00:04.961;1;2019-01-02;000086597135827;000000000310088;2;2019-01-02;000086597142909;000000000310089;1;0;00000120;00000174
    2019-01-02;AALR3 ;0000000050;000000000013.390000;000000000000000200;10:11:40.214;1;2019-01-02;000086597182855;000000000400273;1;2019-01-02;000086597151579;000000000400274;2;0;00000058;00000008
    2019-01-02;AALR3 ;0000000040;000000000013.380000;000000000000000100;10:11:40.214;1;2019-01-02;000086597182855;000000000400271;1;2019-01-02;000086597151578;000000000400272;2;0;00000058;00000174
    2019-01-02;AALR3 ;0000000030;000000000013.380000;000000000000000100;10:11:40.214;1;2019-01-02;000086597182855;000000000400269;1;2019-01-02;000086597151189;000000000400270;2;0;00000058;00000308

I'm reading it with:

    pd.read_csv('//path_to_file', sep=';')

I want to name the columns like this:

    df.columns = ['Session Date','Instrument Symbol','Trade Number','Trade Price','Traded Quantity',
          'Trade Time','Trade Indicator','Buy Order Date','Sequential Buy Order Number',
          'Secondary Order ID - Buy Order','Aggressor Buy Order Indicator','Sell Order Date',
         'Sequential Sell Order Number','Secondary Order ID - Sell Order','Aggressor Sell Order Indicator',
          'Cross Trade Indicator','Buy Member','Sell Member']

UPDATE:

The solutions were effective, thank you very much.

It is almost the way I want it. Is there a way to make the duplicate indexes a MultiIndex as well? I managed to do it for the dates, but not for the symbol. Thanks.

Give this a try:

    # split(' ', 1) yields only two pieces; expand=True gives one column per value
    your_df = df['column1'].str.split(expand=True)
    your_df.columns = ['col1','col2','col3','col4','col5','col6','col7']
    print(your_df)
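The split approach can be tried on a small, self-contained example. This is only a sketch: it assumes the single column holds whitespace-separated strings, and the sample data, the index name `key`, and the generated column names are all hypothetical.

```python
import pandas as pd

# Hypothetical sample: one string column holding whitespace-separated values.
df = pd.DataFrame(
    {"column1": ["1.00   2.00   3.00", "4.00   5.00   6.00"]},
    index=pd.Index(["a", "b"], name="key"),
)

# With no separator given, str.split splits on runs of whitespace;
# expand=True returns a DataFrame with one column per piece.
parts = df["column1"].str.split(expand=True)
parts.columns = [f"column{i + 1}" for i in range(parts.shape[1])]

# The pieces are still strings, so convert them if numbers are needed.
parts = parts.astype(float)
print(parts)
```

The original row index is carried over to the result, so an existing MultiIndex should survive the split.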

What you are seeing is a MultiIndex DataFrame, and what you are looking for is a single-index DataFrame. Try:

    df = df.reset_index()
    df.columns = ['col1','col2','col3','col4','col5','col6','col7']
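For the follow-up about making both the date and the symbol part of the row index, one possible route is to name the columns while reading and then call `set_index` with both columns. The sketch below assumes the file has no header row. It uses two shortened rows in the question's layout and only the first four column names, and it reads from an in-memory string rather than a real file.

```python
import io
import pandas as pd

# Two shortened, hypothetical rows in the same semicolon-separated layout.
raw = (
    "2019-01-02;AALR3 ;0000000020;000000000013.300000\n"
    "2019-01-02;AALR3 ;0000000010;000000000013.310000\n"
)
names = ["Session Date", "Instrument Symbol", "Trade Number", "Trade Price"]

# header=None stops pandas from using the first data row as column names.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=names,
    dtype={"Instrument Symbol": str, "Trade Number": str},
)

# The symbol field carries trailing spaces in the sample, so strip them.
df["Instrument Symbol"] = df["Instrument Symbol"].str.strip()

# Build a two-level row index from the date and the symbol.
indexed = df.set_index(["Session Date", "Instrument Symbol"])

# reset_index moves the index levels back into ordinary columns.
flat = indexed.reset_index()
print(indexed)
```

Reading `Trade Number` as a string keeps its leading zeros. Whether that matters depends on how the identifiers are used later.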


 