
How to merge multiple dataframes inside one file, Python

In my code I get a result like this:

A B C
1 1 1
A B C
2 2 2
A B C
3 3 3

I need to merge those columns (dataframes) into one big dataframe, like this:

 A B C
 1 1 1
 2 2 2
 3 3 3

Merging dataframes from different files is easy, e.g. pd.merge(df1, df2), but how do I do it when the dataframes are all in one file? Thanks in advance!
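For reference, pd.merge performs a database-style join on the columns the frames share, so it does not stack frames that have identical columns on top of each other; pd.concat does that. A minimal sketch contrasting the two, where the three one-row frames are made-up stand-ins for the per-line dataframes above:

```python
import pandas as pd

# three hypothetical one-row frames with the same columns, like the output above
frames = [pd.DataFrame({"A": [n], "B": [n], "C": [n]}) for n in (1, 2, 3)]

# merge joins on the shared columns A, B, C; no row matches, so the result is empty
joined = pd.merge(frames[0], frames[1])

# concat stacks the rows; ignore_index renumbers them 0, 1, 2
stacked = pd.concat(frames, ignore_index=True)
print(stacked)
```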

EDIT: To get my data, I convert each line of my dataset into a dataframe, so the output contains one dataframe per line. My code:

import pandas as pd
from io import StringIO

def coordinates():
    with open('file.txt') as file:
        for line in file:
            chunk = StringIO(line[35:61])  # I need only these fields from each line
            abc = pd.read_csv(chunk, sep=' ', header=None)
            abc.columns = ['A', 'B', 'C', 'D', 'E', 'F']
            print(abc)  # prints one single-row DataFrame per input line

coordinates()
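One way to get a single frame out of this loop is to collect each per-line frame in a list and call pd.concat once after the loop. This is a sketch under assumptions: the function parameters, the in-memory sample, and the three-column layout are hypothetical (the real data uses [35:61] and six columns), and sep=r"\s+" is swapped in for sep=' ' so that a run of several spaces counts as one separator:

```python
import pandas as pd
from io import StringIO

def coordinates(file, start, stop, columns):
    """Parse line[start:stop] of every line and stack the results into one DataFrame."""
    frames = []
    for line in file:
        # keep only the slice that holds the fields of interest
        chunk = StringIO(line[start:stop])
        # r"\s+" treats any run of whitespace as a single separator
        frames.append(pd.read_csv(chunk, sep=r"\s+", header=None, names=columns))
    # concatenate once at the end; ignore_index numbers the rows 0..n-1
    return pd.concat(frames, ignore_index=True)

# hypothetical sample: characters 4-8 of each line hold three numbers
sample = StringIO("ID1 1 1 1 END\nID2 2 2 2 END\nID3 3 3 3 END\n")
result = coordinates(sample, 4, 9, ["A", "B", "C"])
print(result)
```

With the real file this would be called as coordinates(f, 35, 61, ['A', 'B', 'C', 'D', 'E', 'F']) inside a `with open('file.txt') as f:` block. Concatenating once is generally preferred over growing a DataFrame row by row inside the loop.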

EDIT2: The suggestion from s_vishnu only works for a prepared file in which the same header repeats several times. In my case, many DataFrames are generated from the file, and the row after each header starts with the index 0. There are many dataframes, and each has only one row.

EDIT3: My file.txt has a large number of lines, each about 80 characters long, like this:

AAA SS SSDAS ASDJAI A 234 33 43 234 2342999 2.31 22 33 SSS SD W2UUQ Q231WQ A 222 11 23 123 1231299 2.31 22 11

From each line I need only part of the information, which is why I used lines = StringIO(lines[35:61]) to extract it. In this example I would need characters [30:55] and create a dataframe from them with columns=['A', 'B', 'C','D','E','F'] and sep=' '.
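Because the wanted fields sit at fixed character positions, the slicing and splitting can also be done for all lines at once with pandas' vectorized string methods instead of a per-line read_csv. A sketch, assuming a hypothetical sample and slice bounds (substitute the real positions and the six column names):

```python
import pandas as pd

# hypothetical input: the fields of interest are at characters 4..8 of each line
text = "ID1 1 1 1 END\nID2 2 2 2 END\nID3 3 3 3 END\n"

lines = pd.Series(text.splitlines())
# .str[4:9] slices every line; .str.split(expand=True) splits on whitespace into columns
fields = lines.str[4:9].str.split(expand=True)
fields.columns = ["A", "B", "C"]
# the pieces are strings, so convert each column to numbers
fields = fields.apply(pd.to_numeric)
print(fields)
```

For the real file, the Series could be built from the file's contents with read().splitlines() in the same way.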

Answer (s_vishnu): suppose my_test.csv contains:

A, B, C
1, 1, 1
A, B, C
2, 2, 2
A, B, C
3, 3, 3

Use list-style slicing:

import pandas as pd
df = pd.read_csv("my_test.csv")
df=df[::2]
print(df)

Output:

   A  B  C
0  1  1  1
2  2  2  2
4  3  3  3

df = df[::2] uses extended slicing: the 2 is the step, so the selection starts at position 0 and takes every second row (positions 0, 2, 4, ...), which skips the repeated header rows.
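A small illustration of the step: on a plain list and on a DataFrame with a default index, [::2] keeps positions 0, 2, 4, and the kept DataFrame rows retain their original index labels. The values are made up to mimic alternating data and header rows:

```python
import pandas as pd

# alternating data rows and repeated header rows, as in my_test.csv
values = ["1", "A", "2", "A", "3"]

# [start:stop:step] with start and stop omitted and step 2 -> positions 0, 2, 4
print(values[::2])

df = pd.DataFrame({"A": values})
kept = df[::2]  # row slicing by position, same as df.iloc[::2]
print(kept["A"].tolist(), kept.index.tolist())
```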

Note the index values, though: they also go in steps of 2, i.e. 0, 2, 4, ... To renumber the index, do this:

import pandas as pd
df = pd.read_csv("my_test.csv")
df=df[::2]

df.index = range(len(df['A']))
print(df)

Output:

   A  B  C
0  1  1  1
1  2  2  2
2  3  3  3

So you get the values you want.
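The [::2] trick assumes the header repeats after exactly every data row. A sketch of a variant (not the answer's own method) that instead drops any row repeating the header label, converts the columns back to numbers (the repeated headers leave them as strings), and renumbers the index with reset_index(drop=True); the CSV text mirrors my_test.csv:

```python
import pandas as pd
from io import StringIO

# same layout as my_test.csv: the header line repeats before every data row
csv_text = """A, B, C
1, 1, 1
A, B, C
2, 2, 2
A, B, C
3, 3, 3
"""

# skipinitialspace=True strips the blank after each comma, so the columns are 'A', 'B', 'C'
df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

# keep only rows whose first column is not a repeated header label
df = df[df["A"] != "A"]

# the repeated headers forced the columns to strings; convert them back to numbers
df = df.apply(pd.to_numeric)

# renumber the rows 0..n-1 (same effect as assigning range(len(df)) to df.index)
df = df.reset_index(drop=True)
print(df)
```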

Answer (from the asker): I found the solution. I changed the code at the beginning, and that helped:

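import pandas as pd

def coordinates():
    with open('file.txt') as abc:
        lines = abc.readlines()
    for line in lines:
        # cut the line at the beginning and at the end; no data has to be taken from the middle
        abc2 = line[20:-7]
        abc3 = abc2.split()  # split the remaining text on whitespace
        df = pd.DataFrame([abc3])  # one-row DataFrame for this line (a list of lists gives a row)
        print(abc3)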

coordinates()
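The loop above creates a DataFrame for each line but never combines them. A sketch of one way to finish it: collect the split fields of every line in a list and build a single DataFrame once. The [20:-7] cut is the asker's; the two sample lines, the column names, and the to_numeric conversion are assumptions for illustration:

```python
import pandas as pd
from io import StringIO

def coordinates(file):
    rows = []
    for line in file.readlines():
        # cut 20 characters from the start and 7 from the end (the trailing newline counts)
        abc2 = line[20:-7]
        rows.append(abc2.split())  # the remaining fields, split on whitespace
    # one DataFrame for all lines; the column names are hypothetical
    return pd.DataFrame(rows, columns=["A", "B", "C"]).apply(pd.to_numeric)

# hypothetical two-line sample shaped like file.txt
sample = StringIO("AAA SS SSDAS ASDJAI 234 33 43 22 33\n"
                  "SSS SD W2UUQ Q231WQ 222 11 23 22 11\n")
df = coordinates(sample)
print(df)
```

Note that a negative end index like -7 assumes every line ends with the same trailer, including the newline; a last line without a newline would be cut one character differently.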

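Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.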

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM