Reading one-column file with pandas
I'm trying to read the following file into a pandas dataframe:
(dataA
0
400
2800
9200
5600
2000
8400
4800
1200
7600
4000
400
6800
)
(dataB
30
30
30
30
30
30
20
500
30
50
330
530
930
)
The objective is to end up with something like this:
dataA dataB
0 30
400 30
2800 30
9200 30
5600 30
2000 30
8400 20
4800 500
1200 30
7600 50
4000 330
400 530
6800 930
I know this can be done by reading the file line by line, but I was wondering if there is an easy way to have it read by pandas (with read_csv, for example). This is because there are lots of files similar to this one and the post-processing is already automated for that type of data.
Since the parentheses mark where each column starts and ends, we can create two new index levels (the block number and the position within the block) and unstack the blocks into columns.
It's important that you read your file with header=None.
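import numpy as np
import pandas as pd

# header=None keeps the "(dataA" line as data instead of using it as the column name
df = pd.read_excel(..., header=None)
# True on every line that opens a block, i.e. contains "("
s = df[0].str.contains(r'\(', regex=True)
# index rows by (block number, position within block), then pivot the blocks into columns
df1 = df.set_index([s.cumsum(), df.groupby(s.cumsum()).cumcount()]).unstack(0)
# additional clean up: strip parentheses, drop the row left by the closing ")"
df1 = df1.replace(r'\(|\)', '', regex=True).replace('', np.nan).dropna().droplevel(0, axis=1)
# set up columns: the first row holds the block names
df1.columns = df1.iloc[0]
df1 = df1.iloc[1:]
print(df1)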
0 dataA dataB
1 0 30
2 400 30
3 2800 30
4 9200 30
5 5600 30
6 2000 30
7 8400 20
8 4800 500
9 1200 30
10 7600 50
11 4000 330
12 400 530
13 6800 930
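For a plain-text file, the same idea (label each block, then pivot) can be sketched with read_csv. This is only a rough illustration, not the answer's exact code: the shortened sample string stands in for the real file, the column names value, block and pos are made up for the example, and it assumes every block has the same number of rows.

```python
import io
import pandas as pd

# Shortened stand-in for the real file (assumption: same layout as in the question).
raw = """(dataA
0
400
2800
)
(dataB
30
30
30
)
"""

# header=None keeps "(dataA" as a data row; dtype=str keeps every cell a string.
df = pd.read_csv(io.StringIO(raw), header=None, names=["value"], dtype=str)

# Drop the closing ")" lines; they carry no data.
df = df[df["value"] != ")"].copy()

# Each "(name" line opens a new block, so a running count of them labels the blocks.
df["block"] = df["value"].str.startswith("(").cumsum()
# Position of each row inside its own block.
df["pos"] = df.groupby("block").cumcount()

# One column per block, one row per position.
wide = df.pivot(index="pos", columns="block", values="value")

# The first row holds the block names; strip the leading "(" and use them as headers.
wide.columns = wide.iloc[0].str.lstrip("(").tolist()
# Assumes equal-length blocks; otherwise missing cells would break astype(int).
result = wide.iloc[1:].astype(int).reset_index(drop=True)

print(result)
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path.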
Import the pandas library:
import pandas as pd
Create a dictionary from your lists:
data = { 'dataA': [0,400,2800,9200,5600,2000,8400,4800,1200,7600,4000,400,6800], 'dataB': [30,30,30,30,30,30,20,500,30,50,330,530,930]}
Create your dataframe:
df = pd.DataFrame(data)
Display your dataframe:
df
Overall, the complete code is:
import pandas as pd
data = {'dataA': [0,400,2800,9200,5600,2000,8400,4800,1200,7600,4000,400,6800],
        'dataB': [30,30,30,30,30,30,20,500,30,50,330,530,930]}
df = pd.DataFrame(data)
df
and the output will be:
dataA dataB
0 0 30
1 400 30
2 2800 30
3 9200 30
4 5600 30
5 2000 30
6 8400 20
7 4800 500
8 1200 30
9 7600 50
10 4000 330
11 400 530
12 6800 930
If you do not want the row index to be displayed, add this line at the end:
print(df.to_string(index=False))
The output will be:
dataA dataB
0 30
400 30
2800 30
9200 30
5600 30
2000 30
8400 20
4800 500
1200 30
7600 50
4000 330
400 530
6800 930
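The dictionary above is typed in by hand. Since there are many such files, it could instead be built from the file contents. The following is a minimal sketch under the assumption that every block looks like "(name", then one integer per line, then ")"; parse_blocks is a hypothetical helper name, and the short sample string stands in for a real file.

```python
import pandas as pd

def parse_blocks(text):
    """Collect each "(name ... )" block into {name: [int values]}."""
    data = {}
    current = None  # name of the block currently being read, if any
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("("):
            # "(dataA" opens a block named "dataA"
            current = line[1:]
            data[current] = []
        elif line == ")":
            # ")" closes the current block
            current = None
        elif current is not None:
            data[current].append(int(line))
    return data

# Shortened sample in the same layout as the question's file (assumption).
sample = "(dataA\n0\n400\n)\n(dataB\n30\n30\n)\n"
data = parse_blocks(sample)
df = pd.DataFrame(data)
print(df.to_string(index=False))
```

For a file on disk, the text could be obtained with open(path).read() before calling the helper; pd.DataFrame(data) requires all lists to have the same length.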