I have a positional text file in which each record's data is split across two lines:
Column 1Column 2Column 3
Text
11 12 13
text for 1
21 22 23
text for 2
31 32 33
text for 3
41 42 43
text for 4
51 52 53
text for 5
I'm trying to get this into a dataframe like this:
Column 1Column 2Column 3 Text
11 12 13 text for 1
21 22 23 text for 2
31 32 33 text for 3
41 42 43 text for 4
51 52 53 text for 5
I'm testing without the column headers:
import pandas as pd
cols = [(0, 8), (8, 16), (16, None), (0, 50)]
rs = pd.read_fwf(fn, colspecs=cols, header=None)
gives me:
0 1 2 3
0 11 12 13.0 11 12 13
1 text for 1 NaN text for 1
2 21 22 23.0 21 22 23
3 text for 2 NaN text for 2
Is there any way to alternate the format from one line to the next?
You can take every other row and join its values into one string to form a new Text column, like this:
data = df.values.tolist()[::2][1:]
df = df[1:]
df = pd.DataFrame(df.values.tolist()[::2], columns=df.columns)
df['Text'] = [' '.join([str(x) for x in i[:-1]]) for i in data]
df = df.drop('3', axis=1)
print(df)
Result:
Column 1Column 2Column Text
0 11 12 13.0 text for 1.0
1 21 22 23.0 text for 2.0
2 31 32 33.0 text for 3.0
3 41 42 43.0 text for 4.0
4 51 52 53.0 text for 5.0
The first line of code creates a new list with the values of every other row in df, skipping the first one. The second line removes the first row from df. The third line creates a new dataframe from every other remaining row. The fourth line creates the Text column with a list comprehension over the data list. The fifth line drops the column labelled '3', because it's extra, and the sixth line prints the result.
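If the trailing ".0" in the Text column is a problem, another option is to pair the lines up before pandas sees them. The following is only a minimal sketch, not the method from the answer above: it assumes the file has no header lines, that numeric and text lines strictly alternate, and that the numeric fields sit in the 8-character slots from the question's colspecs. The name read_paired_lines and the sample string raw are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: numeric fields padded to the 8-character widths
# used in the question's colspecs, each followed by its text line.
raw = (
    "11      12      13\n"
    "text for 1\n"
    "21      22      23\n"
    "text for 2\n"
    "31      32      33\n"
    "text for 3\n"
)

def read_paired_lines(handle):
    """Combine each numeric line with the text line that follows it."""
    # Drop blank lines so a trailing newline does not break the pairing.
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    records = []
    # Even-indexed lines hold the numbers, odd-indexed lines hold the text.
    for numbers, text in zip(lines[0::2], lines[1::2]):
        records.append({
            "Column 1": int(numbers[0:8]),
            "Column 2": int(numbers[8:16]),
            "Column 3": int(numbers[16:]),
            "Text": text.strip(),
        })
    return pd.DataFrame(records)

paired = read_paired_lines(io.StringIO(raw))
print(paired)
```

With a real file you would pass an open file object instead of io.StringIO(raw). Because the text line is never parsed as numbers, it stays exactly as written in the file.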