How to split unformatted data into several columns in Python?
Dear all,
I recently used a crawler to fetch information from a website and got a column of data like this:
| **Hotel Info** |
| 2014 open 2016 retrofit 50 rooms |
| 60 rooms |
| 2012 open 100 rooms |
| 80 rooms |
| 2010 open |
I want it to end up like this:
| **Hotel Open** | **Hotel Retrofit** | **Hotel Rooms** |
| 2014 | 2016 | 50 |
| null | null | 60 |
| 2012 | null | 100 |
| null | null | 80 |
| 2010 | null | null |
NOTE:
The original website doesn't separate these three 'information blocks'. They all sit inside a single <p>...</p> block, so I cannot avoid this issue.
I am using Python and am completely new to it. Please help me, and thank you very much!
Suppose you have the data in a file named test.xlsx. You can try this:
import collections
import re

import numpy as np
import pandas as pd

# Read the scraped column from Sheet1 of the workbook.
df = pd.read_excel('test.xlsx', sheet_name='Sheet1')
df_dict = collections.defaultdict(list)
for i in df['**Hotel Info**']:
    # Split into '<number> <keyword>' chunks such as '2014 open', whatever the spacing.
    i_list = re.findall(r'\d+\s+\w+', i)
    df_dict['**Hotel Open**'].append([e.split('open')[0].strip() for e in i_list if 'open' in e])
    df_dict['**Hotel Retrofit**'].append([e.split('retrofit')[0].strip() for e in i_list if 'retrofit' in e])
    df_dict['**Hotel Rooms**'].append([e.split('rooms')[0].strip() for e in i_list if 'rooms' in e])
# Keep the number when the keyword was found, otherwise store NaN.
df_dict['**Hotel Open**'] = [np.nan if len(item) == 0 else int(item[0]) for item in df_dict['**Hotel Open**']]
df_dict['**Hotel Retrofit**'] = [np.nan if len(item) == 0 else int(item[0]) for item in df_dict['**Hotel Retrofit**']]
df_dict['**Hotel Rooms**'] = [np.nan if len(item) == 0 else int(item[0]) for item in df_dict['**Hotel Rooms**']]
new_df = pd.DataFrame(df_dict)
new_df
new_df will be (the numbers display as floats because a column containing NaN gets a float dtype):
   **Hotel Open**  **Hotel Retrofit**  **Hotel Rooms**
0          2014.0              2016.0             50.0
1             NaN                 NaN             60.0
2          2012.0                 NaN            100.0
3             NaN                 NaN             80.0
4          2010.0                 NaN              NaN
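As an alternative to the loop above, here is a minimal sketch using pandas' vectorized `Series.str.extract`. It assumes each value is written as a number followed by its keyword (`open`, `retrofit`, `rooms`) and that the column is named `Hotel Info` without the asterisks. The sample DataFrame is built in memory in place of test.xlsx, and the helper name `split_hotel_info` is hypothetical.

```python
import pandas as pd


def split_hotel_info(info: pd.Series) -> pd.DataFrame:
    """Split strings like '2014 open 50 rooms' into one column per keyword."""
    # Output column name -> keyword that follows the number (assumed format).
    keywords = {
        "Hotel Open": "open",
        "Hotel Retrofit": "retrofit",
        "Hotel Rooms": "rooms",
    }
    columns = {}
    for column, keyword in keywords.items():
        # Capture the digits directly before the keyword; no match gives NaN.
        digits = info.str.extract(rf"(\d+)\s*{keyword}", expand=False)
        # Nullable Int64 keeps whole numbers and marks missing values as <NA>.
        columns[column] = pd.to_numeric(digits).astype("Int64")
    return pd.DataFrame(columns)


# Sample rows copied from the question, standing in for the Excel file.
df = pd.DataFrame({
    "Hotel Info": [
        "2014 open 2016 retrofit 50 rooms",
        "60 rooms",
        "2012 open 100 rooms",
        "80 rooms",
        "2010 open",
    ]
})
new_df = split_hotel_info(df["Hotel Info"])
print(new_df)
```

Because the pattern looks for the keyword itself, the order of the blocks and the amount of whitespace between them should not matter. The nullable `Int64` dtype is a design choice to keep years and room counts as whole numbers even when some rows are missing a value.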