Python: Split an Existing Column into Multiple Columns Without Affecting Other Columns
I just started learning Python. I searched for an answer to my problem but had no luck.
I have an Excel file with multiple columns.
For example, this is what I have in the Excel file.
I would like to change the file to look like the one below. I used "Text to Columns" in Excel to do this (highlighted in yellow), but I couldn't figure out how to do it in Python without affecting the other columns.
I would greatly appreciate your help!
Best, Tae
It should go something like this:
data[['a', 'col2']] = data['Information'].str.split('-', n=1, expand=True)
data[['b', 'col3']] = data['col2'].str.split('-', n=1, expand=True)
data[['c', 'col4']] = data['col3'].str.split('-', n=1, expand=True)
data[['d', 'e']] = data['col4'].str.split('-', n=1, expand=True)
This may not be the most efficient way, but it should work. It splits the Information column into 5 separate columns.
Updated answer, based on the updated data in the question:
import pandas as pd
data = pd.read_excel("/path/to/file/Example for Pygo.xlsx")
# Peel off one hyphen-separated part at a time; col2..col4 hold the remainder.
data[['a', 'col2']] = data['Information'].str.split('-', n=1, expand=True)
data[['b', 'col3']] = data['col2'].str.split('-', n=1, expand=True)
data[['c', 'col4']] = data['col3'].str.split('-', n=1, expand=True)
data[['d', 'e']] = data['col4'].str.split('-', n=1, expand=True)
# Drop the source column and the temporary remainder columns.
data = data.drop(['Information', 'col2', 'col3', 'col4'], axis=1)
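For comparison, here is a minimal sketch of the same result with a single split on every hyphen instead of four chained ones. It runs on a small in-memory DataFrame rather than the Excel file; the "Visits" column and its values are made up for illustration, and the names a to e simply follow the answer above.

```python
import pandas as pd

# Toy stand-in for the Excel sheet; "Visits" is a hypothetical extra column.
data = pd.DataFrame({
    "Information": [
        "us-EXAMPLE-article1-scrolldown-findoutnow",
        "us-EXAMPLE-article1-scrollright",
    ],
    "Visits": [0, 1],
})

# Split on every hyphen at once; shorter rows are padded with missing values.
parts = data["Information"].str.split("-", expand=True)
parts.columns = list("abcde")[: parts.shape[1]]

# Attach the new columns and drop the source column; "Visits" is left alone.
data = data.join(parts).drop(columns="Information")
print(data)
```

Because the new columns are joined on the index, the existing columns keep their values and order.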
Check out the string.split() method. You can pass in an argument to split on, in this case string.split('-'):
array[index] = array[index].split('-')
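As a quick illustration of how str.split behaves on one value of the kind shown in the question, here is a small plain-Python sketch. The variable names are only illustrative.

```python
# One value in the same shape as the "Information" column in the question.
value = "us-EXAMPLE-article1-scrolldown-findoutnow"

parts = value.split("-")          # split on every hyphen
head, rest = value.split("-", 1)  # maxsplit=1: split only at the first hyphen

print(parts)
print(head, rest)

# The same idea applied to each element of a list of values.
rows = ["us-EXAMPLE-article1-scrollright", "us-EXAMPLE-article1"]
split_rows = [row.split("-") for row in rows]
print(split_rows)
```

Note that the number of parts varies from row to row, so the resulting lists have different lengths.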
One easy way is to use a pandas DataFrame to process the dataset.
1. Read the xls file into a DataFrame; you can find the details under "xls into dataframe".
Please find examples below.
Example (2 lines only):
import pandas as pd
df = pd.read_excel(open('/Users/xxx/Downloads/ExampleforPygo.xlsx','rb'), sheet_name=0)
df = df.merge(df.apply(lambda row: pd.Series(row['Information'].split('-')), axis=1), left_index=True, right_index=True)
print(df)
Example with a separate function:
import pandas as pd

def splitInfomation(information):
    # Build a Series with one entry per hyphen-separated part: split0, split1, ...
    ret = {}
    splits = information.split('-')
    for idx, split in enumerate(splits):
        ret['split' + str(idx)] = split
    return pd.Series(ret)

df = pd.read_excel(open('/Users/xxxx/Downloads/ExampleforPygo.xlsx','rb'), sheet_name=0)
df = df.merge(df.apply(lambda row: splitInfomation(row['Information']), axis=1), left_index=True, right_index=True)
print(df)
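To try the function-based approach without the Excel file, here is a rough sketch on a two-row in-memory DataFrame. The sample rows are modeled on the question's data, and split_information is just a compact, hypothetical variant of the helper above.

```python
import pandas as pd

def split_information(information):
    """Return the hyphen-separated parts as a Series labeled split0, split1, ..."""
    parts = information.split("-")
    return pd.Series({f"split{i}": part for i, part in enumerate(parts)})

# Toy stand-in for the Excel sheet.
df = pd.DataFrame({
    "Date": ["2018-08-20", "2018-08-20"],
    "Information": [
        "us-EXAMPLE-article1-scrolldown-findoutnow",
        "us-EXAMPLE-article1",
    ],
})

# Row-wise apply returns one Series per row; pandas aligns them into columns.
parts = df.apply(lambda row: split_information(row["Information"]), axis=1)
result = df.join(parts)
print(result)
```

Rows with fewer parts end up with missing values in the trailing split columns, while Date and Information stay as they were.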
I updated the answer based on the example file you provided. In your case the data file is xlsx, so you can do it as shown below. The str.split method alone gets the job done. I also used fillna to fill in cells that have no value; split positions with no value are shown as None.
When using expand=True, the split elements expand out into separate columns.
>>> import pandas as pd
>>> pd.set_option('display.max_rows', None)
>>> pd.set_option('display.max_columns',None)
>>> pd.set_option('display.width', None)
>>> data_xls = pd.read_excel("Example_data.xlsx", index_col=None).fillna('')
>>> data_xls['Information'].str.split('-', expand=True).head(30)
     0        1                    2            3                     4
 0  us  EXAMPLE             article1   scrolldown            findoutnow
 1  us  EXAMPLE             article1  scrollright                  None
 2  us  EXAMPLE             article1   findoutnow                  None
 3  us  EXAMPLE   payablesmanagement   findoutnow                  None
 4  us  EXAMPLE  strategicpurchasing  scrollright                  None
 5  us  EXAMPLE             article1    learnmore         profitmargins
 6  us  EXAMPLE   payablesmanagement  scrollright                  None
 7  us  EXAMPLE             article2  scrollright                  None
 8  us  EXAMPLE  controlandvisibilty   findoutnow                  None
 9  us  EXAMPLE             article1   scrollleft                  None
10  us  EXAMPLE             homepage     amexlogo                  None
11  us  EXAMPLE        profitmargins   findoutnow                  None
12  us  EXAMPLE             article3   findoutnow                  None
13  us  EXAMPLE             article1    learnmore    payablesmanagement
14  us  EXAMPLE             article2   scrollleft                  None
15  us  EXAMPLE             article3  scrollright                  None
16  us  EXAMPLE             homepage     readmore    payablesmanagement
17  us  EXAMPLE             article1         None                  None
18  us  EXAMPLE             homepage      homenav            findoutnow
19  us  EXAMPLE  controlandvisibilty  scrollright                  None
20  us  EXAMPLE             homepage      homenav    payablesmanagement
21  us  EXAMPLE             homepage       scroll            findoutnow
22  us  EXAMPLE             article3   scrollleft                  None
23  us  EXAMPLE             article1    learnmore   strategicpurchasing
24  us  EXAMPLE             article1    learnmore  controlandvisibility
25  us  EXAMPLE             article1   scrolldown            findoutnow
26  us  EXAMPLE             article1  scrollright                  None
27  us  EXAMPLE             article1   findoutnow                  None
28  us  EXAMPLE   payablesmanagement   findoutnow                  None
29  us  EXAMPLE  strategicpurchasing  scrollright                  None
Borrowing from @Jon: to get the whole dataset, with your original columns plus the new ones, join the split result back onto the DataFrame:
>>> data_xls.join(data_xls['Information'].str.split('-', expand=True).add_prefix('newCol_')).head()
Date Information EXAMPLE_LinkedIn_SponsoredContent_Visits EXAMPLE_LinkedIn_inMail_Visits EXAMPLE_DBM_Native_Visits EXAMPLE_SGCPB_Native_Visits EXAMPLE_SGCBDC_Email_Visits EXAMPLE_SGCPB_Email_Visit \
0 2018-08-20 us-EXAMPLE-article1-scrolldown-findoutnow 0 0 0 0 0 0
1 2018-08-20 us-EXAMPLE-article1-scrollright 0 0 0 0 0 0
2 2018-08-20 us-EXAMPLE-article1-findoutnow 1 0 1 0 0 0
3 2018-08-20 us-EXAMPLE-payablesmanagement-findoutnow 0 0 0 0 0 0
4 2018-08-20 us-EXAMPLE-strategicpurchasing-scrollright 0 0 0 0 0 0
EXAMPLE_SGCBDC_Native_Visits EXAMPLE_ConstructionDive_Email_Visit EXAMPLE_ConstructionDive_PromotedStory_Visit EXAMPLE_SGCPB_PromotedStory_Visit EXAMPLE_SGCBDC_PromotedStory_Visit EXAMPLE_ConstructionDive_Native_Visits newCol_0 newCol_1 \
0 0 0 0 0 0 0 us EXAMPLE
1 0 0 0 0 0 0 us EXAMPLE
2 0 0 0 0 0 0 us EXAMPLE
3 0 0 0 0 0 0 us EXAMPLE
4 0 0 0 0 0 0 us EXAMPLE
newCol_2 newCol_3 newCol_4
0 article1 scrolldown findoutnow
1 article1 scrollright None
2 article1 findoutnow None
3 payablesmanagement findoutnow None
4 strategicpurchasing scrollright None
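Since the question is specifically about leaving the other columns alone, here is a small sketch that checks this on toy data. It assumes an in-memory stand-in for Example_data.xlsx with only one of the visit columns, and the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for Example_data.xlsx; values are made up.
data_xls = pd.DataFrame({
    "Date": ["2018-08-20", "2018-08-20", "2018-08-20"],
    "Information": [
        "us-EXAMPLE-article1-scrolldown-findoutnow",
        "us-EXAMPLE-article1-scrollright",
        "us-EXAMPLE-article1",
    ],
    "EXAMPLE_LinkedIn_SponsoredContent_Visits": [0, 0, 1],
})

# Split into separate columns and give them readable names.
new_cols = data_xls["Information"].str.split("-", expand=True).add_prefix("newCol_")

# join aligns on the index, so existing columns are carried over as they are.
result = data_xls.join(new_cols)

# Compare the original columns in the result against the original frame.
unchanged = result[data_xls.columns].equals(data_xls)
print(unchanged)
print(result.filter(like="newCol_"))
```

The result could then be written back out with DataFrame.to_excel, which needs an Excel writer engine such as openpyxl to be installed.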