How can a pandas column whose rows contain variable-length, comma-separated strings of values be stacked into separate values?
I'm trying to separate specific values in a pandas column so that any "groups" of values become separate values.
The code I'm currently using to do this is as follows:
import csv
import pandas as pd

data = pd.read_csv('ctabuses.csv')
route_column = data['routes']

with open('results.csv', 'wt+') as csv_file:
    writer = csv.writer(csv_file)
    for value in route_column:
        # Each group is written as a single CSV row, so its values stay together
        writer.writerow(value.split(','))
However, when I write the contents to a file, it produces this:
126
121,123
1,7,X28,126,129,130,132,151
1,7,X28,126,129,130,151
1,7,X28,126,129,130
1,7,X28,126,129
1,3,4,7,J14,26,X28,126,129,132,143,147,148
7,126,132,143,147
1,7,X28,126,129
3,4,6,J14,26,143
1,7,X28,126,129,151
1,7,X28,126,129,130,134,135,136,151,156
125,126
126
126
126
I've searched and tried everything I can think of, and I keep getting the same result.
Edit: Expected result
If I encounter a group of values like this:
1,7,X28,126,129,130,134,135,136,151,156
the output should be:
1
7
X28
126
129
130
134
135
136
151
156
The result would then be imported into a MySQL database.
Imports:
import pandas as pd
Create DataFrame:
df = pd.read_csv('data.csv', header=None)
df.head()
0
0 126
1 121,123
2 1,7,X28,126,129,130,132,151
3 1,7,X28,126,129,130,151
4 1,7,X28,126,129,130
String to list:
df_list = df.apply(lambda row: pd.Series(row).str.split(','))
df_list.head()
0
0 [126]
1 [121, 123]
2 [1, 7, X28, 126, 129, 130, 132, 151]
3 [1, 7, X28, 126, 129, 130, 151]
4 [1, 7, X28, 126, 129, 130]
List to long:
df_long = df_list.apply(lambda x: pd.Series(x[0]), axis=1).stack().reset_index(level=1, drop=True)
df_long
0 126
1 121
1 123
2 1
2 7
2 X28
2 126
2 129
2 130
2 132
2 151
3 1
3 7
3 X28
3 126
3 129
3 130
3 151
...
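As a possible shortcut for the two steps above (string to list, then list to long), here is a minimal sketch that assumes pandas 0.25 or later, where Series.explode is available. The sample values are hypothetical and only mirror the first rows shown above; the column label 0 matches the header=None read.

```python
import pandas as pd

# Hypothetical sample data mirroring the first rows shown above.
df = pd.DataFrame({0: ["126", "121,123", "1,7,X28"]})

# Split each string into a list, then give every list element its own row.
# explode() repeats the original row label for each element, which matches
# the repeated index values in the stacked output above.
df_long = df[0].str.split(",").explode()

print(df_long)
```

This avoids building a wide intermediate DataFrame with one column per list position, which may matter for long columns.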
Save to csv:
df_long.to_csv('results.csv', index=False)
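One thing to check before importing into MySQL: in recent pandas versions, Series.to_csv writes a header line by default, so the file may start with a stray column label. A hedged sketch of suppressing it, using an in-memory buffer and hypothetical values in place of the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the stacked Series produced above.
df_long = pd.Series(["126", "121", "123"])

# header=False keeps the column label out of the output;
# index=False drops the row labels, as in the call above.
buffer = io.StringIO()
df_long.to_csv(buffer, index=False, header=False)

lines = buffer.getvalue().splitlines()
print(lines)
```

With a real file, the same keyword arguments would be passed alongside the 'results.csv' path.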
Final Program (4 lines):
df = pd.read_csv('ctabuses.csv')
df_routes = df.routes.apply(lambda row: pd.Series(row).str.split(','))
df_routes = df_routes.apply(lambda row: pd.Series(row[0]), axis=1).stack().reset_index(level=1, drop=True)
df_routes.to_csv('results.csv', index=False)
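If you would rather keep the csv module from the question, the same result can be sketched without stacking in pandas by writing one row per value instead of one row per group. The function name and sample values below are hypothetical, and an in-memory buffer stands in for results.csv:

```python
import csv
import io


def write_one_value_per_row(route_groups, csv_file):
    """Write each comma-separated value in each group as its own CSV row."""
    writer = csv.writer(csv_file)
    for group in route_groups:
        for route in group.split(","):
            # writerow expects a sequence, so wrap the single value in a list.
            writer.writerow([route.strip()])


# Hypothetical sample values standing in for data['routes'].
buffer = io.StringIO()
write_one_value_per_row(["126", "121,123", "1,7,X28"], buffer)

rows = buffer.getvalue().splitlines()
print(rows)
```

When writing to a real file, opening it with newline='' is the usual recommendation for the csv module, so that no blank lines appear between rows on Windows.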