Python: Split a string into two columns using more than one delimiter

Question

I am importing data from a CSV file and want to split the column 'topThemes' into an array/dataframe with two columns.
The first column should hold the name of the theme (e.g. Biology) and the second column its associated score (e.g. 62).
When I import the column, it is stored in this format:

Biology: 62\n
Economics: 12\n
Physics: 4\n
Chemistry: 8\n
and so on.

My current code and the error are shown below.

Code:

df = pd.read_csv(r'myfilelocation')

split = [line.split(': ') for line in df['topThemes'].split('\n')]

Error:

AttributeError("'Series' object has no attribute 'split'")

CSV file being imported:

My csv file

How I want it to look:

Ideal format

Thanks for any help / responses.

Answer 1

Specify the delimiter with the sep parameter and the column names with the names parameter of the read_csv() function:

df = pd.read_csv(r'myfilelocation', sep=':', names=['topThemes', 'score'])

Documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
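
As a minimal sketch of how sep and names behave, assuming a file in which every line already holds a single "theme: score" pair. The in-memory sample below is made up for illustration and stands in for the real file; skipinitialspace is an extra option not used in the line above, added here to drop the blank after the colon:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file:
# one "theme: score" pair per line.
sample = io.StringIO("Biology: 62\nEconomics: 12\nPhysics: 4\nChemistry: 8\n")

# sep=':' splits each line at the colon, names supplies the column labels,
# and skipinitialspace=True discards the space that follows the colon.
pairs = pd.read_csv(sample, sep=':', names=['topThemes', 'score'],
                    skipinitialspace=True)
print(pairs)
```

This only helps when the file itself is laid out one pair per line; it does not split pairs that are packed into a single cell.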

Oh, I see your source CSV file now...
There's probably a cleaner way to do this in fewer steps, but I think this produces your requested output:

data = pd.read_csv(r'myfilelocation', usecols=['topThemes'])
data = pd.DataFrame(data['topThemes'].str.split('\n').values.tolist()).stack().to_frame(name='raw')

df = pd.DataFrame()
df[['topTheme', 'score']] = data['raw'].apply(lambda x: pd.Series(str(x).split(":")))
df.dropna(inplace=True)
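
One possibly shorter variant, offered as a sketch rather than a drop-in replacement for the code above: it assumes pandas 0.25 or newer (for Series.explode) and that each 'topThemes' cell holds newline-separated "theme: score" pairs. The raw frame below is invented sample data standing in for the imported file, and the names lines and themes are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the imported CSV: each cell of 'topThemes'
# holds several "theme: score" pairs separated by newlines.
raw = pd.DataFrame({
    'topThemes': ["Biology: 62\nEconomics: 12\n", "Physics: 4\nChemistry: 8\n"]
})

# Split each cell into a list of lines, then give every line its own row.
lines = (raw['topThemes'].str.split('\n')
         .explode()
         .dropna()
         .str.strip()
         .reset_index(drop=True))
lines = lines[lines != '']  # drop blanks left by trailing newlines

# Split each line at the first colon into two columns.
themes = lines.str.split(':', n=1, expand=True)
themes.columns = ['topTheme', 'score']
themes['topTheme'] = themes['topTheme'].str.strip()
themes['score'] = themes['score'].str.strip().astype(int)
themes = themes.reset_index(drop=True)
print(themes)
```

The original error arises because a Series has no split method of its own; string operations on a whole column go through the .str accessor, which is what both versions rely on.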

Solution 1 (accepted, 2020-06-03 14:18:02)
