Reading string data separated by spaces in Pandas
I have two columns of data in a text file, for example:
Balkrishna Industries Ltd. Auto Ancillaries 3.54
Aurobindo Pharma Ltd. Pharmaceuticals 3.36
NIIT Technologies Ltd. Software 3.31
Sonata Software Ltd. Software 3.21
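When I try to read it with Pandas I get an error: the separator is a space, but the company name itself contains spaces, so it does not stay in a single column. How can I modify my code so that the data is split into two columns, one for the name and one for the number?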
import numpy as np
import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)
data.columns = ["Name", "Fraction"]
print(data)
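To see why a single-space separator cannot work here, it may help to count the fields each line produces. The following is a minimal sketch in plain Python, using only the four sample lines from the question; the variable names are illustrative.

```python
# Sample lines copied from the question.
lines = [
    "Balkrishna Industries Ltd. Auto Ancillaries 3.54",
    "Aurobindo Pharma Ltd. Pharmaceuticals 3.36",
    "NIIT Technologies Ltd. Software 3.31",
    "Sonata Software Ltd. Software 3.21",
]

# Splitting on every space yields a different number of fields per line,
# so the result can never line up with two column names.
field_counts = [len(line.split(" ")) for line in lines]
print(field_counts)

# Splitting only at the last space always yields exactly two fields.
pairs = [line.rsplit(" ", 1) for line in lines]
print(pairs[0])
```

The second split is the idea that the answers apply inside Pandas in different ways.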
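Use a regex lookbehind and lookahead as the separator: sep=r"(?<=\w) (?=\d)". It splits only on a space that is preceded by a word character and followed by a digit. A regex separator is handled by the Python parsing engine, so pass engine="python" explicitly to avoid a ParserWarning.
For example:
import pandas as pd
df = pd.read_csv(filename, sep=r"(?<=\w) (?=\d)", engine="python", names=["Name", "Fraction"])
print(df)
Output: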
                                    Name  Fraction
0      Balkrishna Industries Ltd. Auto Ancillaries      3.54
1  Aurobindo Pharma Ltd. Pharmaceuticals      3.36
2        NIIT Technologies Ltd. Software      3.31
3          Sonata Software Ltd. Software      3.21
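For anyone who wants to try the lookaround separator without creating a file, here is a small self-contained sketch. It assumes the four sample lines from the question and feeds them through io.StringIO instead of a real path. One caveat, stated as an assumption about other data: a name that itself contains a word character, a space and then a digit would be split in the wrong place.

```python
import io

import pandas as pd

# In-memory stand-in for the text file from the question.
sample = io.StringIO(
    "Balkrishna Industries Ltd. Auto Ancillaries 3.54\n"
    "Aurobindo Pharma Ltd. Pharmaceuticals 3.36\n"
    "NIIT Technologies Ltd. Software 3.31\n"
    "Sonata Software Ltd. Software 3.21\n"
)

# Split only on a space that follows a word character and precedes a digit.
# The lookarounds are zero-width, so no characters of the data are consumed.
df = pd.read_csv(
    sample,
    sep=r"(?<=\w) (?=\d)",
    engine="python",  # regex separators need the Python engine
    names=["Name", "Fraction"],
)
print(df.dtypes)
```

Because the numbers reach the parser as their own field, the Fraction column comes back numeric rather than as text.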
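Another approach is to read the file as a single column (by using a sep character that does not occur in the file, for example |). Then use Series.str.rsplit (with the n=1 and expand=True parameters) to split each string from the right, with only one split, which returns a DataFrame with 2 columns: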
df = pd.read_csv('file.txt', sep='|', header=None)
df = df[0].str.rsplit(' ', n=1, expand=True)
df.columns = ["Name", "Fraction"]
[OUT]
                                    Name  Fraction
0      Balkrishna Industries Ltd. Auto Ancillaries      3.54
1  Aurobindo Pharma Ltd. Pharmaceuticals      3.36
2        NIIT Technologies Ltd. Software      3.31
3          Sonata Software Ltd. Software      3.21
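One thing to keep in mind with the rsplit approach is that both resulting columns hold strings. The sketch below repeats the steps on an in-memory copy of the sample data (io.StringIO stands in for the file) and adds a pd.to_numeric call, which is my addition and not part of the original answer.

```python
import io

import pandas as pd

# In-memory stand-in for the text file from the question.
sample = io.StringIO(
    "Balkrishna Industries Ltd. Auto Ancillaries 3.54\n"
    "Aurobindo Pharma Ltd. Pharmaceuticals 3.36\n"
    "NIIT Technologies Ltd. Software 3.31\n"
    "Sonata Software Ltd. Software 3.21\n"
)

# "|" never occurs in the data, so every line lands in a single column.
raw = pd.read_csv(sample, sep="|", header=None)

# Split each line once, starting from the right.
df = raw[0].str.rsplit(" ", n=1, expand=True)
df.columns = ["Name", "Fraction"]

# rsplit returns strings; convert the numeric column explicitly.
df["Fraction"] = pd.to_numeric(df["Fraction"])
print(df.dtypes)
```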
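Use a "char-space-digit" separator. Note that this pattern consumes the word character before the space and the digit after it, so the output below loses the last letter of each name and the leading digit of each number:
import pandas as pd
df = pd.read_csv("mycsv.txt", sep=r"\w\s\d", engine="python", names=["Name", "Fraction"])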
print(df)
                                   Name  Fraction
0      Balkrishna Industries Ltd. Auto Ancillarie      0.54
1  Aurobindo Pharma Ltd. Pharmaceutical      0.36
2        NIIT Technologies Ltd. Softwar      0.31
3          Sonata Software Ltd. Softwar      0.21
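The truncated names and the 0.xx values in the output above come from the separator itself: whatever a separator pattern matches is discarded. A short sketch with Python's re module, using one sample line, shows the difference between a consuming pattern and a lookaround pattern.

```python
import re

line = "NIIT Technologies Ltd. Software 3.31"

# \w\s\d matches three characters ("e 3"), and all three are thrown away.
consuming = re.split(r"\w\s\d", line)
print(consuming)   # ['NIIT Technologies Ltd. Softwar', '.31']

# The lookbehind and lookahead only check the neighbours;
# the match itself is just the single space.
lookaround = re.split(r"(?<=\w) (?=\d)", line)
print(lookaround)  # ['NIIT Technologies Ltd. Software', '3.31']
```

The leftover ".31" is then parsed as the number 0.31, which explains the Fraction column shown above.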
Simply read it in as a single-column DataFrame, as in this sample:
df:
                                         name
0      Balkrishna Industries Ltd. Auto Ancillaries 3.54
1  Aurobindo Pharma Ltd. Pharmaceuticals 3.36
2        NIIT Technologies Ltd. Software 3.31
3          Sonata Software Ltd. Software 3.21
Then simply call str.rpartition on df.name and drop the separator column, as follows:
df["name"].str.rpartition().drop(columns=1).set_axis(["Name", "Fraction"], axis=1)
Out[1594]:
                                    Name  Fraction
0      Balkrishna Industries Ltd. Auto Ancillaries      3.54
1  Aurobindo Pharma Ltd. Pharmaceuticals      3.36
2        NIIT Technologies Ltd. Software      3.31
3          Sonata Software Ltd. Software      3.21
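The rpartition steps can also be tried end to end on an in-memory frame. This is a sketch under the assumption that the single column is called name, as in the sample above; the final astype(float) conversion is my addition, since rpartition returns strings.

```python
import pandas as pd

# Single-column frame matching the sample shown above.
df = pd.DataFrame(
    {
        "name": [
            "Balkrishna Industries Ltd. Auto Ancillaries 3.54",
            "Aurobindo Pharma Ltd. Pharmaceuticals 3.36",
            "NIIT Technologies Ltd. Software 3.31",
            "Sonata Software Ltd. Software 3.21",
        ]
    }
)

# rpartition splits at the last space into three columns:
# 0 = text before, 1 = the separator itself, 2 = text after.
parts = df["name"].str.rpartition(" ")

# Drop the separator column and give the remaining two readable names.
result = parts.drop(columns=1).set_axis(["Name", "Fraction"], axis=1)

# Convert the numeric part from string to float.
result["Fraction"] = result["Fraction"].astype(float)
print(result)
```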