How to properly append a list to a DataFrame in a for loop

Question

I'm trying to add every item in a list of strings (lines from a file) to my DataFrame. Each line holds keys and values that are dumped into a list and parsed as JSON. The issue is that I can't get pandas to build a DataFrame from the list inside the loop (the code gets stuck in the for loop).

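import json
import pandas as pd

all_list = []
df = pd.DataFrame()
with open(log_file_path, "r") as file:
    for line in file:
        line = json.loads(line[1:])  # drop the leading character, then parse the JSON
        items = line.items()
        all_list.append(list(items))

        # Appending inside the loop copies the whole DataFrame on every pass
        # (DataFrame.append was removed in pandas 2.0)
        df = df.append(pd.DataFrame.from_records([line]))

print("work")
print(df)
print(df.head())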

Here is what each line looks like.

line = {'protocol': 'https', 'instanceid': 'beacond-lga13-1349-12003', 'raw_data': 'i|200|122!i|200|114!i|200|117', 'source_ip': '90.227.61.0', 'ts': 1549434199, 'jobid': '1uxw9ir', 'geocode': 'SE', 'referer': 'https://sv.cam4.com/female', 'user_agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-G935F/G935FXXS3ERL4 Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/8.2 Chrome/63.0.3239.111 Mobile Safari/537.36', 'appid': '157pr4o', 'app_version': 1536174158, 'asn': 3301}

Answer 1

I would make a list of lists and then construct your DataFrame. For example:

# After collecting each list
lists = [['a', 'b'],
         ['c', 'd']]
# Pass your list of lists (you can name the columns too if you like)
pd.DataFrame(lists, columns=['col1', 'col2'])

Output:


  col1 col2
0    a    b
1    c    d
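
Applied to the log file from the question, the same idea might look like the sketch below. It assumes each line is one JSON object preceded by a single extra character (as the question's line[1:] suggests). The function name parse_log_lines and the sample data are made up for illustration, and io.StringIO stands in for the real file.

```python
import io
import json

import pandas as pd


def parse_log_lines(file):
    """Parse every line into a dict and build the DataFrame once at the end."""
    records = []
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: one leading character precedes the JSON object.
        records.append(json.loads(line[1:]))
    return pd.DataFrame.from_records(records)


# Made-up sample standing in for the real log file.
sample = io.StringIO(
    '#{"protocol": "https", "geocode": "SE", "asn": 3301}\n'
    '#{"protocol": "https", "geocode": "SE", "asn": 3302}\n'
)
df = parse_log_lines(sample)
print(df)
```

Collecting plain Python objects in the loop and calling the DataFrame constructor once avoids rebuilding the DataFrame on every iteration.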

Answer 2

I can read your line into a DataFrame if I do it like this:

line = {'protocol': 'https', 'instanceid': 'beacond-lga13-1349-12003', 'raw_data': 'i|200|122!i|200|114!i|200|117', 'source_ip': '90.227.61.0', 'ts': 1549434199, 'jobid': '1uxw9ir', 'geocode': 'SE', 'referer': 'https://sv.cam4.com/female', 'user_agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-G935F/G935FXXS3ERL4 Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/8.2 Chrome/63.0.3239.111 Mobile Safari/537.36', 'appid': '157pr4o', 'app_version': 1536174158, 'asn': 3301}


pd.DataFrame(line, index=[0])
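
The index=[0] argument matters here because every value in the dict is a scalar. The sketch below, which uses a shortened version of the line for brevity, shows what happens without it and an equivalent alternative that wraps the dict in a list.

```python
import pandas as pd

line = {"protocol": "https", "geocode": "SE", "asn": 3301}  # shortened sample

# A dict of scalars gives pandas no row labels, so this raises ValueError.
try:
    pd.DataFrame(line)
    raised = False
except ValueError:
    raised = True

with_index = pd.DataFrame(line, index=[0])  # explicit one-row index
from_list = pd.DataFrame([line])            # a one-element list of dicts

print(raised)
print(with_index)
print(from_list)
```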

When you have several lines, you can possibly also pass a range as the index, for example index=range(0, len(lines)):


lines = [{'protocol': 'https',
 'instanceid': 'beacond-lga13-1349-12003',
 'raw_data': 'i|200|122!i|200|114!i|200|117',
 'source_ip': '90.227.61.0',
 'ts': 1549434199,
 'jobid': '1uxw9ir',
 'geocode': 'SE',
 'referer': 'https://sv.cam4.com/female',
 'user_agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-G935F/G935FXXS3ERL4 Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/8.2 Chrome/63.0.3239.111 Mobile Safari/537.36',
 'appid': '157pr4o',
 'app_version': 1536174158,
 'asn': 3301},
{'protocol': 'https',
 'instanceid': 'beacond-lga14-1349-12003',
 'raw_data': 'i|200|122!i|200|114!i|200|117',
 'source_ip': '90.227.61.1',
 'ts': 1549434199,
 'jobid': '1uxw9ir',
 'geocode': 'SE',
 'referer': 'https://sv.cam4.com/female',
 'user_agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-G935F/G935FXXS3ERL4 Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/8.2 Chrome/63.0.3239.111 Mobile Safari/537.36',
 'appid': '157pr4o',
 'app_version': 1536174158,
 'asn': 3301}]

pd.DataFrame(lines, index=list(range(0, len(lines))))

Output:

Out[899]: 
  protocol                instanceid                       raw_data    source_ip          ts  ...                     referer                                         user_agent    appid app_version   asn
0    https  beacond-lga13-1349-12003  i|200|122!i|200|114!i|200|117  90.227.61.0  1549434199  ...  https://sv.cam4.com/female  Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-...  157pr4o  1536174158  3301
1    https  beacond-lga14-1349-12003  i|200|122!i|200|114!i|200|117  90.227.61.1  1549434199  ...  https://sv.cam4.com/female  Mozilla/5.0 (Linux; Android 8.0.0; SAMSUNG SM-...  157pr4o  1536174158  3301

[2 rows x 12 columns]
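
As a small check, the explicit index appears to be optional for a list of dicts, since pandas numbers the rows from 0 by default. The sketch below uses a shortened version of the two records above.

```python
import pandas as pd

# Shortened version of the two records, keeping only two keys each.
lines = [
    {"instanceid": "beacond-lga13-1349-12003", "source_ip": "90.227.61.0"},
    {"instanceid": "beacond-lga14-1349-12003", "source_ip": "90.227.61.1"},
]

explicit = pd.DataFrame(lines, index=list(range(0, len(lines))))
default = pd.DataFrame(lines)  # no index argument

print(list(explicit.index))
print(list(default.index))
```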

Answer 1
0 2019-12-04 20:10:07

Answer 2
0 Accepted 2019-12-04 22:46:11
